Please note: the plots (Histogram, Count and distribution, Pair plot, Box plot, Violin Plot, Joint plot, Scatter plot, Correlation plot) that compare the candidate genes with the other maize genes are generated from a randomly downsampled set of the other maize genes. The downsampled set is the same size as the user-defined candidate gene list.
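
The downsampling step described above can be sketched as follows. This is a minimal illustration, not the tool's actual code. It assumes the gene IDs are available as plain Python lists. The helper name `downsample_other_genes`, the optional seed, and the example gene IDs are all hypothetical.

```python
import random


def downsample_other_genes(candidate_genes, all_genes, seed=None):
    """Draw a random background set of non-candidate genes.

    Hypothetical helper: the sample size equals the number of (unique)
    candidate genes, mirroring the note above.
    """
    candidates = set(candidate_genes)
    # Keep only "other" genes, i.e. exclude the user's candidates.
    others = [g for g in all_genes if g not in candidates]
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # Sample without replacement.
    return rng.sample(others, k=len(candidates))


# Made-up gene IDs, for illustration only.
all_genes = [f"gene{i:03d}" for i in range(200)]
candidates = all_genes[:50]
background = downsample_other_genes(candidates, all_genes, seed=1)
print(len(background))  # 50
```

Because the sample is redrawn at random, plots built from different draws can differ slightly. A fixed seed is one way to keep them reproducible.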

Example Candidate gene list:

To demonstrate a potential use case for the user candidate gene analysis, we gathered a set of fifty stress genes that were differentially expressed between control and salt stress samples. We then used them to identify unique characteristics common among salt stress genes.

Please read the explanation below for details on interpreting the tables and plots.

Table column interpretation:

For each Enhancer dataset (Husk or Seedling), counts were calculated for three regions. The first region is the gene body alone. The second is the gene body together with its flanking regions, defined as the 1 kb directly upstream of the gene start site and the 1 kb directly downstream of the gene end site. The third is a larger region that includes the gene body plus the 5 kb directly upstream of the gene start site and the 5 kb directly downstream of the gene end site. This gives the following columns:


1. Enhancer Husk

2. Enhancer Seedling

3. 1kb Enhancer Husk

4. 1kb Enhancer Seedling

5. 5kb Enhancer Husk

6. 5kb Enhancer Seedling


7. Labels

The labels are the classes or groups into which the genes are mapped. Depending on the problem a user is trying to solve, the labels can serve either as the target variable or as a feature.

7.1 No Label

This selection lets users view the properties of all genes without grouping them into gene categories or annotations. Users can then examine the features of multiple genes and identify common patterns among them. Because it involves inspecting all the genes together, this option works only with the "Submit for analysis" button.

7.2 Classical Genes

Classical genes are the most well-studied genes, known mainly for their visible mutant phenotypes (for example, liguleless3).

7.3 Pan-genome Genes

A gene in a given taxonomic group is either present in every individual (core), or absent in at least a single individual (dispensable).

7.4 Origin Genes

Gene duplication is an important evolutionary mechanism that provides new genetic material. It therefore gives an organism opportunities to acquire new gene functions. Duplications can have different origins, such as whole-genome duplication, tandem duplication, etc.
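
The windowed counting behind the six enhancer count columns listed above (Enhancer Husk through 5kb Enhancer Seedling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code. It assumes that genes and enhancers are given as 1-based inclusive coordinates on the same chromosome, and that an enhancer counts toward a window when it overlaps the window by at least one base. The function names and the example coordinates are hypothetical.

```python
def count_in_window(gene_start, gene_end, enhancers, flank=0):
    """Count enhancers overlapping [gene_start - flank, gene_end + flank].

    Assumes 1-based inclusive coordinates on one chromosome. The same flank is
    added on both sides, so the window is the same whichever strand the gene
    is on.
    """
    win_start = max(1, gene_start - flank)
    win_end = gene_end + flank
    # Two intervals overlap when each one starts before the other ends.
    return sum(1 for e_start, e_end in enhancers
               if e_start <= win_end and e_end >= win_start)


def enhancer_columns(gene_start, gene_end, husk, seedling):
    """Build the six enhancer count columns for one gene (hypothetical helper)."""
    row = {}
    for prefix, flank in (("", 0), ("1kb ", 1_000), ("5kb ", 5_000)):
        row[prefix + "Enhancer Husk"] = count_in_window(gene_start, gene_end, husk, flank)
        row[prefix + "Enhancer Seedling"] = count_in_window(gene_start, gene_end, seedling, flank)
    return row


# Made-up coordinates: a gene at 10,000-12,000 and a few enhancers.
husk = [(11_000, 11_200), (9_500, 9_700), (16_000, 16_100)]
seedling = [(20_000, 20_500)]
print(enhancer_columns(10_000, 12_000, husk, seedling))
# {'Enhancer Husk': 1, 'Enhancer Seedling': 0, '1kb Enhancer Husk': 2,
#  '1kb Enhancer Seedling': 0, '5kb Enhancer Husk': 3, '5kb Enhancer Seedling': 0}
```

Because each larger window contains the smaller ones, a gene's 5kb count is never lower than its 1kb count, and its 1kb count is never lower than its gene-body count.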


Graph interpretations:

In the top right corner of each plot are options to download the plot, zoom in/zoom out, reset axes, autoscale, toggle spike lines, show the closest data on hover, compare data on hover, box select, pan, and lasso select. Users can also click legend entries to view data only for the selected legends. Details on the interactive plot options are available here:
Interactive graph features

1. Marginal Plot

The Marginal plots are histograms that show the frequency distribution of the selected gene feature, with the candidate genes highlighted. This plot lets users easily see where their candidate genes lie among the other maize genes for the selected feature.
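
A numeric counterpart to reading a candidate's position off the marginal histogram is its percentile rank among the other genes. The sketch below is an illustration only. The tool may not compute percentile ranks at all, and the function names and values are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(value, background):
    """Percentage of background values that are less than or equal to value."""
    ordered = sorted(background)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def locate_candidates(candidate_values, background_values):
    """Percentile rank of each candidate gene's value among the other genes."""
    return {gene: percentile_rank(v, background_values)
            for gene, v in candidate_values.items()}


# Made-up feature values (e.g. Enhancer Husk counts).
background_values = [0, 1, 1, 2, 3, 3, 4, 5, 8, 10]
candidate_values = {"candA": 3, "candB": 9}
print(locate_candidates(candidate_values, background_values))
# {'candA': 60.0, 'candB': 90.0}
```

A candidate near 100 sits in the right tail of the histogram, and a candidate near 0 sits in the left tail.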

2. Histogram

The Histogram shows the frequency distribution of the selected Enhancer feature, such as Enhancer Husk, Enhancer Seedling or 1kb Enhancer Seedling. The X-axis represents the range of values of the selected Enhancer feature, and the Y-axis represents how frequently those values occur. To make the data easier to interpret, we have also included P-values, means and standard deviations of the selected datasets alongside the graph.
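
How histogram bin counts and the summary statistics are obtained can be sketched with equal-width bins. This is an illustration, not the tool's code. The bin count, the use of the sample (rather than population) standard deviation, and the example values are assumptions. The statistical test behind the P-values is not specified here, so it is left out.

```python
import statistics


def histogram(values, n_bins):
    """Equal-width bin counts over [min, max]; the last bin also holds the max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    counts = [0] * n_bins
    for v in values:
        # Index of the bin containing v, clamped so v == max lands in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return counts, edges


# Made-up enhancer counts for ten genes.
values = [0, 1, 1, 2, 3, 3, 4, 5, 8, 10]
counts, edges = histogram(values, n_bins=5)
print(counts)                              # [3, 3, 2, 0, 2]
print(edges)                               # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(statistics.mean(values))             # 3.7
print(round(statistics.stdev(values), 2))  # 3.2 (sample standard deviation)
```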

3. Count and distribution

The Count and distribution plot is a smoothed, continuous version of a histogram estimated from the data. The most common estimation method is kernel density estimation. As in a histogram, the x-axis shows the value of the selected Enhancer feature. The y-axis, however, shows the probability density, not a probability. Probability density is probability per unit on the x-axis: the probability that a value falls in an interval is the area under the curve over that interval, so the density itself can exceed 1. In general, the y-axis values of a distribution plot are useful only for relative comparisons between categories, such as classical and other genes.
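
A Gaussian kernel density estimate shows why the y-axis is a density and not a probability. The sketch below uses a fixed bandwidth chosen by hand. Plotting libraries usually pick the bandwidth automatically, and the tool's choice is not documented here, so both the bandwidth and the data values are assumptions.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


data = [0.10, 0.12, 0.15]  # made-up values
f = gaussian_kde(data, bandwidth=0.02)

# Riemann sum of the density over [-1, 2): the total area is 1.
step = 0.001
area = sum(f(-1 + k * step) * step for k in range(3000))
print(round(area, 3))  # 1.0
# The density at a single point can still exceed 1, because it is not a probability.
print(f(0.12) > 1)     # True
```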

4. Pair plots

The pair plot builds on two basic figures: the distribution plot and the scatter plot. The distributions on the diagonal show the distribution of each selected Enhancer feature on its own. The scatter plots in the upper and lower triangles show the relationship (or lack thereof) between two variables, such as Enhancer Husk and Enhancer Seedling.

5. Box plots

Box plots show how the values of the selected Enhancer feature are distributed. The three quartiles divide the data set into four parts. The plot marks the minimum, maximum, median, first quartile and third quartile of the selected Enhancer feature. Box plots are also useful for comparing distributions across data sets: one box plot is drawn for each group, such as the core, non-core, dispensable and private genes. Boxplots can be used to:
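
The statistics a box plot draws can be computed as in the following sketch. It uses Python's `statistics.quantiles` with the "inclusive" method, which is an assumption: quartile conventions differ slightly between tools. Some plotting libraries also end the whiskers at 1.5 times the interquartile range and draw values beyond that as individual outlier points. The example values are made up.

```python
import statistics


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum."""
    # The three quartiles split the sorted data into four parts.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
summary = five_number_summary(values)
print(summary)                   # (1, 3.0, 5.0, 7.0, 9)
print(summary[3] - summary[1])   # interquartile range: 4.0
```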

6. Violin plots

Violin plots are similar to box plots, except that they also show the probability density of the selected Enhancer feature at different values. As in standard box plots, they include a marker for the median of the data and a box indicating the interquartile range, and a kernel density estimate is overlaid on this box plot. Like box plots, violin plots are used to compare the distribution of a variable such as Enhancer Husk or Enhancer Seedling (or a sample distribution) across different "categories" (Classical/Other or Core/Non-core).
A violin plot is more informative than a plain box plot. A box plot shows only summary statistics such as the median and interquartile range, whereas a violin plot shows the full distribution of the selected Enhancer feature.

7. Joint plots

A joint plot combines three plots. The main plot is a bivariate graph (a scatter plot) that shows how one variable (such as Enhancer Husk) varies with another (such as Enhancer Seedling). The second plot is placed horizontally above the scatter plot and shows the distribution of the x-axis variable as marks along the axis (for example, a rug plot of Enhancer Husk). The third plot is placed vertically along the right margin of the scatter plot and shows the distribution of the y-axis variable (for example, Enhancer Seedling).
Having univariate and bivariate plots together in one figure is very helpful. The univariate plots describe and summarize each variable on its own and reveal any patterns in it. The bivariate plot explores the relationship between the two variables and shows how strong that relationship is.

8. Scatter plots

A scatter plot is a good way to explore relationships or patterns in data, and adding a regression line makes those patterns stand out. The scatter plot with simple linear regression for two selected Enhancer features (such as Enhancer Husk and Enhancer Seedling) therefore reports the strength of their relationship using R², the squared correlation coefficient. For a least-squares regression line, R² always lies between 0 and 1. A higher R² indicates a stronger linear relationship between the selected Enhancer features.
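
The regression line and its R² can be computed with ordinary least squares, as in the sketch below. The function name and the example values are made up for illustration. For a simple linear regression with an intercept, R² equals the square of the Pearson correlation coefficient.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up values: x = Enhancer Husk, y = Enhancer Seedling for five genes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r2 = linear_fit(x, y)
print(round(slope, 3), round(intercept, 3), round(r2, 3))  # 0.6 2.2 0.6
```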

9. Correlation plots

A correlation heatmap is a graphical representation of the correlation matrix between the selected Enhancer features. Correlation ranges from -1 to +1. Values close to zero mean there is little or no linear trend between the two variables. The closer the correlation is to +1, the more strongly positively correlated the variables are: as one increases, so does the other. The closer it is to -1, the more strongly negatively correlated they are: as one increases, the other decreases. Correlation plots also alert us to potential multicollinearity problems.
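
The matrix behind the heatmap can be sketched with the Pearson correlation coefficient. This is an assumption: the tool could use a different correlation measure, such as Spearman. The feature values are made up so that one pair is perfectly positive and another perfectly negative.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(features):
    """Nested dict: correlation of every feature with every other feature."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Made-up values of three Enhancer features for four genes.
features = {
    "Enhancer Husk": [1, 2, 3, 4],
    "Enhancer Seedling": [2, 4, 6, 8],
    "5kb Enhancer Husk": [4, 3, 2, 1],
}
corr = correlation_matrix(features)
for name, row in corr.items():
    print(f"{name:>18}", [round(v, 2) for v in row.values()])
```

The diagonal is always 1, and the matrix is symmetric, which is why correlation heatmaps look mirrored about the diagonal.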

10. Dendrogram plots

A dendrogram is a tree diagram that shows hierarchical clustering, that is, the relationships between similar sets of genes based on the selected Enhancer features. Dendrograms are frequently used in biology to show clustering of genes or samples, but they can represent any type of grouped data.
The branches of the dendrogram are arranged according to how similar (or dissimilar) they are. The height at which two branches join reflects the distance between them. Genes or clusters that join at a low height are similar to each other, while those that join only near the top are dissimilar: the greater the joining height, the greater the dissimilarity. The different clusters of genes based on the selected Enhancer features are also marked with different colors.

11. Hierarchical Scatterplot

The Hierarchical scatter plot is a type of pair plot that shows the relationship between different pairs of Enhancer features for the clusters of genes formed in the dendrogram. To create it, enter the number of clusters you want to view in the choose cluster input box. This number corresponds to the number of clusters formed in the Dendrogram plot, so perform this analysis only after studying the Dendrogram plot. The Hierarchical Scatter plot is sized dynamically for the number of clusters, so every user can analyze the dataset according to how many clusters they believe the selected data forms.
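
Choosing a number of clusters amounts to cutting the dendrogram so that exactly that many branches remain. The sketch below does this by stopping agglomerative clustering once k clusters are left. As before, Euclidean distance and average linkage are assumptions rather than the tool's documented settings. The gene profiles are made up.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cut_into_k_clusters(profiles, k):
    """Assign each gene to one of k clusters (average linkage, Euclidean).

    For a monotone linkage such as average linkage, stopping the agglomeration
    once k clusters remain gives the same groups as cutting the dendrogram at
    a height where it has exactly k branches.
    """
    def linkage(a, b):
        return sum(euclidean(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

    clusters = [(gene,) for gene in profiles]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    # Map gene ID -> cluster number (0 .. k-1), e.g. for colouring scatter points.
    return {gene: idx for idx, members in enumerate(clusters) for gene in members}


# Made-up (Enhancer Husk, Enhancer Seedling) values for six genes.
profiles = {"g1": [0, 0], "g2": [0, 1], "g3": [5, 5],
            "g4": [5, 6], "g5": [10, 0], "g6": [10, 1]}
labels = cut_into_k_clusters(profiles, k=3)
print(labels)  # {'g1': 0, 'g2': 0, 'g3': 1, 'g4': 1, 'g5': 2, 'g6': 2}
```

In a hierarchical scatter plot, these cluster numbers would decide the color of each gene's point in every pairwise panel.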

12. Hierarchical Heatmap

A hierarchical clustering heatmap is an intuitive way to visualize information from complex data. It is also called a false colored image, because data values are transformed to a color scale. Heat maps let us simultaneously visualize clusters of samples (selected Enhancer features) in the columns and features (genes) in the rows.
First, hierarchical clustering is performed on both the rows and the columns of the data matrix. The rows and columns are then re-ordered according to the clustering result, which puts similar observations close to each other so that blocks of ‘high’ and ‘low’ values become adjacent in the matrix. Finally, a color scheme is applied and the data matrix is displayed. Visualizing the data matrix in this way can help identify the variables (selected Enhancer features) that appear to be characteristic of each sample cluster.
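
The reordering step can be sketched as follows: cluster the rows and the columns separately, then read off the leaf order of each tree. When two clusters merge, their member lists are simply concatenated, so items that join early end up next to each other. Euclidean distance, average linkage and this simple leaf-ordering rule are assumptions for illustration. Real heatmap tools often apply additional leaf-ordering optimizations. The matrix values are made up.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaf_order(vectors):
    """Item indices in the order produced by average-linkage agglomeration."""
    def linkage(a, b):
        return sum(euclidean(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        # Concatenate members so that items merged early stay adjacent.
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters[0]


def cluster_reorder(matrix):
    """Reorder rows (genes) and columns (Enhancer features) by clustering."""
    rows = leaf_order(matrix)
    cols = leaf_order([list(col) for col in zip(*matrix)])
    return [[matrix[r][c] for c in cols] for r in rows], rows, cols


# Made-up matrix: rows are genes, columns are three Enhancer features.
matrix = [
    [9, 0, 8],
    [0, 7, 1],
    [8, 1, 9],
    [1, 8, 0],
]
reordered, rows, cols = cluster_reorder(matrix)
print(rows, cols)  # [0, 2, 1, 3] [1, 0, 2]
for row in reordered:
    print(row)
```

After reordering, the high values form two contiguous blocks (top right and bottom left), which is what makes cluster structure visible once a color scale is applied.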

13. Heatmap

A heatmap is a plot of rectangular data as a color-encoded matrix. This is a useful way to visualize data because it shows the relation between the variables (selected Enhancer features) and the genes. To reduce computation time and complexity, the heatmap here shows only the first 100 genes and their values for the selected Enhancer features.

For further details on these features, please see the following tool:
https://jbrowse.maizegdb.org/