Please note: The plot (Categorical Bar chart) comparing the candidate genes with the other maize genes is generated using a randomly downsampled set of the other maize genes. The downsampled set contains the same number of genes as the user-defined candidate gene list.
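The downsampling described in the note can be sketched as follows. This is only an illustration under stated assumptions, not the tool's actual code: the function name, the gene IDs and the fixed seed are all hypothetical, and the sketch assumes the candidates are excluded from the background before sampling.

```python
import random

def downsample_background(all_genes, candidate_genes, seed=0):
    """Pick as many non-candidate genes as there are candidate genes."""
    candidates = set(candidate_genes)
    # Exclude the candidates so the two groups do not overlap (an assumption).
    background = [g for g in all_genes if g not in candidates]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    # Sampling without replacement; raises ValueError if the background
    # holds fewer genes than the candidate list.
    return rng.sample(background, k=len(candidates))

# Hypothetical gene IDs, for illustration only.
all_genes = [f"gene_{i:03d}" for i in range(200)]
candidate_genes = all_genes[:50]
sampled = downsample_background(all_genes, candidate_genes)
print(len(sampled), len(candidate_genes))
```

Because the background is sampled at random, the exact bar heights for the "other maize genes" group can differ from one run to the next unless a seed is fixed.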

Example Candidate gene list:

To demonstrate a potential use case of the user candidate gene analysis, we gathered a set of fifty stress genes that were differentially expressed between control and salt stress samples, and used them to identify characteristics common among salt stress genes.

Please read the explanation below for details on interpreting the tables and plots.

Table column interpretation:

Subcellular Localization by WoLF PSORT

WoLF PSORT is an extension of the PSORT II program for protein subcellular localization prediction, which is based on the PSORT principle. WoLF PSORT converts a protein's amino acid sequence into numerical localization features based on sorting signals, amino acid composition and functional motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Further details about the program can be found here:
https://academic.oup.com/nar/article/35/suppl_2/W585/2920788
The first five results (a list of proteins of known localization with the most similar localization features to the query) from the WoLF PSORT output are displayed in the table as:

1. Subcel1

2. Subcel2

3. Subcel3

4. Subcel4

5. Subcel5
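To illustrate the k-nearest neighbor idea mentioned above, here is a toy sketch. It is not WoLF PSORT's implementation: the two-value feature vectors, the labels, the plain Euclidean distance and the function name are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(query, training, k=3):
    """Return the majority label among the k nearest training points,
    together with those neighbors (closest first)."""
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors

# Hypothetical feature vectors for proteins of known localization.
training = [
    ((0.9, 0.1), "nucl"),
    ((0.8, 0.2), "nucl"),
    ((0.1, 0.9), "chlo"),
    ((0.2, 0.8), "chlo"),
    ((0.5, 0.5), "cyto"),
]
label, neighbors = knn_predict((0.85, 0.15), training, k=3)
print(label)
print([lab for _, lab in neighbors])
```

The ranked neighbor list in this sketch plays a role similar to the Subcel1 to Subcel5 columns: it shows which known proteins are most similar to the query, not just the final call.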

Subcellular Localization by DeepLoc

DeepLoc is a prediction algorithm that uses deep neural networks to predict protein subcellular localization from sequence information alone. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism that identifies the protein regions important for subcellular localization. Further details about the program can be found here:
https://academic.oup.com/bioinformatics/article/33/21/3387/3931857
DeepLoc can differentiate between 10 different localizations: Nucleus, Cytoplasm, Extracellular, Mitochondrion, Cell membrane, Endoplasmic reticulum, Chloroplast (listed as Plastid in the table), Golgi apparatus, Lysosome/Vacuole and Peroxisome. The probability of each localization for the query sequence, as well as the likelihood of the protein being soluble or membrane-bound, is displayed in the table as:

7. Type (soluble/membrane)

8. Localization (maximum probability localization)

9. Nucleus

10. Cytoplasm

11. Extracellular

12. Mitochondrion

13. Cell membrane

14. Endoplasmic reticulum

15. Plastid

16. Golgi apparatus

17. Lysosome/Vacuole

18. Peroxisome

19. Labels

The labels are the classes or groups that the genes are mapped into. A label can act as either a target variable or a feature, depending on the specific problem the user wants to solve.

19.1 No Label

This selection lets users view the properties of all genes without grouping them into gene categories or annotations, so that they can examine the features of multiple genes and identify common patterns among them. Because it involves inspecting all the genes, this option works only with the "Submit for analysis" button.

19.2 Classical Genes

Classical genes are the most well-studied genes, known mainly for their visible mutant phenotypes (for example, liguleless3).

19.3 Pan-genome Genes

A gene in a given taxonomic group is either present in every individual (core) or absent from at least one individual (dispensable).

19.4 Origin Genes

Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions. Duplications can have different origins, such as whole-genome duplication and tandem duplication, among others.
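Returning to the DeepLoc columns listed above (items 8 to 18): the Localization column is described as the maximum-probability localization, which can be read as taking the largest of the ten per-compartment probabilities. The sketch below shows that relationship; the probability values and the function name are hypothetical and do not come from the tool.

```python
def most_probable_localization(probabilities):
    """Return the (localization, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda kv: kv[1])

# Hypothetical DeepLoc-style probabilities for one protein (they sum to 1).
row = {
    "Nucleus": 0.05,
    "Cytoplasm": 0.10,
    "Extracellular": 0.02,
    "Mitochondrion": 0.08,
    "Cell membrane": 0.03,
    "Endoplasmic reticulum": 0.04,
    "Plastid": 0.62,
    "Golgi apparatus": 0.02,
    "Lysosome/Vacuole": 0.03,
    "Peroxisome": 0.01,
}
best_name, best_prob = most_probable_localization(row)
print(best_name, best_prob)
```

Looking at the full set of probabilities, rather than only the top call, helps when two compartments have similar scores.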

Graph interpretations:

At the top-right corner of each plot there are options to download the plot, zoom in or out, reset the axes, autoscale, toggle spike lines, show the closest data on hover, compare data on hover, box select, pan and lasso select. Users can also click legend entries to view data only for the selected legends. Details on the interactive plot options are available here:
Interactive graph features

1. Categorical marginal Plot

The Categorical marginal Plots are bar plots showing the frequency distribution of the different categories of the selected gene feature, along with highlighting of the candidate genes. This plot enables users to easily identify where their candidate genes lie among the other maize genes for the selected feature.

2. Categorical Bar chart

The Categorical Bar chart shows the frequency distribution of the different categories in the selected Protein Localization feature, such as Subcel1. The X-axis represents the different categories in the selected Protein Localization feature. The Y-axis represents the frequency of each category. In addition to the graph, to increase the interpretability of the data, we have also included P-values, means and standard deviations for the selected datasets.
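The bar heights in such a chart are simply category counts. The sketch below shows one way to compute them for the candidate genes and for a downsampled comparison set; the Subcel1 values and names are hypothetical, and the statistical test behind the reported P-values is not specified here, so the sketch covers only the counting step.

```python
from collections import Counter

def category_frequencies(values):
    """Count how often each category occurs, most frequent first."""
    return Counter(values).most_common()

# Hypothetical Subcel1 values for two equally sized gene sets.
candidate_subcel1 = ["nucl", "nucl", "chlo", "cyto", "nucl", "chlo"]
other_subcel1 = ["cyto", "cyto", "chlo", "nucl", "cyto", "mito"]

candidate_counts = category_frequencies(candidate_subcel1)
other_counts = category_frequencies(other_subcel1)
print(candidate_counts)
print(other_counts)
```

Because both sets have the same size, the raw counts can be compared directly without converting them to proportions.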

3. K-mode bar plot

K-modes clustering is an unsupervised machine learning algorithm used to cluster categorical variables.
Clustering is an unsupervised learning method that divides a population or set of data points into groups, such that data points in a group are more similar to each other than to data points in other groups. In other words, it groups objects based on their similarity and dissimilarity.
In the K-mode bar plot, we divided the observations for the selected protein localizations into K clusters.
The X-axis in the K-mode bar plot represents the K clusters within each category of the selected protein localizations.
The Y-axis represents the frequency of the K clusters within each category of the selected protein localizations.
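A minimal sketch of the k-modes idea is given below, assuming the simple matching dissimilarity (number of differing attributes) and hand-picked initial modes. It is not the implementation used to produce the plot; the records, column choices and function names are hypothetical.

```python
from collections import Counter

def mismatches(a, b):
    """Simple matching dissimilarity: number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, initial_modes, max_iter=10):
    """Cluster categorical records around modes (most frequent values)."""
    modes = [tuple(m) for m in initial_modes]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each record to its closest mode.
        new_assignments = [
            min(range(len(modes)), key=lambda c: mismatches(rec, modes[c]))
            for rec in records
        ]
        if new_assignments == assignments:
            break  # no record changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: per cluster, take the most frequent value of each attribute.
        for c in range(len(modes)):
            members = [r for r, a in zip(records, assignments) if a == c]
            if members:
                modes[c] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return assignments, modes

# Hypothetical (Subcel1, Type, Localization) records.
records = [
    ("nucl", "Soluble", "Nucleus"),
    ("nucl", "Soluble", "Nucleus"),
    ("cyto", "Soluble", "Nucleus"),
    ("chlo", "Membrane", "Plastid"),
    ("chlo", "Membrane", "Plastid"),
    ("chlo", "Soluble", "Plastid"),
]
assignments, modes = k_modes(records, initial_modes=[records[0], records[3]])
print(assignments)
print(modes)
```

Counting how many records of each category fall into each cluster then gives the kind of frequencies shown on the Y-axis of the K-mode bar plot.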