The explanations below describe how to interpret the table columns and plots.
Table column interpretation:
Subcellular Localization by WoLF PSORT
WoLF PSORT is an extension of the PSORT II program for protein subcellular localization prediction,
which is based on the PSORT principle. WoLF PSORT converts a protein's amino acid sequence into
numerical localization features based on sorting signals, amino acid composition and functional
motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Further
details about the program can be found here:
https://academic.oup.com/nar/article/35/suppl_2/W585/2920788
The first five results (a list of proteins of known localization with the most similar
localization features to the query) from the WoLF PSORT output are displayed in the table as:
1. Subcel1
2. Subcel2
3. Subcel3
4. Subcel4
5. Subcel5
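The k-nearest neighbor step described above can be illustrated with a minimal sketch. This is a generic classifier, not WoLF PSORT's own implementation: the two-value feature vectors, the Euclidean distance, the labels and the function name knn_predict are all assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(query, training, k=3):
    """Return (predicted_label, neighbors) for a query feature vector.

    training is a list of (feature_vector, label) pairs.
    """
    # Rank proteins of known localization by Euclidean distance to the query.
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    neighbors = ranked[:k]
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical feature vectors and labels; real localization features differ.
training = [
    ((0.9, 0.1), "nucl"),
    ((0.8, 0.2), "nucl"),
    ((0.1, 0.9), "mito"),
    ((0.2, 0.8), "mito"),
    ((0.5, 0.5), "cyto"),
]
label, neighbors = knn_predict((0.85, 0.15), training, k=3)
print(label, [lab for _, lab in neighbors])
```

The returned neighbors play the same role as the list of most similar proteins of known localization reported in the Subcel columns.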
Subcellular Localization by DeepLoc
DeepLoc is a prediction algorithm that uses deep neural networks to predict protein subcellular localization,
relying only on sequence information. At its core, the prediction model uses a recurrent neural
network that processes the entire protein sequence and an attention mechanism that identifies protein
regions important for the subcellular localization. Further
details about the program can be found here:
https://academic.oup.com/bioinformatics/article/33/21/3387/3931857
DeepLoc can differentiate between 10 different localizations: Nucleus, Cytoplasm, Extracellular,
Mitochondrion, Cell membrane, Endoplasmic reticulum, Chloroplast (listed as Plastid in the table),
Golgi apparatus, Lysosome/Vacuole and Peroxisome.
The probability of each localization for the query sequence, as well as the likelihood of the protein
being soluble or membrane-bound, are displayed in the table as:
7. Type (soluble/membrane)
8. Localization (maximum probability localization)
9. Nucleus
10. Cytoplasm
11. Extracellular
12. Mitochondrion
13. Cell membrane
14. Endoplasmic reticulum
15. Plastid
16. Golgi apparatus
17. Lysosome/Vacuole
18. Peroxisome
19. Labels
The labels are the classes or groups that the genes are mapped into. A label can act as either a target variable or a feature, depending on the specific problem the user wants to solve.
19.1 No Label
This selection lets users view the properties of all genes without assigning them to gene categories or annotations, so that they can examine the features of multiple genes and identify common patterns among them. Because it involves inspecting all the genes, it works only with the "Submit for analysis" button.
19.2 Classical Genes
Classical genes are the most well-studied genes, known mainly for their visible mutant phenotypes (for example, liguleless3).
19.3 Pan-genome Genes
In a given taxonomic group, a gene is either present in every individual (core) or absent from at least one individual (dispensable).
19.4 Origin Genes
Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions. Duplications have different origins, such as whole-genome duplications, tandem duplications, etc.
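The core/dispensable distinction used for the Pan-genome Genes label can be sketched as follows. This is a minimal illustration under assumed inputs: the gene names, individual names, presence/absence data and the function classify_pan_genome are hypothetical and are not taken from the tool.

```python
def classify_pan_genome(presence):
    """Map each gene to 'core' or 'dispensable'.

    presence maps gene -> {individual: True if the gene is present}.
    """
    result = {}
    for gene, by_individual in presence.items():
        # Core: present in every individual; otherwise dispensable.
        result[gene] = "core" if all(by_individual.values()) else "dispensable"
    return result


# Hypothetical presence/absence data for three individuals.
presence = {
    "geneA": {"line1": True, "line2": True, "line3": True},
    "geneB": {"line1": True, "line2": False, "line3": True},
}
categories = classify_pan_genome(presence)
print(categories)
```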
Graph interpretations:
In the top-right corner of each plot, there are options to download the plot, zoom in/zoom out, reset axes, autoscale, toggle spike lines, show closest data on hover, compare data on hover, box select, pan and lasso. Users can also select specific legend entries to view data only for those entries. Details on the interactive plot options are available here:
Interactive graph features
1. Categorical Bar chart
The Categorical Bar chart shows the frequency distribution of the different categories in the selected Protein Localization feature, such as Subcel1. The X-axis represents the different categories in the selected Protein Localization feature. The Y-axis represents the frequency of each category. To increase the interpretability of the data, we have also included P-values, means and standard deviations for the selected datasets.
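The frequencies on the Y-axis amount to a count of each category. A minimal sketch of that counting, assuming a hypothetical list of Subcel1 values (the values and the function name category_frequencies are illustrative only):

```python
from collections import Counter


def category_frequencies(values):
    """Return (category, count) pairs, most frequent first."""
    counts = Counter(values)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical Subcel1 values for six genes.
subcel1 = ["nucl", "cyto", "nucl", "mito", "nucl", "cyto"]
freqs = category_frequencies(subcel1)
for category, count in freqs:
    # Text rendering of one bar per category.
    print(f"{category:<6}{'#' * count} {count}")
```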
2. K-mode bar plot
KModes clustering is an unsupervised machine learning algorithm that is used to cluster
categorical variables.
Clustering is an unsupervised learning method that divides the population or data points into
a number of groups, such that data points in a group are more similar to other data points in the
same group than to the data points in other groups. In other words, it groups
objects based on the similarity and dissimilarity between them.
In the K-mode bar plot, we divided our observations for the selected protein localizations into K clusters.
The X-axis in the K-mode bar plot represents the K clusters within each category of the selected protein localizations.
The Y-axis represents the frequency of the K clusters within each category of the selected protein localizations.
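The idea behind KModes can be sketched in a few lines. This is a simplified illustration, not the implementation used to produce the plot: it assumes a simple matching dissimilarity (the number of attributes that differ), fixed starting modes, and hypothetical records that pair a Subcel1 value with a DeepLoc localization.

```python
from collections import Counter


def dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value of each attribute (ties go to the first seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=20):
    """Cluster categorical records; returns (assignment, modes)."""
    modes = list(initial_modes)
    assignment = None
    for _ in range(max_iter):
        # Assign each record to the cluster whose mode is closest.
        new_assignment = [
            min(range(len(modes)), key=lambda c: dissimilarity(rec, modes[c]))
            for rec in records
        ]
        if new_assignment == assignment:
            break  # Converged: no record changed cluster.
        assignment = new_assignment
        # Recompute each cluster's mode from its current members.
        for c in range(len(modes)):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:
                modes[c] = column_modes(members)
    return assignment, modes


# Hypothetical records: (Subcel1, DeepLoc localization).
records = [
    ("nucl", "Nucleus"),
    ("nucl", "Nucleus"),
    ("nucl", "Cytoplasm"),
    ("mito", "Mitochondrion"),
    ("mito", "Mitochondrion"),
    ("cyto", "Mitochondrion"),
]
assignment, modes = k_modes(records, initial_modes=[records[0], records[3]])
cluster_sizes = Counter(assignment)  # Frequencies a bar plot would show.
print(assignment, modes, dict(cluster_sizes))
```

Here K is 2 because two starting modes are supplied; counting the cluster assignments per label category would give the per-category frequencies shown on the Y-axis.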