The explanation below describes how to interpret the tables and plots.
Table column interpretation:
Histone modifications are often found in recurring combinations at promoters, enhancers and repressed regions. These combinations are referred to as ‘chromatin states’ and can be used to annotate regulatory regions in genomes. We identified the following chromatin states from ChIP-seq data of two maize tissue types (leaf and ear) using ChromHMM, a tool that models chromatin combinatorics with a multivariate hidden Markov model (HMM).
All the chromatin states generated by ChromHMM can be further annotated, for example as enhancers or repressors, depending on their location in the genome (such as gene bodies) and on whether their marks are associated with activation or repression. Feel free to update the annotations according to your own interpretation.
For additional information on the ChromHMM annotation and enrichment outputs, follow the links below:
- Leaf Chromatin States
- Leaf Fold Enrichment
- Leaf TSS Fold Enrichment
- Leaf TES Fold Enrichment
- Ear Chromatin States
- Ear Fold Enrichment
- Ear TSS Fold Enrichment
- Ear TES Fold Enrichment
Leaf Chromatin States
The Leaf Chromatin States plot shows the chromatin states identified by ChromHMM on the y-axis and
the histone modifications from the leaf ChIP-seq data on the x-axis. It is a heatmap of the emission
parameters in which each row corresponds to a different state and each column corresponds to a different histone mark.
A darker blue corresponds to a greater probability of observing the mark in the state.
Ear Chromatin States
The Ear Chromatin States plot shows the chromatin states identified by ChromHMM on the y-axis and
the histone modifications from the ear ChIP-seq data on the x-axis. It is a heatmap of the emission
parameters in which each row corresponds to a different state and each column corresponds to a different histone mark.
A darker blue corresponds to a greater probability of observing the mark in the state.
Leaf fold enrichment plot
The Leaf fold enrichment plot shows the enrichment of each state of the segmentation for various external genomic annotations in maize, such as
gene coordinates, TSS (Transcription Start Site) coordinates, TES (Transcription End Site) coordinates,
CpG island coordinates and exon coordinates.
A darker blue corresponds to a greater fold enrichment; the color scale is specific to each column.
Ear fold enrichment plot
The Ear fold enrichment plot shows the enrichment of each state of the segmentation for various external genomic annotations in maize, such as
gene coordinates, TSS (Transcription Start Site) coordinates, TES (Transcription End Site) coordinates,
CpG island coordinates and exon coordinates.
A darker blue corresponds to a greater fold enrichment; the color scale is specific to each column.
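As a rough illustration of what a fold enrichment value means, the sketch below assumes the definition given in the ChromHMM manual for overlap enrichment: with A bases in the state, B bases in the external annotation, C bases in both, and D bases in the genome, the fold enrichment is (C/A)/(B/D). The function name and the numbers are hypothetical and are not taken from the maize data.

```python
def fold_enrichment(state_bases, annotation_bases, overlap_bases, genome_bases):
    """Return (C/A) / (B/D): how much more often the state overlaps the
    annotation than expected if the state were placed at random."""
    if min(state_bases, annotation_bases, genome_bases) <= 0:
        raise ValueError("base counts must be positive")
    fraction_of_state_in_annotation = overlap_bases / state_bases          # C / A
    fraction_of_genome_in_annotation = annotation_bases / genome_bases     # B / D
    return fraction_of_state_in_annotation / fraction_of_genome_in_annotation


# Hypothetical numbers, for illustration only.
example = fold_enrichment(
    state_bases=1_000_000,
    annotation_bases=50_000_000,
    overlap_bases=200_000,
    genome_bases=2_000_000_000,
)
print(example)
```

A value above 1 means the state overlaps the annotation more than expected by chance; a value below 1 means it is depleted there.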
The TSS and TES fold enrichment plots below are heatmaps that show the fold enrichment of each state
at each 200-bp bin position within 2 kb of a set of transcription start sites (TSSs) and transcription end sites (TESs),
respectively.
A darker blue corresponds to a greater fold enrichment, and there is one color scale for the entire heatmap.
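To make the column layout of these heatmaps concrete, here is a minimal sketch of how positions around an anchor (a TSS or TES) map onto 200-bp offsets within 2 kb. The 200-bp spacing and 2-kb window come from the text above; the snapping-to-nearest-offset rule, the strand handling and the function name are assumptions made for illustration, not a description of ChromHMM's internals.

```python
SPACING = 200   # bin spacing in bp (from the text)
WINDOW = 2000   # bp on each side of the anchor (from the text)

# One heatmap column per offset: -2000, -1800, ..., 0, ..., 1800, 2000
offsets = list(range(-WINDOW, WINDOW + 1, SPACING))


def relative_offset(position, anchor, strand="+"):
    """Snap a genomic position to the nearest offset from the anchor.

    Returns None when the position falls outside the 2-kb window.
    For genes on the minus strand, upstream and downstream are swapped.
    """
    delta = position - anchor
    if strand == "-":
        delta = -delta
    snapped = round(delta / SPACING) * SPACING
    return snapped if abs(snapped) <= WINDOW else None


print(len(offsets))                          # number of columns
print(relative_offset(10_450, 10_000))       # 450 bp downstream of the anchor
print(relative_offset(9_000, 10_000, "-"))   # minus-strand gene
```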
Leaf TES fold enrichment plot
Leaf TSS fold enrichment plot
Ear TES fold enrichment plot
Ear TSS fold enrichment plot
1. Ear (E2+E3+E4+E5+E8) Activation
The combinations of histone modifications found in the Ear tissue for states E2, E3, E4, E5 and E8 are:
E2   :   H3K36me3 + H3K4me3
E3   :   H3K36me3
E4   :   H3K36me3 + H3K4me3 + H3K56ac
E5   :   H3K4me3 + H3K56ac
E8   :   H3K56ac
In combination, the histone modifications listed above have been found to be mostly associated with activation in maize.
2. Ear (E1+E9) low signal
The signal found in the Ear tissue for states E1 and E9 is:
E1   :   low signal
E9   :   low signal
Chromatin states E1 and E9 did not capture any histone modification and are therefore termed low signal.
3. Ear E6 Repression/Activation
The combination of histone modifications found in the Ear tissue for state E6 is:
E6   :   H3K4me3 + H3K27me3
These histone modifications have been found to be linked with either activation or repression under different
circumstances in maize.
4. Ear E7 Repression
The histone modification found in the Ear tissue for state E7 is:
E7   :   H3K27me3
This histone modification has been found to be linked with repression in maize.
5. Leaf E5 Activation
The combination of histone modifications found in the Leaf tissue for state E5 is:
E5   :   H2AZ + H3K4me1
These histone modifications have been found to be mostly associated with activation in maize.
6. Leaf E4 Activation
The combination of histone modifications found in the Leaf tissue for state E4 is:
E4   :   H3K27ac + H2AZ + H3K4me1 + H3K56ac
These histone modifications have been found to be mostly associated with activation in maize.
7. Leaf (E2+E8+E9) Activation
The combinations of histone modifications found in the Leaf tissue for states E2, E8 and E9 are:
E2   :   H3K56ac + H3K27ac + H3K4me3
E8   :   H3K36me3 + H3K4me1
E9   :   H3K4me1 + H3K36me3
In combination, these histone modifications have been found to be mostly associated with activation in maize.
8. Leaf E1 Activation
The combination of histone modifications found in the Leaf tissue for state E1 is:
E1   :   H3K27ac + H3K4me3 + H3K56ac + H3K9ac + H3K36me3
These histone modifications have been found to be mostly associated with activation in maize.
9. Leaf E11 Activation
The combination of histone modifications found in the Leaf tissue for state E11 is:
E11   :   H3 + H3K27ac
These histone modifications have been found to be mostly associated with activation in maize.
10. Leaf E3 Activation
The histone modification found in the Leaf tissue for state E3 is:
E3   :   H3K27ac
This histone modification has been found to be mostly associated with activation in maize.
11. Leaf E10 Activation
The histone modification found in the Leaf tissue for state E10 is:
E10   :   H3K4me1
This histone modification has been found to be mostly associated with activation in maize.
12. Leaf E7 Low signal
The signal found in the Leaf tissue for state E7 is:
E7   :   low signal
Chromatin state E7 did not capture any histone modification and is therefore termed low signal.
13. Leaf E6 Repression
The combination of histone modifications found in the Leaf tissue for state E6 is:
E6   :   H3K27me3 + H2AZ
These histone modifications have been found to be linked with repression in maize.
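The annotations in sections 1 to 13 can be summarized by a simple rule of thumb. The sketch below is only a heuristic that happens to reproduce the labels listed above; it is not the procedure used to assign them, and the function name `annotate_state` is hypothetical. The mark combinations are copied from the Ear and Leaf tables.

```python
def annotate_state(marks):
    """Heuristic label for a chromatin state, given its histone marks."""
    marks = set(marks)
    if not marks:
        return "Low signal"
    if "H3K27me3" in marks:
        # H3K27me3 together with H3K4me3 is labeled as ambiguous above.
        return "Repression/Activation" if "H3K4me3" in marks else "Repression"
    return "Activation"


# Mark combinations copied from the tables above.
EAR_STATES = {
    "E1": [],
    "E2": ["H3K36me3", "H3K4me3"],
    "E3": ["H3K36me3"],
    "E4": ["H3K36me3", "H3K4me3", "H3K56ac"],
    "E5": ["H3K4me3", "H3K56ac"],
    "E6": ["H3K4me3", "H3K27me3"],
    "E7": ["H3K27me3"],
    "E8": ["H3K56ac"],
    "E9": [],
}
LEAF_STATES = {
    "E1": ["H3K27ac", "H3K4me3", "H3K56ac", "H3K9ac", "H3K36me3"],
    "E6": ["H3K27me3", "H2AZ"],
    "E7": [],
    "E11": ["H3", "H3K27ac"],
}

ear_labels = {state: annotate_state(m) for state, m in EAR_STATES.items()}
leaf_labels = {state: annotate_state(m) for state, m in LEAF_STATES.items()}
for state, label in ear_labels.items():
    print("Ear", state, label)
```

Because the rule is so coarse, it should be treated as a starting point for your own interpretation rather than a definitive classification.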
14. Labels
The labels are the classes or groups that the genes are mapped into. A label can act as either a target variable or a feature, depending on the problem the user wants to solve.
14.1 No Label
This selection lets users view the properties of all genes without assigning them to gene categories or annotations, so that they can examine the features of multiple genes and identify common patterns among them. Because it involves inspecting all the genes, it works only with the "Submit for analysis" button.
14.2 Classical Genes
Classical genes are the most well-studied genes, known mainly for their visible mutant phenotypes (for example, liguleless3).
14.3 Pan-genome Genes
A gene in a given taxonomic group is either present in every individual (core) or absent from at least one individual (dispensable).
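The core/dispensable definition can be expressed as a one-line rule over a presence/absence vector. This is a minimal sketch of that definition only; the function name and the example vectors are hypothetical, and finer categories are not modeled.

```python
def pan_genome_class(presence):
    """Classify a gene from one boolean per individual (True = gene present)."""
    presence = list(presence)
    if not presence:
        raise ValueError("need at least one individual")
    return "core" if all(presence) else "dispensable"


print(pan_genome_class([True, True, True]))    # present in every individual
print(pan_genome_class([True, False, True]))   # absent from one individual
```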
14.4 Origin Genes
Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions. Duplications have different origins, such as whole-genome duplications and tandem duplications.
Graph interpretations:
In the top right corner of each plot there are options to download the plot, zoom in or out, reset the axes, autoscale, toggle spike lines, show the closest data on hover, compare data on hover, box select, pan and lasso select. Users can also click legend entries to view data only for the selected entries. Details on the interactive plot options are available here:
Interactive graph features
1. Histogram
The histogram shows the frequency distribution of the selected chromatin state, such as Ear E6 Repression/Activation, Ear E7 Repression or Leaf E1 Activation. The x-axis represents the range of values present in the selected chromatin state, and the y-axis represents the frequency of those values. To make the data easier to interpret, we also report p-values, means and standard deviations for the selected datasets.
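The quantities behind a histogram are easy to compute by hand. The sketch below bins a small set of hypothetical values into equal-width bins and reports the mean and sample standard deviation. The values, the number of bins and the function name are assumptions for illustration; the p-value is omitted because the test used by the application is not stated here.

```python
import statistics


def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # avoid a zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


values = [0, 1, 1, 2, 4, 5, 5, 9, 10, 10]   # hypothetical chromatin state values
counts = histogram_counts(values, n_bins=5)
mean = statistics.mean(values)
stdev = statistics.stdev(values)             # sample standard deviation

print(counts, mean, round(stdev, 3))
```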
2. Count and distribution
The Count and distribution plot is a smoothed, continuous version of a histogram estimated from the data. The most common form of estimation is kernel density estimation. As in a histogram, the x-axis is the value of the selected chromatin state; the y-axis is the probability density, not a probability. The difference is that a probability density is the probability per unit on the x-axis, so it can exceed 1. In general, the y-axis of the distribution plot is meaningful only for relative comparisons between categories, such as classical and other genes.
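The distinction between density and probability is easiest to see numerically. The following is a minimal sketch of a Gaussian kernel density estimate, assuming a fixed, hand-picked bandwidth (plotting libraries usually choose one automatically). The sample values are hypothetical. It shows that the curve can rise above 1 while the area under it stays close to 1.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [0.10, 0.12, 0.15, 0.40]          # hypothetical values
density = gaussian_kde(samples, bandwidth=0.05)

step = 0.001
grid = [-1.0 + i * step for i in range(3001)]   # covers -1 to 2
area = sum(density(x) for x in grid) * step     # Riemann sum of the curve
peak = max(density(x) for x in grid)

print(round(area, 3), peak > 1)
```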
3. Pair plots
The pair plot builds on two basic figures, the distribution plot and the scatter plot. The distributions on the diagonal show the distribution of a single selected chromatin state, while the scatter plots in the upper and lower triangles show the relationship (or lack thereof) between two variables, such as Ear E7 Repression and Leaf E1 Activation.
3. Box plots
Box plots summarize how the data in the selected chromatin state are distributed. The three quartiles divide the
data set into four equal parts. The graph shows the minimum, maximum, median, first quartile and third quartile of the
selected chromatin state. Box plots are also useful for comparing distributions across data sets, such as the
core, non-core, dispensable and private genes, by drawing one box plot for each of them. Box plots can be used to:
- Identify outliers or anomalous data points
- Determine whether the data is skewed
- Understand the spread/range of the data
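The numbers a box plot draws can be computed directly. This sketch uses the standard library's `statistics.quantiles` and flags outliers with the common 1.5 × IQR rule. That rule and the "inclusive" quartile method are assumptions here, since plotting libraries differ in their exact conventions, and the values are hypothetical.

```python
import statistics


def box_stats(values):
    """Return the five-number summary plus outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical values with one extreme point
stats = box_stats(values)
print(stats)
```

A median that sits closer to one quartile than the other, as it does when the extreme value stretches the upper tail, is a quick sign of skew.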
4. Violin plots
Violin plots are similar to box plots, except that they also show the probability density of the selected chromatin state
at different values. These plots include a marker for the median of the data and a box indicating the
interquartile range, as in standard box plots. Overlaid on this box plot is a kernel density estimate.
Like box plots, violin plots are used to compare the distribution of a variable, such as
Ear E7 Repression or Leaf E1 Activation,
across different categories (Classical/Other or Core/Non-core).
A violin plot is more informative than a plain box plot: while a box plot shows only summary statistics
such as the median and interquartile range, the violin plot shows the full distribution of the selected
chromatin state.
5. Joint plots
A joint plot comprises three plots. The central plot is a bivariate graph (scatter plot) that shows how
one variable (such as Ear E7 Repression) varies with another (such as Leaf E1 Activation). The second plot is
placed horizontally above the bivariate graph and shows the distribution of the x-axis variable, for example
as a rug plot of marks along the axis.
The third plot is placed vertically on the right margin of the bivariate graph and
shows the distribution of the y-axis variable in the same way.
It is very helpful to have univariate and bivariate plots together in one figure.
The univariate analysis focuses on one variable: it describes, summarizes and shows any
patterns in that variable. The bivariate analysis explores the relationship between two variables and
describes the strength of that relationship.
6. Scatter plots
A scatter plot is a good way to explore relationships or patterns in data, and adding a regression line can make those patterns stand out. The scatter plot with simple linear regression for the selected chromatin states therefore reports the strength of the relationship between the two variables, such as Ear E7 Repression and Leaf E1 Activation, using R², the squared correlation coefficient. R² always lies between 0 and 1, and a higher R² indicates a stronger linear relationship between the selected chromatin states.
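For readers who want to see where the regression line and R² come from, here is a minimal least-squares sketch in plain Python. The data points are hypothetical, and the function name `linear_fit` is not part of the application.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)   # squared Pearson correlation
    return slope, intercept, r_squared


xs = [1, 2, 3, 4, 5]   # hypothetical values of one chromatin state
ys = [2, 4, 5, 4, 5]   # hypothetical values of another
slope, intercept, r_squared = linear_fit(xs, ys)
print(slope, intercept, r_squared)
```

Note that R² measures only how well a straight line fits; a strong curved relationship can still give a low R².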
7. Correlation plots
A correlation heatmap is a graphical representation of the correlation matrix of the selected chromatin states. Correlation ranges from -1 to +1. A value close to zero means there is no linear trend between the two variables. The closer the correlation is to +1, the more strongly positively correlated the variables are: as one increases, so does the other. The closer it is to -1, the more strongly negatively correlated they are: as one increases, the other decreases. Correlation plots also alert us to potential multicollinearity problems.
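The matrix behind a correlation heatmap can be sketched as follows. This assumes the Pearson correlation coefficient, which is the usual default but is an assumption about the application; the column names and values are hypothetical and chosen to show the three cases described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical columns: B rises with A, C falls as A rises, D has no linear trend with A.
columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, -1, 0, -1, 1],
}
matrix = {
    (a, b): pearson(columns[a], columns[b]) for a in columns for b in columns
}
for a in columns:
    print(a, [round(matrix[(a, b)], 2) for b in columns])
```

Pairs with a correlation near +1 or -1, such as A and B here, carry largely redundant information, which is the multicollinearity warning mentioned above.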
8. Downsampled Dendrogram plots
A dendrogram is a type of tree diagram that shows hierarchical clustering, that is, the relationships between similar sets of
genes based on the selected chromatin states. Dendrograms are frequently used in biology to show clustering between genes
or samples, but they can represent any type of grouped data. The dendrogram is built on downsampled data to save
time and reduce complexity.
The branches of the dendrogram are arranged according to how similar (or dissimilar) they are. Branches that are
close to the same height are similar to each other; branches with different heights are dissimilar, and
the greater the difference in height, the greater the dissimilarity. The different clusters of genes based on the
selected chromatin states are marked by different colors.
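A dendrogram records the order in which clusters are merged and the distance at which each merge happens. The sketch below is a deliberately simple agglomerative clustering with average linkage and Euclidean distance. Both choices are assumptions, since the linkage and metric used by the application are not stated here, and the points are hypothetical. It stops when the requested number of clusters remains, which corresponds to cutting the tree at that level.

```python
import math


def average_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns the final clusters (lists of point indices) and the merge
    history as (height, cluster_a, cluster_b) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def distance(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        height, i, j = min(pairs)
        merges.append((height, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical genes described by two chromatin state values each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
clusters, merges = average_linkage(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
print([round(height, 3) for height, _, _ in merges])
```

The merge heights are what a dendrogram draws on its vertical axis: low merges join very similar genes, and high merges join dissimilar groups.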
9. Downsampled Hierarchical Scatterplot
The hierarchical scatter plot is a type of pair plot that visualizes the relationships between pairs of chromatin states across the clusters of genes formed in the dendrogram. To create it, enter the number of clusters you want to view in the choose cluster input box. This number corresponds to the number of clusters formed in the Dendrogram plot, so this analysis should be performed only after studying the Dendrogram plot. The Downsampled Hierarchical Scatterplot is dynamically sized for the number of clusters, so each user can analyze the dataset according to how many clusters they believe the selected data forms.
10. Downsampled Hierarchical Heatmap
A hierarchical clustering heatmap is an intuitive way to visualize information from complex data. It is also called
a false-colored image, in which data values are transformed to a color scale. Heatmaps allow us to simultaneously
visualize clusters of samples (selected chromatin states) in the columns and features (genes) in the rows.
First, hierarchical clustering is performed on both the rows and the columns of the data matrix. The rows and columns of the
data matrix are then reordered according to the clustering result, which puts similar observations close
to each other so that blocks of ‘high’ and ‘low’ values are adjacent in the data matrix. Finally,
a color scheme is applied and the data matrix is displayed. Visualizing the data
matrix in this way can help to find the variables (selected chromatin states) that appear to be characteristic
of each sample cluster.
10. Heatmap
A heatmap is a plot of rectangular data as a color-encoded matrix. It is a good way to visualize data because it can show the relationship between the variables (selected chromatin states) and the genes. To reduce time and complexity, the heatmap here shows only the first 100 genes and their values for the selected chromatin states.
PCA
PCA reduces high-dimensional, interrelated data to a lower dimension by linearly transforming the original variables
into a new set of uncorrelated variables called principal components (PCs), while retaining as much
variation as possible.
The first component has the largest variance, followed by the second component, and so on.
The first few components retain most of the variation, which makes it easy to visualize and summarize the features
of the original high-dimensional dataset in a low-dimensional space. PCA helps to assess which original samples are
similar to and different from each other.
In our case, when more than 2 chromatin states are selected, it is hard to visualize them all at
the same time to interpret the genes. PCA transforms them into a new set of variables (PCs), with the top PCs
having the highest variation. PCs are ordered, which means that the first few PCs (generally the first 3, but
it can be more) contribute most of the variance present in the original high-dimensional dataset. These top
2 or 3 PCs can be plotted easily and summarize the features of all the original variables (selected chromatin states).
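To make the idea of ordered components and explained variance concrete, here is a minimal PCA sketch in plain Python. It finds the leading eigenvectors of the covariance matrix by power iteration with deflation, which is a simple substitute for the eigen-decomposition or SVD routines that real libraries use. The data, the function names and the iteration count are assumptions for illustration only.

```python
import math


def covariance(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    return cov, centered


def top_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[a] * matrix[a][b] * vec[b] for a in range(d) for b in range(d))
    return value, vec


def pca(rows, n_components):
    """Return components, explained-variance ratios and per-row PC scores."""
    cov, centered = covariance(rows)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance)
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(d)] for a in range(d)]
    scores = [
        [sum(c[j] * comp[j] for j in range(d)) for comp in components]
        for c in centered
    ]
    return components, ratios, scores


# Hypothetical genes (rows) by three chromatin state values (columns).
rows = [
    [2.0, 1.9, 0.5],
    [4.0, 4.1, 0.4],
    [6.0, 6.2, 0.6],
    [8.0, 7.8, 0.5],
    [10.0, 10.0, 0.7],
]
components, ratios, scores = pca(rows, n_components=2)
print([round(r, 4) for r in ratios])
```

In this toy data the first two columns move together, so the first PC captures nearly all of the variance. With real data the ratios show how many PCs are worth plotting.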
11. PCA 2D variables cluster
A PCA 2D variables cluster plot shows how strongly each characteristic or variable influences a principal component. In the plot, the vectors (variables) are pinned at the origin of the PCs (PC1 = 0 and PC2 = 0). Their projected values on each PC show how much weight they have on that PC.
12. PCA 2D observation cluster
A PCA 2D observation cluster plot shows clusters of samples/observations based on their similarity. PCA does not discard
any samples or observations. Instead, it reduces the overwhelming number of dimensions by
constructing principal components (PCs). PCs describe variation and account for the varied influences
of the original characteristics. Such influences, or loadings, can be traced back from the PCA plot
to find out what produces the differences among clusters.
Observations further out are either outliers or naturally extreme observations. Plot observations are annotated with shapes
and colors to highlight gene models and gene types, for example Classical/Others. In this way, we can hover
over any data point, especially outliers, to find gene models with extreme observation values across the various sample types
(selected chromatin states).
13. PCA 2D biplot cluster
PCA biplot = PCA observations plot + variables plot.
A PCA biplot simply merges the usual PCA observations plot with a plot of the
variables. The arrangement is as follows:
* Bottom axis: PC1 score.
* Left axis: PC2 score.
In other words, the left and bottom axes belong to the PCA observations plot; use them to read the PCA scores of the
observations (dots).
The angles between the vectors (variables) also tell us how the
characteristics or variables correlate with one another:
* When two vectors are close, forming a small angle, the two variables they represent are positively
correlated.
* If they meet each other at 90°, they are not likely to be correlated.
* When they diverge and form a large angle (close to 180°), they are negatively correlated.
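The angle rule in the list above can be checked numerically. This sketch computes the angle between two loading vectors from their dot product; the vectors are hypothetical. The reading is approximate and is most reliable when the two plotted PCs capture most of the variance.

```python
import math


def angle_degrees(u, v):
    """Angle between two vectors in degrees, from the cosine formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding error
    return math.degrees(math.acos(cosine))


# Hypothetical (PC1, PC2) loadings for three variables.
small = angle_degrees((1.0, 0.0), (1.0, 0.1))      # nearly parallel
right = angle_degrees((1.0, 0.0), (0.0, 1.0))      # perpendicular
opposite = angle_degrees((1.0, 0.0), (-1.0, 0.0))  # pointing apart

print(round(small, 1), round(right, 1), round(opposite, 1))
```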
14. PCA 3D variables cluster
This is similar to the PCA 2D variables cluster plot, but instead of two principal components it uses the three PCs that contribute most of the variance present in the original high-dimensional dataset. These top 3 PCs can be plotted in 3-dimensional space and summarize the features of all the original variables (selected chromatin states).
15. PCA 3D observation cluster
This is similar to the PCA 2D observation cluster plot, but instead of two principal components it uses the three PCs that contribute
most of the variance present in the original high-dimensional dataset. These top
3 PCs can be plotted in 3-dimensional space and summarize the variance in the observations.
A PCA 3D observation cluster plot displays how much variation each principal component captures from
the data, and it is most informative when the first three PCs are sufficient to describe the essence of the data.
http://compbio.mit.edu/ChromHMM/