Welcome to Maize Feature Store!

The Maize feature store is a centralized repository of up-to-date raw and transformed data for solving complex biological problems associated with the maize genome. Many of these are classification problems, such as genotype-to-phenotype classification or distinguishing essential from non-essential genes, and they rely on standard feature sets such as gene length, exon number and gene expression.
These standard feature sets are used time and again to solve different biological problems, and extracting them from raw data takes a significant amount of time, for example when mapping GFF files to genes of interest or counting exons per gene model. The process becomes tedious and repetitive when the same features are reused across problems, so we set out to produce a standardized database that collects different types of features from a variety of sources and tools.
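
To give a sense of the repetitive extraction work described above, here is a minimal sketch of counting exons per transcript from GFF3-formatted lines. It is an illustration only, not the pipeline used by the feature store: the function names, the toy records and the assumption that every exon carries a `Parent` attribute pointing at its transcript are all hypothetical, and real annotation files may follow different conventions.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def exon_counts(gff_lines):
    """Count exon features per parent transcript in GFF3-formatted lines."""
    counts = defaultdict(int)
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 record has nine tab-separated columns; column 3 is the type.
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # An exon may list several parents separated by commas.
        parents = parse_attributes(cols[8]).get("Parent", "")
        for parent in parents.split(","):
            if parent:
                counts[parent] += 1
    return dict(counts)


# Hypothetical toy records, not real maize annotations.
example = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=geneA_T001;Parent=geneA",
    "chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=e1;Parent=geneA_T001",
    "chr1\tdemo\texon\t500\t900\t.\t+\t.\tID=e2;Parent=geneA_T001",
]
print(exon_counts(example))
```

Storing the result of a step like this once, instead of recomputing it for every project, is the kind of reuse the feature store is meant to provide.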

The Maize feature store therefore has the following aims:

  1. Aggregate and preprocess multiple datasets from different databases and papers, so that both bioinformaticians and wet-lab researchers can use them for their problems of interest.
  2. Enable better models to be created faster with reusable features.
  3. Make model and feature governance possible, for explainability and transparency.
  4. Combine features into training data.
  5. Provide labels (the target biological outcomes that users try to predict) associated with the maize genome.
  6. Provide a user interface to visualize the data and run custom exploratory analyses on selected features, such as a plot of the distribution of a feature (for example, gene length) across the entire genome or across a GC-content window.
  7. Provide pre-built machine learning models for some of the target problems stored in the database.
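
Two of the aims above, combining features into training data and summarizing GC content per window, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the function names, the dict-of-dicts layout for feature tables, the toy gene IDs and the choice of non-overlapping windows are hypothetical and do not describe the feature store's actual schema or API.

```python
def gc_content_windows(sequence, window):
    """Return the GC fraction of each non-overlapping window of a sequence."""
    sequence = sequence.upper()
    fractions = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]  # last chunk may be shorter
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / len(chunk))
    return fractions


def build_training_rows(feature_tables, labels):
    """Join per-gene feature tables with labels.

    Only genes present in every feature table and in the labels are kept.
    Returns the ordered feature names and a list of (gene, values, label).
    """
    shared = set(labels)
    for table in feature_tables.values():
        shared &= set(table)
    names = sorted(feature_tables)
    rows = []
    for gene in sorted(shared):
        values = [feature_tables[name][gene] for name in names]
        rows.append((gene, values, labels[gene]))
    return names, rows


# Hypothetical toy data, not real maize measurements.
feature_tables = {
    "gene_length": {"geneA": 801, "geneB": 1500, "geneC": 300},
    "exon_number": {"geneA": 2, "geneB": 5},
}
labels = {"geneA": "essential", "geneB": "non-essential", "geneC": "essential"}

names, rows = build_training_rows(feature_tables, labels)
print(names)
print(rows)
print(gc_content_windows("GGCCATAT", 4))
```

In this sketch genes with a missing feature are dropped (an inner join); keeping them and imputing the missing values would be an equally reasonable choice, depending on the model.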