Core Gene Prediction Model
A Random Forest model was developed using the top 25 sequence and gene structural features to classify whether a gene is Core or Non-Core.
To classify genes, submit your data in the form, keeping each value within the range
shown in the input placeholder, or autofill the required protein and DNA sequence features by entering the protein
sequence and the coding sequence in the input boxes.
[Note: The predicted value is shown in the footer of the table.]
For more details on the prediction module, see the
video tutorial: Open Video.
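The autofill step derives features from the two sequences entered in the form. The page does not list which sequence features the model uses, so the sketch below is only an illustration of that kind of calculation: protein length, amino acid fractions, and coding sequence GC content are assumed examples, and the function names `protein_features` and `cds_features` are hypothetical, not part of the tool.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def protein_features(protein: str) -> dict:
    """Return protein length and the fraction of each standard amino acid.

    Illustrative features only; the model's actual feature set is not
    given on this page.
    """
    # Normalize case and drop a trailing stop-codon marker if present.
    seq = protein.strip().upper().rstrip("*")
    if not seq:
        raise ValueError("empty protein sequence")
    counts = Counter(seq)
    features = {"protein_length": len(seq)}
    for aa in AMINO_ACIDS:
        features[f"frac_{aa}"] = counts[aa] / len(seq)
    return features


def cds_features(cds: str) -> dict:
    """Return coding sequence length and GC content (fraction of G and C)."""
    seq = cds.strip().upper()
    if not seq:
        raise ValueError("empty coding sequence")
    gc = seq.count("G") + seq.count("C")
    return {"cds_length": len(seq), "gc_content": gc / len(seq)}
```

Calling `protein_features("MKKA*")` and `cds_features("ATGGCC")` returns plain dictionaries whose values could be checked against the ranges shown in the input placeholders before submitting the form.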
Enter your protein sequence for generating protein sequence features (example):
1. Calculate the gene structural features (3 prime UTR length, isoform count,
5 prime UTR length, average exon length, and canonical mRNA length) using the customized
GFF analyzer script: GFF Analyzer
Generate the required gene structural features by running this command:
- python GFF_Extractor.py --gff_file Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3 --meta_info_dir ./Meta_File_Output --output_dir ./GFF_File_Output
- --gff_file : GFF format file containing information on the genome of the organism being studied.
- --meta_info_dir : Directory for storing the meta information generated on the fly while processing the GFF file.
- --output_dir : Directory to store the outputs from GFF files.
Please note: The GFF analyzer script only works for GFF files in a particular format. An
example file in that format is available at the GitHub script link provided above, and the original data can be accessed through
MaizeGDB
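To make the five structural features concrete, here is a minimal sketch of how they could be derived from GFF3 records. It is not the GFF_Extractor.py script and may differ from it. It assumes standard GFF3 feature types (mRNA, exon, five_prime_UTR, three_prime_UTR) linked through ID and Parent attributes, and it treats the isoform with the longest spliced length as the canonical transcript, which is an assumption rather than the tool's documented rule. The function names are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field: str) -> dict:
    """Parse the GFF3 attribute column ('key=value;key=value') into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def gene_structural_features(gff_lines) -> dict:
    """Return {gene_id: feature dict} from an iterable of GFF3 lines."""
    gene_of = {}  # transcript ID -> parent gene ID
    stats = defaultdict(lambda: {"exon_lengths": [], "utr5": 0, "utr3": 0})

    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed records
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        length = end - start + 1  # GFF3 coordinates are 1-based and inclusive

        if ftype == "mRNA":
            if "ID" in attrs and "Parent" in attrs:
                gene_of[attrs["ID"]] = attrs["Parent"]
        elif ftype in ("exon", "five_prime_UTR", "three_prime_UTR"):
            # A feature may list several parent transcripts, comma-separated.
            for parent in filter(None, attrs.get("Parent", "").split(",")):
                if ftype == "exon":
                    stats[parent]["exon_lengths"].append(length)
                elif ftype == "five_prime_UTR":
                    stats[parent]["utr5"] += length
                else:
                    stats[parent]["utr3"] += length

    transcripts = defaultdict(list)  # gene ID -> list of transcript IDs
    for transcript_id, gene_id in gene_of.items():
        transcripts[gene_id].append(transcript_id)

    features = {}
    for gene_id, ids in transcripts.items():
        # Assumption: canonical transcript = longest spliced isoform.
        canonical = max(ids, key=lambda t: sum(stats[t]["exon_lengths"]))
        exons = stats[canonical]["exon_lengths"]
        features[gene_id] = {
            "isoform_count": len(ids),
            "canonical_mrna_length": sum(exons),
            "average_exon_length": sum(exons) / len(exons) if exons else 0.0,
            "five_prime_utr_length": stats[canonical]["utr5"],
            "three_prime_utr_length": stats[canonical]["utr3"],
        }
    return features
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `gene_structural_features(open(path))` with `path` pointing at a GFF3 file.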
2. Calculate the chromosomal distance using the customized chromosomal distance calculator
script: Distance Calculator
Generate the required chromosomal distance feature by running this command:
- python Structural_Distance.py --distance_file B73v5_knobs_centromeres_telomeres.bed --gene_info_file gene_coord.txt --output_dir ./Distance_File_Output
- --distance_file : Structural BED file containing the centromere, telomere, and knob coordinates.
- --gene_info_file : File containing the gene coordinates.
- --output_dir : Directory to store the output files containing the distances of the genes from the centromere, telomere, and knob.
Please note: An example structural BED file is available at the GitHub script link provided above, and
the original data can be accessed through MaizeGDB JBrowse.
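The distance feature can be illustrated with a short sketch. This is not the Structural_Distance.py script, whose exact distance definition is not described here. The sketch assumes that distance means the gap in base pairs between a gene interval and the nearest structural feature on the same chromosome (0 if they overlap), that the fourth BED column names the feature type, and that gene coordinates have already been converted to the same 0-based, half-open convention as BED. All function names are hypothetical.

```python
def parse_bed(lines):
    """Yield (chrom, start, end, name) tuples from BED-formatted lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and header lines
        cols = line.split()
        name = cols[3] if len(cols) > 3 else ""
        yield cols[0], int(cols[1]), int(cols[2]), name


def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, a_start - b_end, b_start - a_end)


def nearest_feature_distance(gene, features, kind=None):
    """Distance from a gene to the nearest feature on the same chromosome.

    gene     : (chrom, start, end), same coordinate convention as the BED file
    features : iterable of (chrom, start, end, name) tuples
    kind     : optional substring (e.g. "knob") matched against the name
               column, case-insensitively (assumed naming scheme)
    Returns None when no matching feature lies on the gene's chromosome.
    """
    chrom, g_start, g_end = gene
    distances = [
        interval_distance(g_start, g_end, start, end)
        for f_chrom, start, end, name in features
        if f_chrom == chrom and (kind is None or kind.lower() in name.lower())
    ]
    return min(distances) if distances else None
```

Load the features once with `features = list(parse_bed(open(bed_path)))`, where `bed_path` is the structural BED file, then call `nearest_feature_distance` per gene, passing `kind="centromere"`, `kind="telomere"`, or `kind="knob"` to get one distance per feature type.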