Title: ORFpred: A Machine Learning Program to Identify Translatable Small Open Reading Frames in Intergenic Regions of the Plasmodium falciparum Genome
Volume: 11
Issue: 2
Author(s): Vivek Srinivas, Mayank Kumar, Santosh Noronha and Swati Patankar
Affiliation:
Keywords:
Small open reading frames, upstream open reading frames, translatability, low molecular weight proteins,
post-transcriptional gene regulation, Plasmodium falciparum, AT-rich genome.
Abstract: Motivation: Small Open Reading Frames (smORFs) are involved in a variety of cellular
processes ranging from metabolism to gene regulation, and eukaryotic genomes have been predicted to
contain a large number of smORFs. Only 174 smORFs have been annotated in the genome
of the human malaria parasite Plasmodium falciparum. Although millions of smORFs can be extracted
from the parasite genome, identifying the translatable smORFs among them is a challenging task
because existing smORF predictors have low accuracy when applied to an AT-biased genome.
Results: We developed ORFpred, a machine learning algorithm that calculates the probability of translation initiation and
elongation of ORFs in the P. falciparum genome. ORFpred identified 2204 translatable smORFs and showed higher accuracy
than available predictors. We believe that ORFpred will help identify probable protein-coding smORFs in other
eukaryotic genomes.
Availability and Implementation: The database used for training and testing the algorithm and the source code are freely
available at http://www.bio.iitb.ac.in/~patankar/software/ORFpred.
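
Illustrative note: the abstract states that millions of smORFs can be extracted from the parasite genome before any translatability scoring. The sketch below shows one simple way such candidates might be enumerated from a DNA sequence. It is not ORFpred's code. The function name `find_small_orfs`, the 100-codon upper bound, the 2-codon lower bound, the use of ATG as the only start codon, and the restriction to the forward strand are all assumptions made for illustration.

```python
# Hypothetical sketch of candidate smORF enumeration; not taken from ORFpred.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_small_orfs(seq, max_codons=100, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from the first ATG after the previous in-frame stop
    to the next in-frame stop codon. Length is counted in codons including
    the ATG and excluding the stop. The size limits are assumed values.
    `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


example = "CCATGAAATTTTAAGG"
candidates = find_small_orfs(example)
for start, end, frame in candidates:
    print(start, end, frame, example[start:end])
```

A fuller enumeration would also scan the reverse complement and restrict the search to intergenic coordinates. Candidates found this way would still need to be scored for translation initiation and elongation, which is the step the abstract attributes to ORFpred.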