Title: Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective
Volume: 23
Issue: 5
Author(s): Aditi R. Durge, Deepti D. Shrimankar*, Ankush D. Sawarkar
Affiliation:
- Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
Keywords:
Machine learning, genome processing, classification, computational complexity, deep learning, precision and recall.
Abstract: Genome sequences indicate a wide variety of characteristics, which include species and
sub-species type, genotype, diseases, growth indicators, yield quality, etc. To analyze and study the
characteristics of the genome sequences across different species, various deep learning models have
been proposed by researchers, such as Convolutional Neural Networks (CNNs), Deep Belief Networks
(DBNs), Multilayer Perceptrons (MLPs), etc., which vary in terms of evaluation performance, area of
application and species that are processed. Because the algorithmic implementations differ widely,
it is difficult for researchers to select the most suitable genome processing
model for their application. To facilitate this selection, this paper reviews a wide variety
of such models and compares their performance in terms of accuracy, area of application, computational
complexity, processing delay, precision and recall. The reviewed deep learning and machine
learning models achieve different accuracies for different applications. For multiple genomic
data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine
(Ripper SVM) achieves an accuracy of 99.7%; for cancer genomic data, the CNN Bayesian method
achieves 99.27%; and for Covid genome analysis, Bidirectional Long Short-Term Memory with CNN
(BiLSTM CNN) exhibits the highest accuracy of 99.95%. The precision and recall of the different
models are reviewed in a similar manner.
Finally, this paper concludes with some interesting observations related to the genomic processing
models and recommends applications for their efficient use.