Title: Comparison of Kernel and Decision Tree-based Algorithms for the Prediction of microRNAs Associated with Cancer
Volume: 11
Issue: 1
Author(s): Ram Kothandan and Sumit Biswas
Affiliation:
Keywords:
AUC, RFE, Cost-sensitive, Class imbalance, Thermodynamics of miRNA-mRNA binding.
Abstract: The discovery of microRNAs (miRNAs) in the 1990s spawned a genre of research that has
thrown light on the involvement of these small non-coding RNAs in several developmental pathways
and diseases, including cancer. While algorithms that predict the binding of
miRNAs to their targets are abundant, the same is not true for predicting associations of miRNAs with targets
that can be implicated in cancer. Machine learning approaches, which have been implemented in
target prediction, need to be extended with proper feature selection to reach an acceptable level of
accuracy in predicting the association of miRNAs with cancer. In this study, we present a comparison
of three learning algorithms, viz. the kernel-based Support Vector Machine (SVM) and two decision tree-based algorithms,
Random Forest (RF) and C4.5, for predicting miRNAs associated with cancer. Sixty informative features were extracted from a
dataset of experimentally validated miRNAs, based on sequence, the thermodynamics of miRNA-mRNA binding and their
hybridization. Features were first ranked by F-score, and a two-stage Recursive Feature Elimination (RFE)
process was then used to select the optimal feature subset for each classifier. Class imbalance in the training set
was addressed with a cost-sensitive approach. The performance of each learning algorithm was
evaluated in terms of precision, recall, F-measure and AUC. The learning algorithm with the best performance
would subsequently be used to construct a two-step binary classifier, comprising miRSEQ and miRINT, that identifies
whether a miRNA is associated with the cancer pathway. Our comparative analysis showed that the decision tree-based
RF model achieved better precision and AUC for miRSEQ, but its performance was only moderate for miRINT.
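The F-score ranking step described in the abstract can be sketched in Python as follows. This is a minimal illustration that assumes the common Fisher-style F-score for binary labels (separation of the two class means from the overall mean, divided by the summed within-class sample variances); the abstract does not give the exact formula. The elimination loop is a simplified backward elimination in F-score order, not the authors' two-stage RFE: classic RFE re-ranks the remaining features with the trained model at every round. The `evaluate` callback (for example, a cross-validated AUC of a classifier trained on the given feature subset) and all function names are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def f_score(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher-style F-score of each feature for binary labels (1 = positive, 0 = negative).

    Assumes at least two samples in each class.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        all_j = [row[j] for row in X]
        pos_j = [row[j] for row in pos]
        neg_j = [row[j] for row in neg]
        m, m_pos, m_neg = mean(all_j), mean(pos_j), mean(neg_j)
        # Numerator: how far each class mean lies from the overall mean.
        between = (m_pos - m) ** 2 + (m_neg - m) ** 2
        # Denominator: sample variance within each class, summed.
        within = (sum((v - m_pos) ** 2 for v in pos_j) / (len(pos_j) - 1)
                  + sum((v - m_neg) ** 2 for v in neg_j) / (len(neg_j) - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating if the means differ, else uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Feature indices sorted from highest to lowest F-score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def backward_elimination(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
    step: int = 1,
    min_features: int = 1,
) -> Tuple[List[int], float]:
    """Repeatedly drop the `step` lowest-ranked features; keep the best-scoring subset."""
    current = list(ranked)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) - step >= min_features:
        current = current[:-step]
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy data: only feature 0 separates the two classes.
X = [[1.0, 5.0, 0.3], [1.2, 4.8, 0.1], [0.9, 5.1, 0.2],
     [3.0, 5.0, 0.25], [3.1, 4.9, 0.15], [2.9, 5.2, 0.3]]
y = [1, 1, 1, 0, 0, 0]
ranked = rank_features(f_score(X, y))
print(ranked)  # [0, 1, 2]

# Hypothetical evaluator: rewards keeping feature 0 and lightly penalises subset size.
subset, score = backward_elimination(ranked, lambda s: (1.0 if 0 in s else 0.0) - 0.01 * len(s))
print(subset)  # [0]
```

Ranking by a cheap filter score first and only then running a wrapper-style search over the ranked list keeps the number of classifier trainings linear in the number of features.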
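Class weighting and the evaluation metrics named in the abstract can be sketched in Python as follows. The abstract does not state how misclassification costs were set, so `balanced_class_weights` assumes one common heuristic (weights inversely proportional to class frequency), which a cost-sensitive SVM, RF or C4.5 learner would then apply to each training sample. The metric functions assume binary labels (1 = cancer-associated, 0 = not) and real-valued classifier scores; AUC is computed from its rank interpretation, the probability that a randomly chosen positive is scored above a randomly chosen negative. All function names are illustrative.

```python
from typing import Dict, Sequence, Tuple


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight class c by n_samples / (n_classes * n_c): rarer classes get larger
    weights, so errors on them cost more during training."""
    labels = list(y)
    classes = sorted(set(labels))
    return {c: len(labels) / (len(classes) * labels.count(c)) for c in classes}


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    Quadratic in the number of samples, which is acceptable for a sketch.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example: 2 positives, 6 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is an arbitrary threshold

print(balanced_class_weights(y_true))       # class 0 about 0.667, class 1 exactly 2.0
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
print(roc_auc(y_true, scores))              # 11 of 12 positive-negative pairs ordered correctly, about 0.917
```

Precision, recall and F-measure depend on the chosen decision threshold, whereas AUC summarises ranking quality across all thresholds, so the two kinds of metric can rank classifiers differently.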