Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

COVID-19 Biomarkers Recognition & Classification Using Intelligent Systems

Author(s): Javier Bajo-Morales*, Juan Carlos Prieto-Prieto, Luis Javier Herrera, Ignacio Rojas and Daniel Castillo-Secilla

Volume 17, Issue 5, 2022

Published on: 13 May, 2022

Page: [426 - 439] Pages: 14

DOI: 10.2174/1574893617666220328125029

Abstract

Background: SARS-CoV-2 has paralyzed mankind due to its high transmissibility and its associated mortality, causing millions of infections and deaths worldwide. The search for gene expression biomarkers in the host transcriptional response to infection may help to understand the mechanisms by which the virus causes COVID-19. This research proposes a methodology that integrates RNA-Seq datasets from patients with SARS-CoV-2 infection, patients with other respiratory diseases, and healthy individuals.

Methods: The proposed pipeline uses the ‘KnowSeq’ R/Bioc package to integrate different data sources into a significantly larger gene expression dataset, which gives the results higher statistical significance and robustness than those of previous studies in the literature. A detailed preprocessing step was carried out to homogenize the samples and build a clinical decision system for SARS-CoV-2. The system relies on machine learning techniques, namely a feature selection algorithm and a supervised classification system, and uses the most differentially expressed genes among the different diseases (including SARS-CoV-2) to develop a four-class classifier.
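As a rough illustration of the supervised classification step, the following minimal sketch trains a four-class k-nearest-neighbours classifier on toy expression values. The abstract does not name the classifier, so the choice of k-NN, the two-gene feature space, the class labels, and the function name `knn_predict` are all assumptions made for illustration; this is not the KnowSeq API or the authors' implementation.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training samples nearest to query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): expression of two genes for four classes.
train_x = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # healthy
    (5.0, 0.0), (5.2, 0.1), (4.9, 0.2),   # SARS-CoV-2
    (0.0, 5.0), (0.1, 5.1), (0.2, 4.8),   # other_disease_1
    (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),   # other_disease_2
]
train_y = (
    ["healthy"] * 3
    + ["SARS-CoV-2"] * 3
    + ["other_disease_1"] * 3
    + ["other_disease_2"] * 3
)

# Classify a new sample whose profile lies close to the SARS-CoV-2 group.
prediction = knn_predict(train_x, train_y, (4.8, 0.3))
print(prediction)
```

In a real setting the feature vector would hold the genes retained by feature selection rather than two invented ones, and performance would be estimated with cross-validation rather than a single query.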

Results: The resulting multiclass classifier can discern SARS-CoV-2 samples with an accuracy of 91.5%, a mean F1-score of 88.5%, and a SARS-CoV-2 AUC of 94%, using only 15 genes as predictors. A biological interpretation of the extracted gene signature reveals relations with processes involved in viral responses.
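The three reported metrics can be made concrete with a small worked sketch. It assumes that the mean F1-score is a macro average over the classes and that the SARS-CoV-2 AUC is computed one-vs-rest from a per-sample score; the abstract states neither, and the labels and scores below are invented toy values, not the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 of one class, treating every other class as negative."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def one_vs_rest_auc(y_true, scores, positive):
    """Probability that a positive sample outscores a negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy four-class example (hypothetical labels, predictions and scores).
y_true = ["covid", "covid", "healthy", "healthy",
          "other1", "other1", "other2", "other2"]
y_pred = ["covid", "covid", "healthy", "other1",
          "other1", "other1", "other2", "healthy"]
covid_scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.3, 0.4, 0.05]

acc = accuracy(y_true, y_pred)
mean_f1 = macro_f1(y_true, y_pred)
covid_auc = one_vs_rest_auc(y_true, covid_scores, "covid")
print(acc, mean_f1, covid_auc)
```

Accuracy and macro F1 can diverge when classes are imbalanced or unevenly difficult, which is one reason a paper may report both alongside a class-specific AUC.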

Conclusion: This work proposes a COVID-19 gene signature composed of 15 genes, selected by applying the ‘minimum Redundancy Maximum Relevance’ feature selection algorithm. The integration of several RNA-Seq datasets was successful, yielding a considerably larger number of samples and therefore greater statistical significance than in previous studies. A biological interpretation of the selected genes is also provided.
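The idea behind minimum Redundancy Maximum Relevance selection can be sketched as a greedy loop: at each step, pick the gene whose mutual information with the class label is high and whose mean mutual information with the genes already chosen is low. The version below is a simplified illustration, assuming the difference form of the criterion and equal-width binning of expression values; the gene names, the binning scheme, and the function names are hypothetical, and this is not the KnowSeq implementation.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def mrmr(features, labels, n_select, bins=3):
    """Greedy mRMR: maximize relevance minus mean redundancy at each step."""
    binned = {gene: discretize(vals, bins) for gene, vals in features.items()}
    relevance = {g: mutual_information(b, labels) for g, b in binned.items()}
    selected = []
    candidates = sorted(binned)  # sorted so ties resolve alphabetically

    def score(gene):
        if not selected:
            return relevance[gene]
        redundancy = sum(
            mutual_information(binned[gene], binned[s]) for s in selected
        ) / len(selected)
        return relevance[gene] - redundancy

    while candidates and len(selected) < n_select:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (hypothetical): 8 samples, four classes, four genes.
labels = ["c0", "c0", "c1", "c1", "c2", "c2", "c3", "c3"]
features = {
    "gene_A": [1, 1, 1, 1, 9, 9, 9, 9],  # separates {c0, c1} from {c2, c3}
    "gene_B": [2, 2, 2, 2, 8, 8, 8, 8],  # redundant with gene_A
    "gene_C": [1, 1, 9, 9, 1, 1, 9, 9],  # separates {c0, c2} from {c1, c3}
    "gene_D": [1, 9, 1, 9, 1, 9, 1, 9],  # unrelated to the class
}

signature = mrmr(features, labels, n_select=2, bins=2)
print(signature)
```

In this toy case gene_B carries the same information as gene_A, so the redundancy term pushes the second pick toward gene_C, which is the behaviour that distinguishes mRMR from ranking genes by relevance alone.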

Keywords: COVID-19, RNA-Seq, machine learning, feature selection, gene signature, WHO.

© 2024 Bentham Science Publishers