Title: Integrated Bioinformatics and Machine Learning Algorithms Analyses Highlight Related Pathways and Genes Associated with Alzheimer's Disease
Volume: 17
Issue: 3
Author(s): Hui Zhang, Qidong Liu, Xiaoru Sun, Yaru Xu, Yiling Fang, Silu Cao, Bing Niu*, Cheng Li*
Affiliation:
- School of Life Sciences, Shanghai University, Shanghai 200444, P.R. China
- Department of Anesthesiology and Perioperative Medicine, Shanghai Fourth People’s Hospital,
School of Medicine, Tongji University, Shanghai 200434, China
- Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai 200434, China
- Clinical Research Center for Anesthesiology and Perioperative Medicine, Tongji University, Shanghai 200434, China
Keywords:
Alzheimer’s disease, entorhinal cortex, machine learning, bioinformatics, expressed key genes, G protein-coupled receptor.
Abstract:
Background: The pathophysiology of Alzheimer's Disease (AD) is still not fully understood.
Objective: This study aimed to identify key differentially expressed genes in AD and to build a predictive
model to support diagnosis and treatment.
Methods: Gene expression data of the entorhinal cortex of AD, asymptomatic AD, and control samples
from the GEO database were analyzed to explore the relevant pathways and key genes in the progression
of AD. Differentially expressed genes between AD and the other two groups in the module were
selected to identify biological mechanisms in AD through KEGG and PPI network analysis in
Metascape. Furthermore, genes with a high degree of connectivity in the PPI network were selected
to build predictive models using different machine learning algorithms. Model performance
was assessed with five-fold cross-validation to select the best-fitting model.
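As a rough illustration of what selecting genes with a high connectivity degree can mean in practice, the sketch below counts the distinct interaction partners of each gene in an edge list and keeps the top-ranked ones. The function name `top_hub_genes`, the placeholder gene labels, and the toy edges are all hypothetical; the abstract does not state the degree cutoff or the exact procedure used with Metascape.

```python
def top_hub_genes(edges, k):
    """Rank genes by degree, i.e. the number of distinct interaction partners."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    # Sort by descending degree; break ties by name so the result is deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy, made-up PPI edges (placeholder labels, not genes from the study).
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G4", "G5"),
    ("G2", "G1"),  # duplicate edge in reverse order, counted only once
]
hubs = top_hub_genes(toy_edges, 2)
print(hubs)
```

Using sets of neighbors rather than raw edge counts keeps duplicated or reversed edges from inflating a gene's degree.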
Results: A total of 20 co-expression gene clusters were identified after the network was constructed.
Module 1 (in black) and module 2 (in royal blue) were most positively and negatively correlated with
AD, respectively. A total of 565 genes in module 1 and 215 genes in module 2 overlapped with the
two differentially expressed gene lists. They were enriched in the G protein-coupled receptor signaling
pathway and immune-related processes, among others. Eleven genes were selected using lasso logistic regression
and were considered important for predicting AD samples. The model built with
the support vector machine algorithm on these 11 genes showed the best performance.
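The workflow described above (lasso logistic regression for gene selection, followed by a comparison of classifiers under five-fold cross-validation) might be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic, the regularization strength `C=1.0` and the linear kernel are arbitrary choices, and the random forest and logistic regression comparators are assumptions, since the abstract names only the support vector machine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes); NOT the study's data.
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=8, random_state=0
)


def lasso_selector():
    # L1-penalized logistic regression; features with (near-)zero coefficients are dropped.
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    return SelectFromModel(lasso)


# Candidate classifiers (assumed for illustration; only the SVM is named in the abstract).
models = {
    "svm": SVC(kernel="linear"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, clf in models.items():
    # Scaling and selection sit inside the pipeline so they are refit on each training fold.
    pipe = make_pipeline(StandardScaler(), lasso_selector(), clf)
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()

best = max(scores, key=scores.get)
print(scores, best)
```

Placing the selection step inside the pipeline is a design choice that avoids leaking held-out samples into feature selection; whether the study selected its 11 genes before or within cross-validation is not stated in the abstract.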
Conclusion: These results shed light on the diagnosis and treatment of AD.