Title: DSAE-Impute: Learning Discriminative Stacked Autoencoders for Imputing
Single-cell RNA-seq Data
Volume: 17
Issue: 5
Author(s): Shengfeng Gan, Huan Deng, Yang Qiu, Mohammed Alshahrani and Shichao Liu*
Affiliation:
- College of Informatics, Huazhong Agricultural University, Wuhan, China
Keywords:
scRNA-seq, imputation, gene expression, dropout event, discriminative stacked autoencoder, DSAE-Impute.
Abstract:
Background: Because a single cell contains only a limited amount of mRNA, scRNA-seq data contain
many missing values, which prevents accurate quantification of single-cell RNA expression. This
dropout phenomenon leaves truly expressed genes undetected in some cells, which greatly affects
downstream analysis of scRNA-seq data, such as cell clustering and the inference of cell
development trajectories.
Objective: This research proposes an accurate deep learning method, DSAE-Impute, to impute the
missing values in scRNA-seq data. DSAE-Impute employs stacked autoencoders to capture gene expression
characteristics in the original, incomplete data and incorporates a discriminative correlation matrix
between cells during training to capture global expression features, so that missing values can be
predicted accurately.
Methods: We propose a novel deep learning model based on discriminative stacked autoencoders, named
DSAE-Impute, to impute the missing values in scRNA-seq data. DSAE-Impute embeds discriminative cell
similarity to refine the feature representation of the stacked autoencoders and learns the expression
patterns of scRNA-seq data through layer-by-layer training to achieve accurate imputation.
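The idea described under Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss function or architecture, so the top-k thresholding of the cell correlation matrix, the graph-Laplacian penalty on the hidden representation, the tanh activation, the treatment of zeros as missing, and all names (cell_similarity, encode, dsae_loss, lam) are assumptions made for illustration. Training (layer-by-layer or otherwise) is omitted; only the forward pass, the loss, and the final fill-in step are shown.

```python
import numpy as np

def cell_similarity(X, k=3):
    """Pearson correlation between cells (rows), keeping each cell's k strongest positive neighbours.
    The top-k thresholding is an assumed stand-in for a 'discriminative' similarity."""
    C = np.nan_to_num(np.corrcoef(X))          # cells x cells
    np.fill_diagonal(C, 0.0)
    S = np.zeros_like(C)
    idx = np.argsort(-C, axis=1)[:, :k]        # indices of the k most correlated cells
    rows = np.arange(C.shape[0])[:, None]
    S[rows, idx] = np.clip(C[rows, idx], 0.0, None)
    return np.maximum(S, S.T)                  # symmetrise

def encode(X, layers):
    """Pass the data through a stack of encoder layers (W, b) with tanh activations."""
    H = X
    for W, b in layers:
        H = np.tanh(H @ W + b)
    return H

def dsae_loss(X, layers, W_dec, b_dec, S, lam=0.1):
    """Masked reconstruction error plus a similarity penalty on the hidden code (assumed form)."""
    H = encode(X, layers)
    X_hat = H @ W_dec + b_dec
    mask = X > 0                               # assumption: zeros are treated as missing
    recon = np.sum(((X_hat - X) * mask) ** 2) / max(mask.sum(), 1)
    L = np.diag(S.sum(axis=1)) - S             # graph Laplacian of the similarity matrix
    smooth = np.trace(H.T @ L @ H) / X.shape[0]
    return recon + lam * smooth, X_hat

# Toy data: 12 cells x 20 genes, log-transformed counts with artificial dropout.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(12, 20)) * (rng.random((12, 20)) > 0.3)
X = np.log1p(counts.astype(float))

# Randomly initialised two-layer encoder (20 -> 10 -> 5) and a linear decoder (5 -> 20).
layers = [(rng.normal(0, 0.1, (20, 10)), np.zeros(10)),
          (rng.normal(0, 0.1, (10, 5)), np.zeros(5))]
W_dec, b_dec = rng.normal(0, 0.1, (5, 20)), np.zeros(20)

S = cell_similarity(X)
loss, X_hat = dsae_loss(X, layers, W_dec, b_dec, S)

# Imputation step: keep observed values, fill zeros with the (non-negative) reconstruction.
X_imputed = np.where(X > 0, X, np.maximum(X_hat, 0.0))
```

In this sketch the similarity term pulls the hidden codes of correlated cells together, which is one plausible way to inject cell-cell structure into an autoencoder; the weight lam trades it off against reconstruction of the observed entries.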
Results: We systematically evaluated the performance of DSAE-Impute on simulated and real
datasets. The experimental results demonstrate that DSAE-Impute significantly improves downstream
analysis, and its imputation results are more accurate than those of other state-of-the-art imputation methods.
Conclusion: Extensive experiments show that, compared with other state-of-the-art methods, DSAE-Impute
produces more accurate imputation results on simulated and real datasets and is more helpful for
downstream analysis.