The correct identification of differentially expressed genes is central to many
areas of genetic research. Since the 1990s, many different approaches, methods,
algorithms and statistical tools have been developed to analyze the expression levels
of thousands of genes.
However, due to the growing complexity of managing, processing and interpreting
sequencing data in order to obtain reliable results, there is no consensus about the most
appropriate protocols and tools for the identification of differentially expressed genes,
starting from RNA-Seq data.
Thus, we propose an integrated and comprehensive approach that combines the most
widely used algorithms for differentially expressed gene (DEG) analysis, starting from
the raw count data table. The proposed method consists of three main steps: 1)
preliminary data analysis and visualization; 2) differential gene expression analysis,
using Bioconductor packages (DESeq2, edgeR, Limma, SAMSeq, TweeDESeq) and standard
ANOVA (ez and afex packages); 3) integration of results, presented in two main
graphical outputs generated with the SuperExactTest, UpSetR and ComplexHeatmaps
packages.
In this way, a more robust output can be obtained in a simple manner, without
requiring previous bioinformatics knowledge.
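
The integration step (step 3) amounts to comparing the gene lists returned by the
individual tools. The following is a minimal sketch of that idea in plain Python, not
the R packages named above: the function names, the tool labels and the gene
identifiers are all hypothetical and serve only as an illustration. It groups genes by
the exact set of tools that called them (the kind of intersection an UpSet-style plot
displays) and extracts the genes called by at least a chosen number of tools.

```python
from collections import Counter


def consensus_genes(calls, min_tools):
    """Return the genes reported as DEGs by at least `min_tools` tools.

    `calls` maps a tool name to the set of gene IDs that tool reported.
    """
    counts = Counter(gene for genes in calls.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_tools}


def exclusive_intersections(calls):
    """Group genes by the exact combination of tools that reported them."""
    # First record, for every gene, which tools reported it.
    membership = {}
    for tool, genes in calls.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(tool)
    # Then invert: combination of tools -> genes reported by exactly those tools.
    groups = {}
    for gene, tools in membership.items():
        groups.setdefault(frozenset(tools), set()).add(gene)
    return groups


# Hypothetical DEG lists; real lists would come from each tool's output.
calls = {
    "DESeq2": {"geneA", "geneB", "geneC"},
    "edgeR": {"geneA", "geneB", "geneD"},
    "limma": {"geneA", "geneC", "geneD"},
}

robust = consensus_genes(calls, min_tools=3)
groups = exclusive_intersections(calls)
```

Requiring agreement among several tools trades sensitivity for specificity, so the
threshold `min_tools` is a choice to be made per study rather than a fixed value.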
Keywords: Clustering comparison, Combination-based procedure, Concordance
analysis, Differential Expression Analysis, Intersections, Integration of results,
Normalization, Overlap proportion, Parametric and non-parametric, Performance
evaluation, p-values and Π score, RNA-Seq Data, ROC, Sensitivity, Simulations,
Specificity, SuperExactTest, Tools comparison, UpSetR and ComplexHeatmaps,
Validity of DEG tools.