Exploratory matrix factorization (EMF) techniques applied to two-way or multi-way biomedical
data arrays provide new and efficient tools that are currently being explored for the analysis of
large-scale data sets such as gene expression profiles (GEPs) measured on microarrays, lipidomic
or metabolomic profiles acquired by mass spectrometry (MS) and/or high-performance liquid
chromatography (HPLC), and biomedical images acquired with functional imaging techniques
such as functional magnetic resonance imaging (fMRI) or positron emission tomography
(PET). Exploratory feature extraction techniques such as Principal Component Analysis
(PCA), Independent Component Analysis (ICA) or sparse Nonnegative Matrix Factorization
(NMF) yield uncorrelated, statistically independent, or sparsely encoded and strictly non-negative
features, respectively, which in the case of GEPs are called eigenarrays (PCA), expression modes (ICA) or metagenes
(NMF). These features characterize the data sets under study and are generally
considered indicative of underlying regulatory processes or functional networks and also
serve as discriminative features for classification purposes. In the latter case, EMF techniques,
when combined with diagnostic a priori knowledge, can be applied directly to the classification
of biomedical data sets, either by grouping samples into different categories for diagnostic purposes or
by grouping genes, lipids, metabolic species or activity patches into functional categories for further investigation
of related metabolic pathways and regulatory or functional networks. Although these
techniques can be applied to large-scale data sets in general, the following discussion focuses
primarily on applications to microarray data sets and PET images.
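To make the notion of a metagene concrete, the following is a minimal sketch, not the specific algorithm discussed later. It factorizes a non-negative genes × samples matrix X into W H using the standard multiplicative update rules for the Frobenius-norm objective. The function name `nmf_multiplicative`, the rank `k`, the iteration count and the synthetic data are all assumptions chosen for illustration; no sparsity constraint is included.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF sketch: X (genes x samples) ~ W (genes x k) @ H (k x samples).

    Hypothetical helper for illustration only; uses multiplicative updates
    that keep W and H non-negative when X is non-negative.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Random strictly positive initialization.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update the sample-wise encodings, then the metagenes.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic, exactly rank-3 non-negative "expression" matrix (assumed data).
rng = np.random.default_rng(1)
W_true = rng.random((50, 3))   # 50 genes, 3 hidden metagenes
H_true = rng.random((3, 12))   # 12 samples
X = W_true @ H_true

W, H = nmf_multiplicative(X, k=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In this sketch each column of `W` plays the role of a metagene and each column of `H` gives the weights with which the metagenes combine to approximate one sample. Those weights are the kind of low-dimensional feature that could then be passed to a classifier.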
Keywords: exploratory matrix factorization (EMF), feature extraction, classification, ICA, NMF, large-scale
biomedical data sets