Exploratory Matrix Factorization Techniques for Large Scale Biomedical Data Sets

Author(s): E.W. Lang, R. Schachtner, D. Lutter, D. Herold, A. Kodewitz, F. Blochl, F. J. Theis, I. R. Keck, J.M. Gorriz Saez, P. Gomez, P. Gomez Vilda and A. M. Tome

Pp: 26-47 (22)

Abstract

Exploratory matrix factorization (EMF) techniques applied to two-way or multi-way biomedical data arrays provide new and efficient analysis tools. They are currently being explored for the analysis of large-scale data sets such as gene expression profiles (GEPs) measured on microarrays, lipidomic or metabolomic profiles acquired by mass spectrometry (MS) and/or high-performance liquid chromatography (HPLC), and biomedical images acquired with functional imaging techniques such as functional magnetic resonance imaging (fMRI) or positron emission tomography (PET). Exploratory feature extraction techniques such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) and sparse Nonnegative Matrix Factorization (NMF) yield uncorrelated, statistically independent, or sparsely encoded and strictly non-negative features, respectively; for GEPs these are called eigenarrays (PCA), expression modes (ICA) or metagenes (NMF). These features characterize the data sets under study, are generally considered indicative of underlying regulatory processes or functional networks, and also serve as discriminative features for classification. In the latter case, EMF techniques combined with diagnostic a priori knowledge can be applied directly to the classification of biomedical data sets, either by grouping samples into categories for diagnostic purposes or by grouping genes, lipids, metabolic species or activity patches into functional categories for further investigation of the related metabolic pathways and regulatory or functional networks. Although these techniques apply to large-scale data sets in general, the following discussion focuses primarily on applications to microarray data sets and PET images.
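
To make the idea of a matrix factorization concrete, the following is a minimal sketch of NMF, not the chapter's own method. It assumes a non-negative data matrix X arranged as genes by samples and approximates it as X ≈ W H using the standard multiplicative updates of Lee and Seung for the Frobenius error, without the sparsity constraint mentioned above. Under this assumed orientation the columns of W play the role of metagenes and H holds their weights in each sample. The function name nmf_multiplicative, the matrix sizes and the synthetic data are illustrative assumptions only.

```python
import numpy as np


def nmf_multiplicative(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (genes x samples) as W @ H.

    Plain Lee-Seung multiplicative updates for the Frobenius error.
    No sparsity penalty is applied (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Strictly positive random start, so the updates never divide by zero.
    W = rng.random((n_rows, n_components)) + eps
    H = rng.random((n_components, n_cols)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # which keeps W and H non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for an expression matrix: 50 "genes", 12 "samples",
# generated from 3 hidden non-negative components (illustrative only).
rng = np.random.default_rng(42)
X = rng.random((50, 3)) @ rng.random((3, 12))

W, H = nmf_multiplicative(X, n_components=3)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("relative reconstruction error:", rel_error)
```

The same data matrix could be passed to a PCA or ICA routine instead; what changes is the constraint placed on the factors (uncorrelatedness, statistical independence, or non-negativity), not the form X ≈ W H.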

Keywords: exploratory matrix factorization (EMF), feature extraction, classification, ICA, NMF, large scale biomedical data sets
