GPCA for improved multivariate analysis interpretation in lipidomics

Full reference:

S. Tortorella, J. Camacho, G. Cruciani. "GPCA for improved multivariate analysis interpretation in lipidomics". International Workshop Environmental OMICS Integration & Modelling. 2017.

View presentation

Abstract:

Multivariate statistical procedures can readily reveal individual lipid species, lipid families, or specific
lipid changes from sample to sample. To this end, both unsupervised algorithms (e.g., principal component
analysis, PCA) and supervised algorithms (e.g., partial least squares, PLS) can be used [2]. However,
lipidomic tools tend to make multivariate statistical analysis as simple as possible for the user, which in
many cases leads to "black boxes" in which advanced data interpretation is very limited. Furthermore, the
output of standard multivariate analysis (MA) tools, such as PCA loading plots, can be hard to interpret
because of the dimensionality of the data: each principal component is a linear combination of all the
variables simultaneously.
To overcome these limitations, we demonstrate here that Group-wise Principal Component Analysis
(GPCA) [3], a recently proposed extension of PCA for exploratory analysis, can be successfully applied
as a user-friendly advanced tool for visualizing and interpreting the statistical analysis. GPCA starts
from the groups of variables identified by MEDA (Missing-data for Exploratory Data Analysis) [4] and
performs a constrained PCA-like calibration in which each loading vector is restricted to non-zero values
for a single group of variables. The resulting Group-wise PCs (GPCs) are thus sparse factorizations that
can be inspected individually (one GPC at a time), which simplifies interpretation. To make the
interpretation of MA outcomes easier and faster still, we also introduce the Discriminative score (D-score),
which assesses the discriminative power of each GPC with respect to an arbitrary desired clustering and
in turn allows the GPCs to be ranked.
We will give examples of how and why GPCA combined with the D-score is a valuable solution for exploring
and interpreting complex real lipidomic case studies.
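
The constrained calibration described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the groups of variables are already given (here as hypothetical index lists rather than derived with MEDA), picks at each step the group whose first principal direction captures the most residual variance, and deflates. The function name gpca_sketch and the synthetic data are illustrative assumptions, and the D-score is not implemented because the abstract does not define it.

```python
import numpy as np

def gpca_sketch(X, groups, n_components):
    """Greedy group-wise PCA sketch: each loading is non-zero on one group only.

    Illustrative simplification; `groups` is a list of variable-index lists
    assumed to be known in advance (the abstract obtains them with MEDA).
    """
    E = X - X.mean(axis=0)               # mean-centred residual matrix
    n_obs, n_vars = X.shape
    loadings = np.zeros((n_vars, n_components))
    scores = np.zeros((n_obs, n_components))
    chosen = []                          # index of the group used by each GPC
    for a in range(n_components):
        best = None
        for g, idx in enumerate(groups):
            idx = np.asarray(idx)
            # First right singular vector of the residual restricted to the group
            _, s, vt = np.linalg.svd(E[:, idx], full_matrices=False)
            captured = s[0] ** 2         # sum of squares captured by this candidate
            if best is None or captured > best[0]:
                best = (captured, g, idx, vt[0])
        _, g, idx, v = best
        p = np.zeros(n_vars)
        p[idx] = v                       # sparse loading: zeros outside the group
        t = E @ p                        # scores of this GPC
        E = E - np.outer(t, p)           # deflate before the next GPC
        loadings[:, a] = p
        scores[:, a] = t
        chosen.append(g)
    return scores, loadings, chosen

# Synthetic example (assumed data): two groups of three correlated variables,
# the first group carrying more variance than the second.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = np.hstack([
    3.0 * latent[:, [0]] + 0.1 * rng.normal(size=(50, 3)),
    1.0 * latent[:, [1]] + 0.1 * rng.normal(size=(50, 3)),
])
groups = [[0, 1, 2], [3, 4, 5]]
scores, loadings, chosen = gpca_sketch(X, groups, n_components=2)
```

Because every column of loadings is zero outside a single group, each GPC can be read as a statement about that group of variables alone, which is the interpretability gain the abstract refers to.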

Acknowledgement
This work is partly supported by the Spanish Ministry of Economy and Competitiveness and FEDER funds
through project TIN2014-60346-R.

References
1. Ekroos, K. (2012). Lipidomics. Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany.
2. Goracci, L., Tortorella, S., Tiberi, P., et al. (2017). Lipostar, a Comprehensive Platform-Neutral Cheminformatics Tool for Lipidomics. Anal. Chem., 89(11):6257-6264.
3. Camacho, J., Rodríguez-Gómez, R., Saccenti, E. (2017). Group-wise Principal Component Analysis for Exploratory Data Analysis. Accepted in Journal of Computational and Graphical Statistics.
4. Camacho, J. (2010). Missing-data theory in the context of exploratory data analysis. Chemometrics and Intelligent Laboratory Systems, 103:8-18.

[Click here to view the full article]