NESG

Visualizing Big data with Compressed Score Plots: Approach and Research Challenges.

José Camacho
Abstract:
Exploratory Data Analysis (EDA) can be defined as the initial exploration of a data set with the aim of generating a hypothesis of interest. Projection models based on latent structures, together with their associated visualization techniques, are valuable tools within EDA. In particular, score plots are one of the main tools for discovering patterns in the observations. This paper addresses the extension of score plots to very large data sets with an unlimited number of observations. The proposed solution, based on clustering and approximation techniques, is referred to as Compressed Score Plots (CSPs). The approach is designed to handle both high-volume data sets and high-velocity data streams. The objective is to retain the visualization capabilities of traditional score plots while making the user-supervised analysis of huge data sets affordable on a time scale similar to that of small data sets. Efficient processing and updating approaches, visualization techniques, performance measures and challenges for future research are identified throughout the paper. The approach is illustrated with several data sets, including one with five million observations and more than one hundred variables.
Year:
2014
Type of Publication:
Article
Keywords:
Big Data, Multivariate Analysis
Journal:
Chemometrics and Intelligent Laboratory Systems
Volume:
135
Pages:
110-125