In recent years, the explosion of data has fueled business growth by extracting
business value from the available digital information. Data is collected from
diverse information systems to draw meaningful inferences that help promote business
value. The approach used to study these vital data characteristics and analyze the
data thoroughly is Exploratory Data Analysis (EDA), the most critical phase of data
analysis. The main objective of EDA is to uncover hidden facts in massive data and to
discover meaningful patterns of information that affect business value. Broadly,
EDA methods fall into two categories,
namely graphical and non-graphical EDA. Graphical EDA is a quick and powerful
technique that summarizes the data in graphical or pictorial form; such
visualizations reveal the distribution of the data and the correlations within it
before any statistical technique is applied. Non-graphical EDA, on the other hand,
evaluates the data statistically by computing its key characteristics and summary
statistics. Based on the number of attributes analyzed at a time, both methods are
further divided into univariate, bivariate, and multivariate EDA. Univariate EDA
summarizes a single attribute of the raw dataset; bivariate EDA examines the
correlation or interdependence between two attributes, such as an input attribute
and the target attribute; and multivariate EDA identifies the interactions among
more than two attributes. Together, these techniques are used to clean, preprocess,
and visualize the data and to draw the conclusions required to solve business
problems. This chapter therefore gives a comprehensive synopsis of the tools and
techniques that can be applied, with a suitable programming framework, during the
initial phase of the EDA process. To make them easier to understand, these EDA
techniques are explained with the relevant theoretical concepts and illustrated
with a suitable case study.
Keywords: Bivariate analysis, Data visualization, Exploratory data analysis
(EDA), Multivariate analysis, Statistical methods, Univariate analysis.
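As a rough illustration of the non-graphical univariate, bivariate, and multivariate analyses described in this abstract, the following sketch uses only the Python standard library. It is not the chapter's case study or programming framework: the helper names (`univariate_summary`, `pearson`, `correlation_matrix`) and the toy attribute values are hypothetical, and Pearson's coefficient is assumed as the correlation measure.

```python
import math
import statistics


def univariate_summary(values):
    """Univariate, non-graphical EDA: summary statistics for one attribute."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    # Needs at least two observations, as does stdev().
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


def pearson(x, y):
    """Bivariate EDA: Pearson correlation coefficient between two attributes."""
    if len(x) != len(y):
        raise ValueError("attributes must have the same number of observations")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Undefined (ZeroDivisionError) if either attribute is constant.
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Multivariate EDA: pairwise Pearson correlations among all attributes."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b]) for a in names for b in names
    }


if __name__ == "__main__":
    # Hypothetical toy data; not taken from the chapter's case study.
    data = {
        "ad_spend": [10, 20, 30, 40, 50, 60],
        "store_visits": [110, 190, 320, 390, 510, 580],
        "sales": [15, 24, 38, 41, 55, 61],
    }
    for name, values in data.items():
        print(name, univariate_summary(values))
    print("ad_spend vs sales:", round(pearson(data["ad_spend"], data["sales"]), 3))
    for (a, b), r in correlation_matrix(data).items():
        print(f"{a:>12} ~ {b:<12} r = {r:+.3f}")
```

Running the script prints a summary for each attribute and every pairwise correlation. In practice, a dataframe library would typically perform the same steps, and the graphical counterpart would plot histograms, scatter plots, or a correlation heatmap of these attributes; that part is omitted here to keep the sketch dependency-free.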