Alex Rivera — Data Analyst & Scientist

Exploratory Data Analysis is less a checklist and more a conversation with your data. Here's how I structure that conversation.

The First 20 Minutes

When a new dataset lands in my inbox, I resist the urge to jump straight to the question at hand. Instead I spend the first 20 minutes on ruthless auditing:

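import matplotlib.pyplot as plt
import missingno as msno
import pandas as pd

df = pd.read_csv("data.csv")

print(df.shape)   # (rows, columns)
print(df.dtypes)  # watch for numeric columns parsed as object
# Share of missing values per column, worst 20 first
print(df.isnull().mean().sort_values(ascending=False).head(20))
msno.matrix(df)   # row-by-row view of where values are missing
plt.show()        # needed when run as a script rather than in a notebook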

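The missingno matrix shows at a glance where values are missing and whether the gaps cluster in particular rows or columns. Clustered gaps suggest systematic rather than random missingness, a critical distinction before any imputation decision.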
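A visual pattern is only a hint, so it can help to put a number on it. The sketch below is one possible follow-up, not part of the audit above: it compares the share of missing values in one column across the levels of another. The helper name `missingness_by_group`, the column names, and the toy data are all hypothetical and chosen purely for illustration.

```python
import numpy as np
import pandas as pd


def missingness_by_group(df: pd.DataFrame, target: str, by: str) -> pd.Series:
    """Share of missing values in `target` for each level of `by` (hypothetical helper)."""
    return df[target].isna().groupby(df[by]).mean().sort_values(ascending=False)


# Toy data (invented): income is missing far more often for one signup channel.
toy = pd.DataFrame({
    "channel": ["web"] * 4 + ["phone"] * 4,
    "income": [52.0, 48.0, 61.0, 55.0, np.nan, np.nan, 40.0, np.nan],
})

rates = missingness_by_group(toy, target="income", by="channel")
print(rates)
```

A large gap between groups, as in this toy data, is evidence that missingness depends on an observed variable, which argues against dropping rows or filling in a global mean without further thought.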

Distribution First, Relationships Second

Never jump to correlations before understanding the marginal distributions. A bimodal distribution often signals an unmeasured grouping variable (two customer segments, two product lines, two time periods) that, if ignored, can distort pooled correlations and bias any downstream model.
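To make the risk concrete, here is a minimal sketch on invented data, assuming two hidden segments labelled `A` and `B`. It first looks at the marginal distribution of `x`, then compares the pooled correlation with the correlation inside each segment. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (invented): two segments with different baselines for x and y.
toy = pd.DataFrame({
    "segment": ["A"] * 5 + ["B"] * 5,
    "x": [1, 2, 3, 4, 5, 11, 12, 13, 14, 15],
    "y": [10, 9, 8, 7, 6, 20, 19, 18, 17, 16],
})

# Step 1: check the marginal. An empty middle bin is a sign of two modes.
counts, edges = np.histogram(toy["x"], bins=3)
print("x histogram counts:", counts.tolist())

# Step 2: compare the pooled correlation with the within-segment correlations.
pooled_r = toy["x"].corr(toy["y"])
within_r = {name: group["x"].corr(group["y"]) for name, group in toy.groupby("segment")}
print("pooled r:", round(pooled_r, 3))
print("within-segment r:", {name: round(r, 3) for name, r in within_r.items()})
```

In this toy data the pooled correlation is strongly positive, while within each segment `x` and `y` move in opposite directions. A correlation matrix computed before looking at the marginals would report the wrong sign for both groups.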

Communicating EDA Findings

The goal of EDA isn't analysis for its own sake; it's calibrating your model strategy and de-risking your assumptions. A one-page summary with 3 key findings and 2 data quality flags is more valuable than a 40-slide deck or a raw notebook dump.