Statistics for Data Analysts


You don't need a maths degree. But you do need to understand statistics well enough to not mislead yourself or your stakeholders. This is the minimum viable statistics knowledge for a working data analyst.


The Core Concepts (No Calculus Required)

1. Descriptive Statistics

Summarising what the data looks like. Used every day.

Concept            | What it means                  | When to use it
Mean               | Average (sum ÷ count)          | Symmetric data without outliers
Median             | Middle value when sorted       | Salaries, prices, incomes: any skewed data
Mode               | Most frequent value            | Categorical data, most popular item
Standard Deviation | Typical spread around the mean | How consistent or variable is the data?
Percentiles        | What % of values fall below X? | P25, P50, P75, P95 for distributions
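As a rough illustration, each of the measures above can be computed with Python's built-in statistics module. The `orders` list is made-up data, and `statistics.quantiles` uses its default method, so other tools may report slightly different percentile values.

```python
import statistics

# Hypothetical daily order counts, made up for illustration.
orders = [12, 15, 15, 18, 20, 22, 25, 30, 95]

mean = statistics.mean(orders)      # sum divided by count
median = statistics.median(orders)  # middle value once sorted
mode = statistics.mode(orders)      # most frequent value
stdev = statistics.stdev(orders)    # sample standard deviation (divides by n - 1)

# quantiles(n=4) returns the three cut points P25, P50 and P75.
p25, p50, p75 = statistics.quantiles(orders, n=4)

print(f"mean={mean:.1f} median={median} mode={mode} stdev={stdev:.1f}")
print(f"P25={p25} P50={p50} P75={p75}")
```

The single value of 95 drags the mean well above the median, which is why the table suggests the median for data with outliers.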

2. Distributions

  • Normal distribution: Bell curve. Heights, measurement errors. Mean = Median = Mode.
  • Skewed distribution: Salaries, revenue, wait times. Mean is pulled toward the tail. Always report the median here.
  • Uniform distribution: All values equally likely. Rolling a fair die.
  • The key question: Is my data symmetric? If not, use the median, not the mean.
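A small sketch of why the median is the safer summary for skewed data. The `salaries` list is hypothetical: nine typical earners plus one executive.

```python
import statistics

# Hypothetical salaries: nine typical earners plus one executive.
salaries = [38_000, 40_000, 41_000, 42_000, 43_000,
            45_000, 46_000, 48_000, 50_000, 400_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Count how many people earn less than the "average" salary.
below_mean = sum(s < mean_salary for s in salaries)

print(f"mean   = {mean_salary:,.0f}")
print(f"median = {median_salary:,.0f}")
print(f"{below_mean} of {len(salaries)} salaries are below the mean")
```

In this made-up sample, nine of the ten people earn less than the mean, so the mean describes almost nobody, while the median sits in the middle of the typical earners.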

3. Correlation vs Causation

This is the concept that most trips up analysts in interviews and in real work.

  • Correlation: Two variables move together. Ice cream sales and drowning rates both rise in summer. They are correlated but neither causes the other.
  • Causation: One variable directly causes a change in another. You need a controlled experiment (A/B test) to establish causation. Correlation alone never proves it.
  • Confounding variable: A third variable driving both. In the example above: summer temperature causes both ice cream sales AND more swimming → more drownings.

4. Hypothesis Testing & p-values

Used in A/B tests, feature launches, and any experiment.

  • Null hypothesis (H₀): Assume nothing changed / no effect.
  • p-value: The probability of seeing results at least as extreme as yours, assuming H₀ is true.
  • p < 0.05: Conventionally "statistically significant". If H₀ were true, results at least this extreme would turn up less than 5% of the time. It is not the probability that H₀ is true, nor the probability that the result was "just chance".
  • Important: Statistical significance ≠ practical importance. A tiny, meaningless difference can be statistically significant with a large enough sample.
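One common way to get a p-value for an A/B test on conversion rates is a two-proportion z-test. The sketch below is a minimal version using only the standard library. The function name `two_proportion_z_test` and both sets of experiment numbers are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (absolute lift of B over A, p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one rate, so pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a z at least this far from 0 in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical test 1: 10.0% vs 12.0% conversion, 2,000 users per group.
lift_small, p_small = two_proportion_z_test(200, 2_000, 240, 2_000)

# Hypothetical test 2: 10.0% vs 10.1% conversion, 2,000,000 users per group.
lift_big, p_big = two_proportion_z_test(200_000, 2_000_000, 202_000, 2_000_000)

print(f"test 1: lift={lift_small:.3%}, p={p_small:.4f}")
print(f"test 2: lift={lift_big:.3%}, p={p_big:.4f}")
```

Both made-up tests come out below 0.05, but the second has a lift of only a tenth of a percentage point. Whether that is worth shipping is a business question that the p-value cannot answer.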

5. Confidence Intervals

A 95% confidence interval means: if we ran this experiment 100 times, about 95 of those intervals would contain the true population value. It is not a statement about the probability that any particular interval contains the true value.
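The "repeat the experiment many times" reading can be checked with a simulation. This is a sketch under stated assumptions: the population mean and standard deviation are invented, the data is drawn from a normal distribution, and the interval uses the normal approximation (mean ± 1.96 standard errors) rather than a t-distribution.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(42)            # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 50.0, 10.0    # hypothetical population values
N, EXPERIMENTS = 100, 1_000
z = NormalDist().inv_cdf(0.975)    # about 1.96 for a 95% interval

covered = 0
for _ in range(EXPERIMENTS):
    # One "experiment": draw a fresh sample and build its interval.
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    centre = mean(sample)
    half_width = z * stdev(sample) / N ** 0.5
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / EXPERIMENTS
print(f"{coverage:.1%} of intervals contained the true mean")
```

The share of intervals that contain the true mean should land close to 95%. Each individual interval either contains the true value or it does not. The 95% describes how often the procedure succeeds over many repetitions.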


Statistics Interview Questions (with Answers)

"What is the difference between mean and median? When would you use each?"

Model answer: Mean is the arithmetic average: divide the sum by the count. It's sensitive to outliers. Median is the middle value when data is sorted, so it's robust to outliers. For salary data, revenue, or house prices (anything right-skewed), always report the median. For exam scores or temperature, which are roughly symmetric, mean is appropriate.

"What does a p-value of 0.03 mean?"

Model answer: It means there's a 3% probability of observing results at least this extreme if the null hypothesis (no effect) were true. Conventionally, p < 0.05 is considered statistically significant, meaning we have enough evidence to reject the null hypothesis. However, statistical significance doesn't mean the effect is large or practically important.

"You run an A/B test. What do you check before concluding your new feature works?"

Model answer: (1) Did the test run for its planned duration, rather than being stopped the moment it first crossed the significance threshold? (2) Were the two groups randomised properly? (3) Is the sample size large enough? (4) Did any external events occur during the test that could explain the difference (seasonality, marketing campaigns)? (5) Is the effect size practically meaningful, not just statistically significant? (6) Did I set the significance threshold before running the test, not after seeing the results?


Free Statistics Resources


Go Deeper with Our Interview Guide

Our Data Analyst Interview Q&A guide has a full section on statistics questions with detailed model answers, written the way you'd actually answer in an interview, not a textbook.
