Statistics for Data Analysts
You don't need a maths degree. But you do need to understand statistics well enough to not mislead yourself or your stakeholders. This is the minimum viable statistics knowledge for a working data analyst.
The Core Concepts (No Calculus Required)
1. Descriptive Statistics
Summarising what the data looks like. You will use these every day.
| Concept | What it means | When to use it |
|---|---|---|
| Mean | Average (sum ÷ count) | Symmetric data without outliers |
| Median | Middle value when sorted | Salaries, prices, incomes: any skewed data |
| Mode | Most frequent value | Categorical data, most popular item |
| Standard Deviation | Typical distance of values from the mean | How consistent or variable is the data? |
| Percentiles | The value below which a given % of the data falls | P25, P50 (the median), P75, P95 to describe a distribution |
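To see these measures side by side, here is a minimal sketch using Python's built-in `statistics` module (Python 3.8+). The salary figures are made up for illustration, with one deliberately extreme value. `method="inclusive"` is one of the two interpolation rules `quantiles` supports; it is chosen here so the percentiles stay within the observed range of a small sample.

```python
import statistics

# Hypothetical annual salaries in thousands; the last value is a deliberate outlier
salaries = [32, 35, 38, 40, 41, 45, 47, 52, 55, 250]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
spread = statistics.stdev(salaries)  # sample standard deviation

# The mode is most useful for categorical data, e.g. the most popular item
mode = statistics.mode(["red", "blue", "blue", "green"])

# quantiles(n=4) returns the three cut points P25, P50, P75
p25, p50, p75 = statistics.quantiles(salaries, n=4, method="inclusive")
# quantiles(n=100) returns 99 cut points; index 94 is P95
p95 = statistics.quantiles(salaries, n=100, method="inclusive")[94]

print(f"mean={mean:.1f} median={median:.1f} mode={mode}")
print(f"stdev={spread:.1f} P25={p25} P50={p50} P75={p75} P95={p95}")
```

The single outlier pulls the mean (63.5) well above the median (43.0), which is exactly why the table recommends the median for salaries.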
2. Distributions
- Normal distribution: Bell curve. Heights, measurement errors. Mean = Median = Mode.
- Skewed distribution: Salaries, revenue, wait times. The mean is pulled toward the long tail. Always report the median here.
- Uniform distribution: All values equally likely. Rolling a fair die.
- The key question: Is my data symmetric? If not, use the median rather than the mean.
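A quick way to answer the symmetry question is to compare the mean and the median. This sketch simulates two hypothetical datasets with Python's `random` module: roughly normal heights and right-skewed (log-normal) wait times. The distribution parameters are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical symmetric data: heights in cm, normal with mean 170 and sd 10
heights = [random.gauss(170, 10) for _ in range(10_000)]
# Hypothetical right-skewed data: wait times in minutes, log-normal
waits = [random.lognormvariate(1.0, 0.8) for _ in range(10_000)]

def mean_median_gap(data):
    """Return the mean, the median and how far the mean sits above the median."""
    mean, median = statistics.fmean(data), statistics.median(data)
    return mean, median, mean - median

h_mean, h_median, h_gap = mean_median_gap(heights)
w_mean, w_median, w_gap = mean_median_gap(waits)

# Symmetric: mean and median nearly coincide. Skewed: the long tail drags the mean up.
print(f"heights: mean={h_mean:.1f} median={h_median:.1f}")
print(f"wait times: mean={w_mean:.2f} median={w_median:.2f}")
```

A large positive gap between mean and median is a practical warning sign of a right tail; plotting a histogram is the more thorough check.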
3. Correlation vs Causation
This is the concept that trips up analysts most often, in interviews and in real work.
- Correlation: Two variables move together. Ice cream sales and drowning rates both rise in summer. They are correlated but neither causes the other.
- Causation: One variable directly causes a change in another. The most reliable way to establish causation is a randomised controlled experiment (an A/B test). Correlation alone never proves it.
- Confounding variable: A third variable driving both. In the example above, summer temperature drives both ice cream sales AND more swimming, which leads to more drownings.
4. Hypothesis Testing & p-values
Used in A/B tests, feature launches, and any experiment.
- Null hypothesis (H₀): Assume nothing changed (no effect).
- p-value: The probability of seeing results at least as extreme as yours, assuming H₀ is true.
- p < 0.05: Conventionally "statistically significant". If there were truly no effect, results at least this extreme would turn up less than 5% of the time. It is not the probability that your result is due to chance, or that H₀ is true.
- Important: Statistical significance ≠ practical importance. A tiny, meaningless difference can be statistically significant with a large enough sample.
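For an A/B test on conversion rates, one common way to get a p-value is a pooled two-proportion z-test. The sketch below uses only the standard library (`statistics.NormalDist`, Python 3.8+); the conversion counts are hypothetical, and the normal approximation assumes reasonably large groups. The second call illustrates the last point above: with millions of users per arm, a 0.1 percentage point difference produces a very small p-value.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both groups share the same conversion rate.

    Pooled two-proportion z-test using the normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # best estimate of the rate if H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a difference at least this extreme, in either direction, under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: 10.0% vs 11.0% conversion with 5,000 users per arm
p_small = two_proportion_p_value(500, 5_000, 550, 5_000)
# Hypothetical results: 10.0% vs 10.1% conversion with 5,000,000 users per arm
p_huge = two_proportion_p_value(500_000, 5_000_000, 505_000, 5_000_000)

print(f"5,000 per arm: p = {p_small:.3f}")
print(f"5,000,000 per arm: p = {p_huge:.2e}")
```

With these numbers the first p-value lands above 0.05 and the second far below it, even though the first difference is ten times larger.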
5. Confidence Intervals
A 95% confidence interval means: if we ran this experiment 100 times, about 95 of those intervals would contain the true population value. It is not a statement about the probability that any particular interval contains the true value.
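The repeated-experiment interpretation can be checked by simulation. This hypothetical sketch draws many samples from a population whose true mean we know, builds a 95% interval from each, and counts how often the interval contains the true mean. For simplicity it assumes the population standard deviation is known (a z-interval); with an unknown standard deviation you would use a t-interval instead.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population and study design
TRUE_MEAN, SIGMA, N, RUNS = 50.0, 10.0, 40, 2_000
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

hits = 0
for _ in range(RUNS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = fmean(sample)
    half_width = z * SIGMA / sqrt(N)  # sigma treated as known to keep the sketch simple
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        hits += 1

coverage = hits / RUNS
print(f"{coverage:.1%} of intervals contained the true mean")
```

The coverage should come out close to 95%, yet any single interval either contains 50.0 or it doesn't.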
Statistics Interview Questions (with Answers)
“What is the difference between mean and median? When would you use each?”
Model answer: Mean is the arithmetic average: divide the sum by the count. It's sensitive to outliers. Median is the middle value when the data is sorted, so it's robust to outliers. For salary data, revenue, or house prices (anything right-skewed), always report the median. For exam scores or temperatures, which are roughly symmetric, the mean is appropriate.
“What does a p-value of 0.03 mean?”
Model answer: It means there's a 3% probability of observing results at least this extreme if the null hypothesis (no effect) were true. It does not mean there's a 3% chance the null hypothesis is true. Conventionally, p < 0.05 is considered statistically significant, meaning we have enough evidence to reject the null hypothesis. However, statistical significance doesn't mean the effect is large or practically important.
“You run an A/B test. What do you check before concluding your new feature works?”
Model answer: (1) Did the test run for its full planned duration, rather than being stopped the moment the result looked significant? (2) Were users randomised properly into the two groups? (3) Is the sample size large enough to detect the effect we care about? (4) Did any external events occur during the test that could explain the difference (seasonality, marketing campaigns)? (5) Is the effect size practically meaningful, not just statistically significant? (6) Did I set the significance threshold before running the test, not after seeing the results?
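Check (3) can be answered before the test starts with a power calculation. This sketch uses the standard normal-approximation formula for comparing two proportions, n = (z₁₋α/₂ + z₁₋β)² × [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)². The 80% power and 5% significance defaults are common conventions rather than requirements, and the baseline and target rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect p_baseline -> p_target
    with a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Hypothetical plan: detect a lift from 10% to 11% conversion
n = sample_size_per_group(0.10, 0.11)
print(f"users needed per arm: {n}")
```

For a 10% to 11% lift this comes out at roughly 15,000 users per arm; halving the detectable lift roughly quadruples the required sample.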
Free Statistics Resources
- StatQuest with Josh Starmer (YouTube): the best statistics channel on YouTube. No maths background needed.
- Khan Academy, Statistics: free, structured, starts from the basics.
- Seeing Theory: visual, interactive explanations of statistics concepts. Outstanding for building intuition.
Go Deeper with Our Interview Guide
Our Data Analyst Interview Q&A guide has a full section on statistics questions with detailed model answers, written the way you'd actually answer in an interview, not like a textbook.