Checking Normality: Tests vs Plots Done Right
"Is my data normal?" is one of the most misunderstood questions in applied statistics. First, it's usually the wrong question: most tests assume the residuals are approximately normal, not the raw variable.
Second, the tools have opposite failure modes. A significance test like Shapiro–Wilk becomes hypersensitive at large n (flagging trivial departures) and underpowered at small n (missing real ones). Visual tools like the Q-Q plot are often more informative, because they show where and how far the data depart from normality rather than only whether they do.
Methods compared
| Method | Strength | Weakness |
|---|---|---|
| Shapiro–Wilk | Objective, powerful at moderate n | Over-sensitive at large n, underpowered at small n |
| Q-Q plot | Shows where/how it deviates | Subjective |
| Histogram | Intuitive shape | Bin-dependent |
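To make the Q-Q row concrete, here is a minimal sketch that computes Q-Q points with SciPy's `scipy.stats.probplot` and reports where a sample departs from the reference line. The data are simulated and deliberately right-skewed; the seed, sample size, and variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
sample = rng.exponential(scale=1.0, size=200)  # illustrative right-skewed data

# probplot returns the Q-Q points and a least-squares line through them.
# Passing plot=matplotlib.pyplot would also draw the figure.
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
    sample, dist="norm"
)

# Vertical distance of each point from the fitted line: this is the
# "where and how" information a single p-value cannot give.
deviation = ordered_values - (slope * theoretical_q + intercept)

print(f"Q-Q correlation r = {r:.3f}")
print(f"deviation at lowest point:  {deviation[0]:.2f}")
print(f"deviation at highest point: {deviation[-1]:.2f}")
```

Points that lie above the line at both ends with a dip in the middle suggest right skew, whereas an S-shape suggests heavy or light tails. The correlation `r` is only a rough summary; the pattern of deviations is what the plot is for.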
What actually needs to be normal
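For t-tests and regression, the assumption applies to the residuals (for a t-test, each observation's deviation from its group mean), not to the raw variable. Thanks to the Central Limit Theorem, inference about means is also robust to moderate non-normality once the sample is reasonably large.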
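The difference between checking the raw variable and checking the residuals can be sketched with simulated data. In the example below the predictor is skewed, so the outcome `y` looks clearly non-normal, yet the errors around the regression line are normal by construction. All values (seed, coefficients, sample size) are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=300)           # skewed predictor
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, size=300)  # normal errors around a line

# Fit a simple linear regression and compute its residuals.
fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Shapiro-Wilk on the raw outcome versus on the residuals.
w_raw, p_raw = stats.shapiro(y)
w_res, p_res = stats.shapiro(residuals)

print(f"raw y:     W={w_raw:.3f}  p={p_raw:.3g}")
print(f"residuals: W={w_res:.3f}  p={p_res:.3g}")
```

Testing `y` directly would suggest a problem that the model does not actually have; the residuals are the quantity the regression's inference depends on.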
The large-n trap
With thousands of rows, Shapiro–Wilk will almost always be "significant", even for departures too small to matter. Read the Q-Q plot and judge the size of the deviation instead.
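A small simulation can illustrate the trap. The sketch below draws one mildly right-skewed sample and runs Shapiro–Wilk on its first 30 values and on all 5000. The gamma distribution, its shape parameter, and the seed are assumptions picked for illustration; the sample is capped at 5000 because SciPy warns that the p-value may be inaccurate above that size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Mildly right-skewed data: gamma with a fairly large shape parameter.
draw = rng.gamma(shape=20.0, scale=1.0, size=5000)

results = {}
for n in (30, 5000):
    subset = draw[:n]
    w, p = stats.shapiro(subset)
    results[n] = (w, p)
    print(f"n={n:>5}  W={w:.4f}  p={p:.3g}  skewness={stats.skew(subset):.2f}")
```

At the larger size the W statistic typically stays close to 1 while the p-value collapses, which is the pattern described above: the test detects the departure, but says nothing about whether it is large enough to matter.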
If normality fails
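Consider a transformation (such as a log for right-skewed data) or a rank-based alternative: Mann–Whitney for two groups, Kruskal–Wallis for three or more. Welch's test is also worth considering, but note that it relaxes the equal-variance assumption rather than the normality assumption.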
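As a rough sketch of how these options look in SciPy, the example below applies each one to simulated right-skewed groups. The group sizes, log-normal parameters, and seed are invented for illustration, and which option is appropriate depends on the data and the question being asked.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Three illustrative right-skewed groups (log-normal), invented for this sketch.
a = rng.lognormal(mean=0.0, sigma=0.5, size=40)
b = rng.lognormal(mean=0.8, sigma=0.5, size=40)
c = rng.lognormal(mean=1.2, sigma=0.5, size=40)

# Option 1: Welch's t-test for two groups (does not assume equal variances).
welch = stats.ttest_ind(a, b, equal_var=False)

# Option 2: transform first, then test on the transformed scale.
log_welch = stats.ttest_ind(np.log(a), np.log(b), equal_var=False)

# Option 3: rank-based comparison across all three groups.
kruskal = stats.kruskal(a, b, c)

print(f"Welch (raw):    t={welch.statistic:.2f}  p={welch.pvalue:.3g}")
print(f"Welch (log):    t={log_welch.statistic:.2f}  p={log_welch.pvalue:.3g}")
print(f"Kruskal-Wallis: H={kruskal.statistic:.2f}  p={kruskal.pvalue:.3g}")
```

Note that a test on log-transformed data compares means on the log scale, and a rank-based test compares distributions rather than means, so the three results answer slightly different questions.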