DATA & STATISTICS

Analysis of variance (ANOVA)

Analysis of Variance: classical statistical technique for comparing means across three or more groups. Established by Fisher in 1925, it forms the foundation of experimental design in biomedical, agricultural, and behavioral sciences.

Extended definition

ANOVA (Analysis of Variance) is the classical statistical technique for testing whether the means of three or more groups differ more than would be expected from sampling fluctuation alone. It was formalized by Ronald Fisher in Statistical Methods for Research Workers (1925) as the central tool of agricultural experimental design, and was adapted in subsequent decades to any empirical science that compares groups. The test statistic is the ratio of the between-group mean square to the within-group mean square:

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

where $MS_{\text{between}}$ is the between-group sum of squares divided by the between-group degrees of freedom ($k - 1$ for $k$ groups), and $MS_{\text{within}}$ is the within-group sum of squares divided by the within-group degrees of freedom ($N - k$ for $N$ total observations). Under the null hypothesis of equal means, $F$ follows the Fisher-Snedecor $F$ distribution with those degrees of freedom. Variants include one-way ANOVA (one categorical variable), two-way or higher (with possible interactions), repeated measures (same subjects across conditions), MANOVA (multiple dependent variables), and ANCOVA (with continuous control covariates).
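As a minimal sketch of the ratio defined above, the following computes the one-way F statistic from raw samples using only the Python standard library. The function name `one_way_anova` and the data are hypothetical, chosen so the arithmetic is easy to follow by hand; turning F into a p-value needs the F distribution, which the standard library does not provide.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means around the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical data: three groups of three observations each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f_stat, df_b, df_w)  # F = 12.0 with (2, 6) degrees of freedom
```

Here the between-group sum of squares is 24 and the within-group sum of squares is 6, so the mean squares are 12 and 1 and their ratio is 12.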

When it applies

ANOVA is appropriate when the design compares three or more groups defined by a categorical variable, with a continuous dependent variable that is approximately normally distributed within each group and comparable group variances (homoscedasticity). It is standard in classical experimental designs: clinical trials with multiple treatment arms, agricultural experiments with different fertilizers, psychological studies comparing conditions. For only two groups, the $t$-test is equivalent (with pooled variance, $F = t^2$) and more direct. For categorical dependent variables, chi-square tests or logistic regression are the usual alternatives.
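The two-group equivalence can be checked numerically. The sketch below, on hypothetical data, computes the pooled-variance t statistic and the one-way F statistic for the same two samples and confirms that the squared t matches F up to floating-point error. The helper names `pooled_t` and `f_two_groups` are illustrative only.

```python
from math import isclose
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic for two independent samples with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def f_two_groups(a, b):
    """One-way ANOVA F statistic for exactly two groups."""
    groups = [a, b]
    n_total = len(a) + len(b)
    grand_mean = (sum(a) + sum(b)) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # df_between = 2 - 1 = 1, df_within = N - 2
    return (ss_between / 1) / (ss_within / (n_total - 2))


# Hypothetical measurements for two conditions.
a = [5.1, 4.8, 5.6, 5.0]
b = [6.2, 5.9, 6.8, 6.1]

t = pooled_t(a, b)
f = f_two_groups(a, b)
print(isclose(t ** 2, f))  # the squared t statistic agrees with F
```

The equivalence holds for the pooled-variance t-test; Welch's unequal-variance t-test does not in general reproduce the classical F.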

When it does not apply

ANOVA does not apply when its assumptions are severely violated: a strongly skewed dependent variable, very unequal variances among groups (heteroscedasticity), or non-independent observations (clusters, repeated measures without appropriate adjustment). Under severe violations, robust alternatives are preferable: Welch's ANOVA for unequal variances, the non-parametric Kruskal-Wallis test, or mixed models for nested data. For testing a single specific hypothesis between two groups, ANOVA is redundant. In designs with structural confounding (uncontrolled observational covariates), ANOVA does not replace a full regression analysis.
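To illustrate why a rank-based alternative tolerates skew, here is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. It assigns average ranks to ties but, as a simplifying assumption, omits the tie correction factor that full implementations apply. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction, an assumption of this sketch)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical skewed data: the last group contains an extreme value.
groups = [[1.2, 1.9, 2.5], [3.1, 4.0, 4.4], [6.8, 9.5, 30.2]]
h = kruskal_wallis_h(groups)
print(round(h, 4))  # 7.2
```

Because only ranks enter the statistic, replacing 30.2 with any larger value leaves H unchanged. Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom for k groups, though the approximation is rough for samples this small.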

Applications by field

Health and biomedical sciences: clinical trials with 3+ arms, clinical subgroup comparison, efficacy analysis in pharmacology.

Agricultural sciences: Fisher's original setting; comparison of treatments, blocks, environmental factors.

Experimental psychology: comparison among experimental conditions, factorial ANOVA for variable interactions.

Education: comparison of teaching methods, pedagogical interventions, schools, or regions.

Common pitfalls

The first pitfall is stopping at ANOVA without post-hoc tests: a significant ANOVA indicates only that at least one mean differs, without identifying which. Tukey's HSD, Bonferroni, and Scheffé are standard procedures for controlling error across the resulting multiple comparisons. The second is violating assumptions without switching to a robust alternative: severely non-normal or heteroscedastic data with small $n$ produce unreliable $p$-values. The third is ignoring effect size in reporting: partial eta-squared or omega-squared should accompany every ANOVA $p$-value, in line with APA reporting recommendations. The fourth is interpreting main effects in factorial ANOVA without regard to a significant interaction: an interaction changes the interpretation of the main effects and can mask them. The fifth is running repeated-measures ANOVA without checking the sphericity assumption (Mauchly's test): a violation requires the Greenhouse-Geisser or Huynh-Feldt correction.
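As a sketch of the effect-size reporting mentioned above, the following computes eta-squared and omega-squared for a one-way design from the same sums of squares that feed the F ratio. In a one-way design partial eta-squared coincides with eta-squared, since there is only one effect. The function name `effect_sizes` and the data are hypothetical.

```python
from statistics import mean


def effect_sizes(groups):
    """Return (eta_squared, omega_squared) for a one-way design."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)

    # Eta-squared: share of total variability attributable to group membership.
    eta_sq = ss_between / ss_total
    # Omega-squared: a less biased estimate that penalizes for error variance.
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


# Hypothetical data: three groups of three observations each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
eta_sq, omega_sq = effect_sizes(groups)
print(round(eta_sq, 3), round(omega_sq, 3))  # 0.8 0.71
```

Omega-squared comes out smaller than eta-squared here (22/31 versus 24/30), which is typical: eta-squared tends to overstate the population effect in small samples.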
