DATA & STATISTICS

Cronbach's alpha

Classical coefficient of internal consistency for scales and instruments, proposed by Cronbach in 1951. Despite its massive use in psychometrics, it is now widely criticized for its restrictive assumptions, and alternatives such as McDonald's omega are often preferred.

Extended definition

Cronbach’s alpha is the classical coefficient of internal consistency for scales and questionnaires, proposed by Lee Cronbach in 1951 (Psychometrika). The canonical formulation for a scale with $k$ items is:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_t^2}\right)$$

where $\sigma_i^2$ is the variance of item $i$ and $\sigma_t^2$ is the variance of the total score. The coefficient is interpreted as a measure of how far the scale’s items measure the same underlying construct. Values usually fall between 0 and 1, although negative values can occur when items covary negatively (for example, when reverse-worded items have not been recoded). By convention, $\alpha \geq 0.7$ is considered acceptable for research, $\geq 0.8$ good, and $\geq 0.9$ excellent. Despite massive adoption in psychometrics for more than half a century, alpha rests on restrictive assumptions that real scales often violate, in particular tau-equivalence (all items measure the construct with the same loading) and unidimensionality. Sijtsma (2009) and subsequent literature document technically superior alternatives: McDonald’s omega, coefficient H, hierarchical omega.
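The formula can be sketched directly in code. The following is a minimal illustration, assuming NumPy and a complete (no missing values) matrix with one row per respondent and one column per item; the function name `cronbach_alpha` and the small `scores` matrix are hypothetical, chosen only for this example.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    x = np.asarray(items, dtype=float)
    if x.ndim != 2 or x.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two item columns")
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # sigma_i^2 for each item
    total_variance = x.sum(axis=1).var(ddof=1)  # sigma_t^2 of the summed score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses: 5 respondents (rows) x 3 items (columns)
scores = np.array([
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [5, 4, 5],
])
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")
```

Sample variances (`ddof=1`) are used for both the items and the total score. Because the divisor cancels in the ratio, the population variance would give the same alpha, provided the same choice is applied to both.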

When it applies

Alpha is appropriate as a quick check and historical standard in validating scales with reflective items (which measure the same underlying construct). It continues to be required as standard reporting by many journals due to editorial inertia, and serves as a comparison point with the vast prior literature. In contexts where readers and reviewers expect alpha, computing and reporting it is reasonable conservative practice — but the contemporary technical recommendation is to report it alongside omega or another alternative, not as the sole metric.

When it does not apply

Alpha does not apply to multidimensional scales (multiple underlying factors): each subscale should have its own alpha, and a global alpha is practically meaningless. It does not apply to scales with formative items (which constitute the construct rather than reflect it); these require validation by other methods. It does not replace confirmatory factor analysis in serious instrument validation: alpha says that items correlate, whereas CFA tests whether a proposed structure fits. In scales with few items (3-4), alpha becomes unstable and can be low even in valid scales. For items with a very restricted response scale (binary or 3-point Likert), alpha systematically underestimates reliability.
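The dependence on scale length can be made concrete with the standardized form of alpha, $k\bar{r} / (1 + (k-1)\bar{r})$, where $\bar{r}$ is the mean inter-item correlation. The sketch below is illustrative only: it assumes standardized items (equal variances), and the function name `standardized_alpha` and the value $\bar{r} = 0.3$ are hypothetical choices for the example.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r.

    Assumes standardized items; this is an illustration, not a substitute
    for computing alpha on actual item scores.
    """
    if k < 2:
        raise ValueError("alpha needs at least two items")
    return k * mean_r / (1.0 + (k - 1) * mean_r)

# Same mean inter-item correlation, different scale lengths
for k in (3, 4, 10, 20):
    print(k, standardized_alpha(k, 0.3))
```

With the inter-item correlation held at 0.3, the value is about 0.56 for three items and about 0.90 for twenty, so a short scale can fall below the conventional 0.7 threshold purely because of its length.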

Applications by field

- Psychometrics and psychology: natural territory; standard reporting in scale validation, despite technical criticism.
- Health: quality-of-life, pain, mental health instruments; reporting required in cultural validation studies.
- Education: reliability of standardized tests, institutional assessment scales.
- Marketing and consumer behavior: satisfaction, loyalty, attitude scales, with PLS-SEM often replacing alpha with composite reliability.

Common pitfalls

The first pitfall is treating $\alpha \geq 0.7$ as an automatic certificate of quality: a high alpha can result from many redundant items rather than good measurement. The second is computing alpha on a multidimensional scale as if it were unidimensional, which produces a number without interpretive meaning. The third is confusing alpha with validity: alpha measures internal consistency, not whether the scale measures what it claims. A scale can have an alpha of 0.95 and still measure something different from what is declared. The fourth is ignoring Sijtsma’s technical critique and the subsequent literature: alpha is not the superior “reliability” coefficient; it is a restricted special case within the family of reliability coefficients. The fifth is relying on alpha in small samples: the coefficient’s confidence interval is wide at $n < 200$, and the point estimate can be misleading.
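One way to make the small-sample pitfall visible is to report an interval alongside the point estimate. The sketch below uses a percentile bootstrap over respondents, which is one of several possible interval methods and is chosen here only for simplicity. It assumes NumPy; the helper names, the number of resamples, and the simulated data (60 respondents, 5 items sharing one latent trait) are all hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

def bootstrap_alpha_ci(items, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for alpha, resampling respondents."""
    x = np.asarray(items, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # draw respondents with replacement
        estimates[b] = cronbach_alpha(x[rows])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return lower, upper

# Simulated data (an assumption for illustration): each item is a shared
# latent trait plus independent noise of equal variance.
sim_rng = np.random.default_rng(42)
n_respondents, n_items = 60, 5
trait = sim_rng.normal(size=(n_respondents, 1))
data = trait + sim_rng.normal(size=(n_respondents, n_items))

alpha_hat = cronbach_alpha(data)
ci_low, ci_high = bootstrap_alpha_ci(data)
print(f"alpha = {alpha_hat:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Rerunning the simulation with a larger `n_respondents` should narrow the interval, which is the practical argument for reporting it whenever the sample is small.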
