Confirmatory factor analysis (CFA) — Glossary Aria Research

Extended definition

Confirmatory factor analysis (CFA) is a statistical technique that tests specific hypotheses about the factor structure underlying a set of observed variables. Unlike exploratory factor analysis (EFA), in which factors are extracted without prior restrictions, CFA requires the researcher to specify in advance which items load on which factors, how the factors are allowed to correlate, and which loadings are fixed at zero. The model is then estimated by maximum likelihood or an alternative estimator, which finds the parameter values whose model-implied covariance matrix best reproduces the observed covariance matrix. Fit to the empirical data is evaluated with indices such as CFI ($\geq 0.95$), TLI ($\geq 0.95$), RMSEA ($\leq 0.06$), and SRMR ($\leq 0.08$); these cutoffs, commonly attributed to Hu and Bentler (1999), are guidelines rather than strict pass/fail rules. The canonical formalization is Jöreskog (1969); a widely used contemporary implementation in academic research is the lavaan package for R (Rosseel, 2012).
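As a sketch of how three of these indices are derived, the snippet below computes CFI, TLI, and RMSEA from the chi-square statistics of the fitted model and of a baseline (independence) model, then checks each against the cutoffs listed above. It assumes the standard normal-theory formulas with the $N - 1$ convention for RMSEA; software packages may use $N$ instead or report robust (scaled) versions, so values can differ slightly from program output. SRMR is left out because it requires the observed and model-implied correlation matrices rather than chi-square values. The chi-square numbers are made up for illustration, and the names (`ChiSquare`, `check_cutoffs`, and so on) are hypothetical helpers, not part of lavaan or any other library.

```python
import math
from dataclasses import dataclass


@dataclass
class ChiSquare:
    """A chi-square test statistic and its degrees of freedom."""
    value: float
    df: int


def cfi(model: ChiSquare, baseline: ChiSquare) -> float:
    """Comparative fit index: 1 minus model noncentrality over baseline noncentrality."""
    d_model = max(model.value - model.df, 0.0)
    # Denominator is max(model nc, baseline nc, 0); d_model is already >= 0.
    d_base = max(baseline.value - baseline.df, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(model: ChiSquare, baseline: ChiSquare) -> float:
    """Tucker-Lewis index (non-normed fit index); can fall outside [0, 1]."""
    ratio_base = baseline.value / baseline.df
    ratio_model = model.value / model.df
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def rmsea(model: ChiSquare, n: int) -> float:
    """Root mean square error of approximation, assuming the N - 1 convention."""
    return math.sqrt(max(model.value - model.df, 0.0) / (model.df * (n - 1)))


def check_cutoffs(model: ChiSquare, baseline: ChiSquare, n: int) -> dict:
    """Return each index rounded to 3 decimals, plus whether the unrounded
    value meets the conventional cutoff."""
    values = {
        "CFI": cfi(model, baseline),
        "TLI": tli(model, baseline),
        "RMSEA": rmsea(model, n),
    }
    passes = {
        "CFI": values["CFI"] >= 0.95,
        "TLI": values["TLI"] >= 0.95,
        "RMSEA": values["RMSEA"] <= 0.06,
    }
    return {name: (round(values[name], 3), passes[name]) for name in values}


model = ChiSquare(value=85.0, df=40)        # hypothetical fitted CFA model
baseline = ChiSquare(value=1200.0, df=55)   # hypothetical independence model
print(check_cutoffs(model, baseline, n=300))
# {'CFI': (0.961, True), 'TLI': (0.946, False), 'RMSEA': (0.061, False)}
```

In this made-up example CFI clears its cutoff while TLI and RMSEA narrowly miss theirs, the kind of disagreement among indices that should be reported and examined rather than resolved by citing only the favorable index.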

When it applies

CFA is appropriate when theory or prior research justifies a specific factor structure. Typical applications include validating psychometric instruments, quality-of-life scales, and educational assessment instruments, and re-testing questionnaires translated or adapted to a new language or culture. In each of these situations the structure is hypothesized from the established literature, and the goal is to test whether one's own data are consistent with it.

When it does not apply

CFA does not apply when the factor structure is genuinely unknown. In that case EFA is the appropriate first step, and the structure it suggests should then be tested with CFA in an independent sample. CFA also does not replace content validation (expert panel review, face validity), which is a prior, qualitative stage. In small samples ($n < 200$) with complex models, estimates can be unstable and fit indices misleading. Finally, respecifying a model solely on the basis of modification indices, without theoretical justification, tends to fit sample-specific noise and compromises replicability.
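When no second data collection is available, one common way to keep exploration and confirmation separate is to split a single dataset at random into non-overlapping subsamples. The sketch below assumes a 50/50 split and a fixed random seed (both illustrative choices, not requirements), and `split_for_efa_cfa` is a hypothetical helper. Because splitting halves the available data, each subsample should still be large enough on its own for the analysis it will receive.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_for_efa_cfa(
    rows: Sequence[T], efa_fraction: float = 0.5, seed: int = 2024
) -> Tuple[List[T], List[T]]:
    """Randomly partition respondents into non-overlapping EFA and CFA subsamples.

    The EFA subsample is used to explore the factor structure; the CFA
    subsample is held out and used only to test the structure found.
    """
    if not 0.0 < efa_fraction < 1.0:
        raise ValueError("efa_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    cut = round(len(rows) * efa_fraction)
    efa = [rows[i] for i in indices[:cut]]
    cfa = [rows[i] for i in indices[cut:]]
    return efa, cfa


# Hypothetical respondent IDs standing in for rows of item responses.
respondents = [f"R{i:03d}" for i in range(600)]
efa_sample, cfa_sample = split_for_efa_cfa(respondents)
print(len(efa_sample), len(cfa_sample))  # 300 300
```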

Applications by field

- Psychology and psychometrics: the natural territory of CFA; validation of a new or adapted scale typically includes this stage.
- Education: the structure of standardized tests and institutional assessment instruments.
- Health: quality-of-life instruments (SF-36, WHOQOL) and validated clinical scales.
- Marketing and consumer behavior: validation of constructs such as satisfaction, loyalty, and purchase intention.

Common pitfalls

The first pitfall is relying exclusively on a single fit index. No index is sufficient on its own, and disagreement among them (for example, a good CFI alongside a poor RMSEA) signals a problem that needs investigation. The second is iteratively respecifying the model in the same sample to improve fit, a practice that artificially inflates apparent quality and does not generalize. The third is confusing CFA with rotated exploratory factor analysis: without loadings genuinely fixed at zero, the model is effectively an EFA in disguise. The fourth is ignoring distributional assumptions. Standard maximum likelihood assumes multivariate normality; non-normal continuous data call for robust estimators such as MLR, while ordinal items (for example, Likert responses with few categories) are better handled by estimators such as WLSMV, yet many researchers fail to apply either. The fifth is insufficient sample size: a common rule of thumb is at least 10 observations per estimated parameter, and complex models often require $n \geq 300$ for stable results.
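The third and fifth pitfalls both come down to what the model actually restricts and estimates. The sketch below encodes a hypothetical loading pattern with explicit zeros, counts free parameters under common default assumptions (uncorrelated residuals, freely correlated factors, one marker loading per factor fixed to 1; fixing factor variances to 1 instead gives the same count), and applies the 10-observations-per-parameter rule of thumb. The 12-item, 3-factor instrument and the `cfa_parameter_count` helper are illustrative assumptions. Positive degrees of freedom are necessary but not sufficient for identification, and the function does not check identification.

```python
from typing import List


def cfa_parameter_count(pattern: List[List[int]]) -> dict:
    """Count free parameters and degrees of freedom for a simple CFA model.

    pattern[i][j] is 1 if item i loads freely on factor j and 0 if that
    loading is fixed at zero. Assumes every factor has at least one nonzero
    entry, one of which serves as the marker loading fixed to 1.
    """
    p = len(pattern)     # number of observed items
    k = len(pattern[0])  # number of factors
    free_loadings = sum(sum(row) for row in pattern) - k  # minus one marker per factor
    residual_variances = p
    factor_variances = k
    factor_covariances = k * (k - 1) // 2
    q = free_loadings + residual_variances + factor_variances + factor_covariances
    moments = p * (p + 1) // 2  # unique observed variances and covariances
    return {
        "free_parameters": q,
        "df": moments - q,
        "min_n_10_per_parameter": 10 * q,
    }


# Hypothetical instrument: items 1-4 load on F1, items 5-8 on F2, items 9-12 on F3;
# every cross-loading is fixed at zero.
pattern = [[1 if item // 4 == f else 0 for f in range(3)] for item in range(12)]
print(cfa_parameter_count(pattern))
# {'free_parameters': 27, 'df': 51, 'min_n_10_per_parameter': 270}
```

For this pattern the 10-per-parameter heuristic gives a minimum of 270 observations; like the $n \geq 300$ guideline, it is a rough rule of thumb rather than a guarantee of stable estimates.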