DATA & STATISTICS

Structural equation modeling (SEM)

Family of multivariate techniques combining factor analysis and multiple regression to test networks of relationships between latent and observed variables. Standard in social, behavioral, and health sciences for validating complex theoretical models.

Extended definition

Structural equation modeling (SEM) is a family of multivariate techniques that integrates a measurement model, typically a confirmatory factor analysis (CFA), with a structural model of relationships among latent variables. The canonical formalization is Bollen (1989). The technique allows simultaneous testing of measurement hypotheses (is each construct well measured by its items?) and structural hypotheses (do the constructs relate as the theoretical model proposes?). A full SEM is estimated by maximum likelihood or robust variants, with fit assessed by CFI, TLI, RMSEA, and SRMR (the same indices used in CFA), plus a chi-square test and the variance explained ($R^2$) per equation. Contemporary implementations in academic research include lavaan in R (Rosseel, 2012), Mplus, AMOS, and SmartPLS for partial-least-squares variants (PLS-SEM).
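As a rough illustration of how three of the fit indices named above relate to the chi-square statistic, the sketch below computes RMSEA, CFI, and TLI from the chi-square and degrees of freedom of the fitted model and of a baseline (independence) model. The function name `fit_indices` and the input numbers are hypothetical, chosen only for illustration. The formulas are the common textbook definitions using the N - 1 convention for RMSEA; some programs use N instead, so values may differ slightly from software output.

```python
import math


def fit_indices(chi2_model, df_model, chi2_baseline, df_baseline, n_obs):
    """Return RMSEA, CFI and TLI from model and baseline chi-square statistics.

    Hypothetical helper for illustration; assumes df_model > 0 and a baseline
    model whose chi-square / df ratio is not exactly 1.
    """
    if df_model <= 0 or df_baseline <= 0:
        raise ValueError("degrees of freedom must be positive")

    # Estimated noncentrality (chi-square in excess of df), floored at zero
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, 0.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size
    rmsea = math.sqrt(d_model / (df_model * (n_obs - 1)))

    # CFI: proportional reduction in noncentrality relative to the baseline
    denom = max(d_model, d_baseline)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    # TLI: compares chi-square / df ratios, penalizing model complexity
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    tli = (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)

    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}


# Made-up inputs, not taken from any real study
indices = fit_indices(chi2_model=85.0, df_model=40,
                      chi2_baseline=1000.0, df_baseline=55, n_obs=300)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

SRMR is omitted because it requires the observed and model-implied covariance matrices rather than chi-square values alone.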

When it applies

SEM is appropriate when a mature theoretical model involves latent constructs and hypothesized relationships among them, with sample size sufficient for stable parameter estimation. Typical applications include testing theoretical models in organizational psychology, consumer behavior, education research, behavioral health models, and relationships among socioeconomic factors. SEM is especially useful when simple regression would be inadequate because of measurement error in predictor variables, or when the model posits mediation effects that must be tested jointly.
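The point about measurement error in predictors can be illustrated with a small simulation. The sketch below is not an SEM fit: it only shows, under assumed values (a true slope of 0.5 and an indicator reliability of 0.6, both invented for the example), that regressing an outcome on an error-laden indicator shrinks the slope toward zero by roughly the reliability of that indicator. A latent-variable model with multiple indicators addresses this by estimating the measurement error instead of ignoring it.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50_000          # large sample so sampling noise is small
true_beta = 0.5     # assumed structural slope of the outcome on the latent predictor
reliability = 0.6   # assumed share of indicator variance due to the latent variable

# Latent predictor (variance 1) and an outcome that depends on it
xi = rng.normal(size=n)
eta = true_beta * xi + rng.normal(scale=0.5, size=n)

# Observed indicator = latent score + measurement error,
# with the error variance set so that var(xi) / var(x) equals the reliability
error_sd = np.sqrt((1 - reliability) / reliability)
x = xi + rng.normal(scale=error_sd, size=n)


def ols_slope(predictor, outcome):
    """Slope of a simple regression: cov(predictor, outcome) / var(predictor)."""
    cov = np.cov(predictor, outcome)
    return cov[0, 1] / cov[0, 0]


latent_slope = ols_slope(xi, eta)   # what we would get with an error-free predictor
naive_slope = ols_slope(x, eta)     # attenuated toward zero

print(f"slope on latent predictor:  {latent_slope:.3f}")
print(f"slope on noisy indicator:   {naive_slope:.3f}")
print(f"true_beta * reliability:    {true_beta * reliability:.3f}")
```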

When it does not apply

SEM does not apply when the theoretical model is poorly developed: it is a confirmatory tool, not an exploratory one. It does not apply in small samples ($n < 200$ for simple models; $n < 400$ for complex ones), where estimates become unstable. It does not substitute for experimental research in causal inference: SEM tests whether a hypothesized causal model is consistent with correlational data, but it does not establish causality. Models with many variables and few observations strain the large-sample assumptions of the estimators and produce non-replicable results. PLS-SEM, often used as a shortcut for small samples, has distinct statistical properties and is not always equivalent to covariance-based SEM.

Applications by field

- Psychology and organizational behavior: the primary territory; tests of motivational, satisfaction, engagement, and leadership models.
- Marketing and consumer research: models of perceived quality, satisfaction, and loyalty, with PLS-SEM popular in the management literature.
- Public health and social epidemiology: models of social determinants of health with mediation by lifestyle and service access.
- Education: models of factors influencing academic performance, self-efficacy, and engagement.

Common pitfalls

The first pitfall is treating SEM as automatic confirmation of theory: good fit is necessary but not sufficient, because several alternative models can fit the same data comparably well. The second is ignoring the measurement model: without a previously validated CFA, the structural model is built on poorly measured constructs. The third is iterative respecification guided by modification indices, which improves fit in the sample at hand but does not generalize. The fourth is confusing covariance-based SEM with PLS-SEM, which have distinct assumptions, estimators, and interpretations. The fifth is undersampling: recent literature recommends $n$ proportional to model complexity, with an absolute minimum of 10 observations per estimated parameter, ideally 20. The sixth is causal interpretation of coefficients in cross-sectional observational data: SEM does not solve the causality problem; it only tests consistency.
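The observations-per-parameter guideline from the fifth pitfall can be turned into simple arithmetic. The sketch below counts free parameters for one specific and deliberately simple case: a CFA with correlated factors, each item loading on a single factor, one loading per factor fixed to 1 for identification, and no mean structure. Both function names are hypothetical, and other model specifications will have different parameter counts.

```python
def count_free_params_simple_cfa(n_factors, items_per_factor):
    """Free parameters in a simple-structure CFA with correlated factors.

    Assumes marker-variable identification (one loading per factor fixed to 1),
    uncorrelated residuals, and a covariance structure only (no intercepts).
    """
    n_items = n_factors * items_per_factor
    loadings = n_items - n_factors                       # fixed marker loadings excluded
    residual_variances = n_items
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    return loadings + residual_variances + factor_variances + factor_covariances


def min_sample_size(n_free_params, ratio=10):
    """Sample size implied by a fixed observations-per-parameter ratio."""
    if n_free_params <= 0 or ratio <= 0:
        raise ValueError("n_free_params and ratio must be positive")
    return n_free_params * ratio


# Example: three factors measured by four items each
q = count_free_params_simple_cfa(n_factors=3, items_per_factor=4)
print(f"free parameters: {q}")
print(f"minimum n (10 per parameter): {min_sample_size(q, ratio=10)}")
print(f"preferred n (20 per parameter): {min_sample_size(q, ratio=20)}")
```

For this example the count is 27 free parameters, which implies at least 270 observations under the 10:1 ratio and 540 under the 20:1 ratio.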
