Multilevel modeling: when MLM is required and when OLS suffices

There is an automatic answer that circulates in graduate programs: if data have nested structure — students within schools, patients within hospitals, repeated measures within individuals — the appropriate analysis is multilevel modeling. The answer is correct in direction but insufficient in precision. Not every nested dataset requires MLM, and the argument that it always does fails under competent peer review.

The question that defines the choice is not whether structure exists (it almost always does in social, educational, organizational, or clinical research) but whether the structure affects the parameters of interest enough to justify the additional complexity of MLM. The operational heuristic rests on the intraclass correlation coefficient (ICC) and derives from simulations documented in Hox, Moerbeek, and van de Schoot (2017) and revisited in Sommet and Morselli (2017) and McNeish and Wentzel (2017).

What ICC measures

ICC, in its basic form for linear mixed models, is the proportion of total variance attributable to between-cluster variance. Formally, ICC = τ₀²/(τ₀² + σ²), where τ₀² is between-cluster variance and σ² is within-cluster variance. ICC = 0 indicates clusters indistinguishable on the response variable; ICC = 1 indicates all variance is between clusters and none is within.

The practical interpretation is direct. ICC measures how correlated observations within the same cluster are. In a classroom, for example, the ICC of mathematics performance measures how similar students in the same class are in performance — through the teacher, the curriculum, the class culture. Low ICC indicates the class matters little; high ICC indicates the class matters substantially.
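As a rough illustration of the definition, the sketch below estimates the ICC from balanced data with the one-way ANOVA estimator, not from a fitted mixed model (the null model mentioned later in this article would be the usual route). The function name `anova_icc` and the toy scores are hypothetical. The estimator assumes equal cluster sizes and some within-cluster or between-cluster variation, and it can return negative values when clusters differ less than chance alone would predict.

```python
from statistics import fmean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for balanced clusters.

    `clusters` is a list of equally sized lists of observations.
    """
    k = len(clusters)       # number of clusters
    n = len(clusters[0])    # observations per cluster
    if k < 2 or n < 2:
        raise ValueError("need at least 2 clusters of at least 2 observations")
    if any(len(c) != n for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")

    grand_mean = fmean(x for c in clusters for x in c)
    cluster_means = [fmean(c) for c in clusters]

    # Between-cluster and within-cluster mean squares.
    ms_between = n * sum((m - grand_mean) ** 2 for m in cluster_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c
    ) / (k * (n - 1))

    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical mathematics scores for three classes of four students.
scores = [
    [62, 65, 61, 64],
    [70, 74, 71, 73],
    [55, 58, 54, 57],
]
print(round(anova_icc(scores), 3))
```

In this toy dataset the classes differ far more than the students within each class, so the estimate lands close to 1. Real educational data typically produce much smaller values.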

The cost of ignoring high ICC

Ignoring nested structure when ICC is non-trivial is not a stylistic decision — it produces documentable inflation of Type I error rate. Classical simulations show that, for a test with nominal α = 0.05 conducted via OLS on nested data with typical cluster size (n ≈ 20), the observed Type I error rate grows non-linearly with ICC. At ICC = 0.05, the rate rises to approximately 11%. At ICC = 0.10, to 18%. At ICC = 0.20, to 33%.
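The mechanism behind this inflation can be sketched with the design effect, 1 + (n − 1) × ICC, which describes how much the variance of a cluster-level estimate is understated when observations are treated as independent. The code below is a back-of-the-envelope approximation, not a reproduction of the cited simulations: it assumes equal cluster sizes, a predictor that varies only between clusters, and a normal test statistic. Under those assumptions the rates it yields can be somewhat higher than the simulation-based figures quoted above, though the non-linear growth with ICC is the same. The function names are illustrative.

```python
from statistics import NormalDist


def design_effect(icc, cluster_size):
    """Kish design effect for equal-sized clusters: 1 + (n - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def approx_type1_rate(icc, cluster_size, alpha=0.05):
    """Approximate actual two-sided Type I error rate of a naive OLS test.

    Assumption: the predictor varies only between clusters, so the naive
    standard error is too small by a factor of sqrt(design effect).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The naive test rejects whenever |z| exceeds z_crit, but the correctly
    # scaled statistic only needs to exceed this smaller value.
    z_effective = z_crit / design_effect(icc, cluster_size) ** 0.5
    return 2 * (1 - std_normal.cdf(z_effective))


for icc in (0.00, 0.05, 0.10, 0.20):
    deff = design_effect(icc, 20)
    rate = approx_type1_rate(icc, 20)
    print(f"ICC={icc:.2f}  design effect={deff:.2f}  approx. Type I rate={rate:.3f}")
```

Because the design effect grows with cluster size as well as with ICC, the same ICC is more damaging in studies with large clusters.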

The operational consequence is severe. A manuscript reporting “p < 0.05” on nested data with ICC = 0.20 has a real Type I error probability of around 33%, not 5%. The methodological reviewer familiar with this pattern requests reanalysis via MLM or via OLS with cluster-corrected standard errors, and the author who replied in the first round that “OLS is robust” loses the round.

Figure (horizontal bar chart): Observed Type I error rate in OLS conducted on nested data with increasing ICC, compared with the nominal level of α = 0.05. From ICC ≈ 0.05 onward, inflation is non-trivial; at ICC ≈ 0.20, the observed rate exceeds 30%, more than six times nominal. Approximate values consistent with simulations in Hox, Moerbeek, and van de Schoot (2017) and revisited in McNeish and Wentzel (2017). The highlighted category, ICC = 0.05, marks the practical threshold below which robust OLS may suffice and above which correction is required.

The heuristic that sustains the decision

The operational rule that works in peer review has three bands.

ICC below 0.05 indicates that nested structure is negligible for inferential purposes. OLS with robust standard errors may suffice, but the decision needs to be justified with the reported ICC.

ICC between 0.05 and 0.20 indicates non-trivial structure. The choice is between MLM and OLS with cluster-robust standard errors (Cameron and Miller 2015). Both are defensible; the choice depends on whether the analytical focus includes between-cluster variance as an object of interest.

ICC above 0.20 indicates that nested structure is central. MLM is the required choice.

The nuance competent reviewers add is that the ICC rule is not the only criterion. Number of clusters matters: MLM with fewer than 20 clusters produces unstable variance-component estimates. Cluster size matters: homogeneous cluster sizes permit simplifications, while heavily unbalanced ones complicate estimation. Analytical intent matters: studying between-cluster variation as a construct of interest requires MLM even at low ICC.
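The three bands and the complementary criteria can be written down as a small decision helper. This is only a sketch of the heuristic as stated in this article, not a substitute for judgment. The function name, its arguments, and the wording of the returned strings are hypothetical, and an ICC of exactly 0.20 is placed in the middle band by assumption.

```python
def recommend_analysis(icc, n_clusters, between_cluster_focus=False):
    """Return a (recommendation, caveats) pair following the three ICC bands."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")

    if between_cluster_focus:
        # Between-cluster variation as a construct of interest requires MLM
        # even at low ICC.
        recommendation = "MLM"
    elif icc < 0.05:
        recommendation = "OLS with robust SEs (report the ICC)"
    elif icc <= 0.20:
        recommendation = "MLM or OLS with cluster-robust SEs"
    else:
        recommendation = "MLM"

    caveats = []
    if n_clusters < 20 and "MLM" in recommendation:
        caveats.append("fewer than 20 clusters: variance components may be unstable")
    return recommendation, caveats


# Hypothetical study: 35 schools, ICC of 0.12, no interest in school-level variance.
print(recommend_analysis(0.12, n_clusters=35))
```

Cluster-size imbalance is left out of the helper because the article gives no numeric threshold for it.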

What to report in a manuscript

The methods section of a manuscript with nested data should report, in the order reviewers check: the nested structure made explicit — how many levels, how many clusters at each level, distribution of cluster sizes; ICC calculated from the null (unconditional) model; justification of the analytical choice based on ICC and complementary criteria; the models actually fitted, with named fixed and random effects; and the fit criteria used (deviance, AIC, BIC) in model comparison.
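For the last item, the fit criteria follow directly from each model's log-likelihood. The sketch below uses hypothetical log-likelihood values and parameter counts, and it takes the number of level-1 observations as the sample size in BIC, which is one common convention and not the only one. It also assumes the models were fitted by full maximum likelihood, since comparing models that differ in fixed effects is not appropriate under REML.

```python
from math import log


def fit_criteria(log_likelihood, n_params, n_obs):
    """Deviance, AIC, and BIC from a model's maximized log-likelihood."""
    deviance = -2.0 * log_likelihood
    return {
        "deviance": deviance,
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * log(n_obs),
    }


# Hypothetical values: a null model (intercept plus two variance components)
# and a model that adds one fixed-effect predictor.
null_model = fit_criteria(log_likelihood=-2450.0, n_params=3, n_obs=1000)
predictor_model = fit_criteria(log_likelihood=-2431.5, n_params=4, n_obs=1000)

for name in ("deviance", "AIC", "BIC"):
    print(name, round(null_model[name], 1), round(predictor_model[name], 1))
```

Lower values indicate better fit after the penalty for added parameters. Reporting all three side by side lets the reviewer see whether they agree.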

The cost of doing this sequence correctly is low. The cost of not doing it is an extra revision round, or a rejection motivated by methodological inadequacy.


This analysis reflects Aria's practice in Statistical Analysis and Structural Equation Modeling.
