Handling missing data is not a cleanup step you settle before the analysis begins. It is a methodological decision that shifts the estimates, the standard errors, and with them the study’s conclusion. Reviewers at Q1 journals know this and read the missing-data section closely: they want to know which mechanism the author assumed, why, and whether the handling method is consistent with that assumption. When the justification is absent, what stands exposed is not an operational detail but the validity of the result.
The vocabulary that organizes this decision comes from Rubin (1976) [6], who formalized the conditions under which the process generating the missingness can be ignored for inference. On that foundation, Schafer and Graham (2002) [3] consolidated the distinction between data missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), and showed why case deletion and single-value imputation distort both the coefficients and their precision. The uncomfortable core of this literature, for anyone who treats the subject as routine, is that the missingness mechanism is an assumption the analyst has to defend, not a property that can be read off the data. No test decides on its own whether data are MAR or MNAR. The defense is argumentative, anchored in the study design and in what is known about why the values went missing.
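The three mechanisms can be written compactly. The notation below is the usual textbook convention and is an assumption here, not taken from the cited papers: R is the matrix of missingness indicators, Y_obs and Y_mis are the observed and missing parts of the data, and ψ collects the parameters of the missingness process.

```latex
% R: missingness indicators; Y_obs, Y_mis: observed and missing parts of the data;
% \psi: parameters of the missingness process (notation assumed for illustration).
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \psi) = P(R \mid \psi) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \psi) = P(R \mid Y_{\text{obs}}, \psi) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \psi) \text{ still depends on } Y_{\text{mis}} \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```

Written this way, the reason no test can settle MAR against MNAR is visible in the definitions: the two differ only in their dependence on Y_mis, which is exactly the part of the data nobody observed.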
The choice of method follows from that assumption, and this is where the reflex fails. Multiple imputation has become shorthand for good practice, yet Hughes and colleagues (2019) [2] show, using missingness directed acyclic graphs, that complete-case analysis is unbiased in more situations than is usually supposed, including some MNAR structures, while multiple imputation assuming MAR can be biased in those same situations. The rule is neither "always impute" nor "always delete." It is to make the choice follow the assumed mechanism. Pedersen and colleagues (2017) [4] translate this logic into applied research, detailing when multiple imputation correctly propagates the uncertainty that single imputation hides. And van Ginkel and colleagues (2020) [5] dismantle the objections that sustain much of the resistance to the method: the claim that imputation invents data and the claim that deletion is always safer.
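A small simulation can make the complete-case point concrete. The sketch below is illustrative only and does not reproduce the analyses in Hughes and colleagues: the data-generating model, the true slope of 3, and the two missingness functions are all assumptions chosen for the example. It contrasts a covariate whose missingness depends on its own value (an MNAR structure) with missingness driven by the outcome.

```python
import math
import random


def logistic(z):
    """Standard logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ols_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def simulate(n=20000, seed=1):
    """Compare complete-case slopes under two hypothetical missingness rules."""
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)  # assumed model, true slope 3
        full.append((x, y))

    # MNAR in the covariate: x is more likely to be missing when x itself is large.
    cc_x_dependent = [(x, y) for x, y in full
                      if rng.random() > logistic(2.0 * x)]
    # Missingness driven by the outcome: rows with large y tend to be dropped.
    cc_y_dependent = [(x, y) for x, y in full
                      if rng.random() > logistic(2.0 * (y - 2.0))]

    return {
        "full_data": ols_slope(full),
        "complete_cases_x_dependent": ols_slope(cc_x_dependent),
        "complete_cases_y_dependent": ols_slope(cc_y_dependent),
    }


slopes = simulate()
for label, slope in slopes.items():
    print(f"{label}: slope = {slope:.3f}")
```

In this setup the complete-case slope should stay close to 3 when missingness depends only on the covariate, even though that covariate is MNAR, and should be visibly attenuated when missingness depends on the outcome. What matters for complete-case regression is whether missingness is related to the outcome given the covariates, not the MCAR/MAR/MNAR label alone.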
One myth is more stubborn than these: the belief that the amount of missing data is what decides whether imputation is worthwhile. An author looks at 5% missingness and relaxes, looks at 50% and panics, as if the percentage were the risk parameter. Hao and colleagues (2025) [1] tested that intuition directly. In a simulation on a total shoulder arthroplasty database, they inserted missingness under three mechanisms, MCAR, MAR and NMAR (not missing at random, the mechanism called MNAR above), at varying proportions, and measured imputation error against the complete data. What splits the results is not how much went missing but why it went missing.
The reading is direct and counterintuitive. Under MCAR and MAR the imputation error stays in the same range, with RMSE of 22.6 and 19.2 and MAPE of 27.2% and 17.7%. Under NMAR it explodes: RMSE rises to 37.5 and MAPE to 79.2%, roughly three times the MAPE under MCAR and more than four times the MAPE under MAR. The proportion of missing data, as the authors note, barely moves these numbers; what changes everything is the mechanism. The reason is structural: under NMAR the probability that a value is missing depends on the unobserved value itself, so a standard model, which learns only from what remains, cannot reconstruct what was systematically erased.
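The structural argument can be checked on synthetic data. The following is a minimal sketch, not a reproduction of the study by Hao and colleagues: the data-generating model, the three missingness functions, and the use of simple regression imputation from one auxiliary variable are all assumptions made for illustration, so the numbers it prints will not match the published RMSE and MAPE values.

```python
import math
import random


def logistic(z):
    """Standard logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(pairs):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def imputation_error(data, missing_prob, rng):
    """Delete y according to missing_prob, impute from x, score against the truth."""
    observed, missing = [], []
    for x, y in data:
        target = missing if rng.random() < missing_prob(x, y) else observed
        target.append((x, y))
    intercept, slope = fit_line(observed)  # the model only sees what remains
    errors = [(y - (intercept + slope * x), y) for x, y in missing]
    rmse = math.sqrt(sum(e * e for e, _ in errors) / len(errors))
    mape = 100.0 * sum(abs(e / y) for e, y in errors) / len(errors)
    return rmse, mape


def run(n=20000, seed=7):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # fully observed auxiliary variable
        y = 100.0 + 10.0 * x + rng.gauss(0.0, 10.0)  # variable that will lose values
        data.append((x, y))
    # Hypothetical missingness rules, one per mechanism.
    mechanisms = {
        "MCAR": lambda x, y: 0.35,                         # constant probability
        "MAR": lambda x, y: logistic(1.5 * x - 0.6),       # depends on observed x only
        "NMAR": lambda x, y: logistic((y - 105.0) / 3.0),  # depends on y itself
    }
    return {name: imputation_error(data, prob, rng)
            for name, prob in mechanisms.items()}


results = run()
for name, (rmse, mape) in results.items():
    print(f"{name}: RMSE = {rmse:.1f}, MAPE = {mape:.1f}%")
```

Under the first two rules the imputation model is fitted on rows that are still representative once x is taken into account, so the error should sit near the residual noise of the assumed model. Under the third rule the deleted values are systematically the large ones, the fitted line is pulled toward what survived, and the error should grow accordingly.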
For the author assembling a methods section, this becomes a sequence of verifiable obligations. The first is to state the assumed mechanism and justify it from the design rather than merely assert it. The second is to choose the handling method in keeping with that assumption: complete cases when they are sufficient and defensible, multiple imputation when auxiliary information reduces bias and recovers efficiency. The third is to specify the imputation model so that it includes the variables tied to the mechanism and to the outcome, because under MAR it is that auxiliary information that makes missingness ignorable, while under NMAR, as the figure shows, the error explodes and no standard imputation repairs it. The fourth is to report the fraction of missing information and to run sensitivity analyses when the MAR assumption is fragile.
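For the fourth obligation, the fraction of missing information falls out of the same pooling step that combines the imputed analyses. The sketch below applies Rubin's rules as they are commonly presented in the multiple-imputation literature; the function name `pool_rubin` and the five estimates and variances are hypothetical values chosen for illustration, not results from any study cited here.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between

    if between == 0:
        # Identical estimates across imputations: no information lost to missingness.
        return {"estimate": q_bar, "se": math.sqrt(total),
                "df": math.inf, "fmi": 0.0}

    r = (1 + 1 / m) * between / within    # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2       # large-sample degrees of freedom
    fmi = (r + 2 / (df + 3)) / (r + 1)    # fraction of missing information
    return {"estimate": q_bar, "se": math.sqrt(total), "df": df, "fmi": fmi}


# Hypothetical coefficient estimates and squared standard errors from m = 5 imputations.
pooled = pool_rubin(
    estimates=[1.0, 1.2, 0.9, 1.1, 1.3],
    variances=[0.04, 0.04, 0.04, 0.04, 0.04],
)
print(pooled)
```

The fraction of missing information is driven by how much the estimates disagree across imputations relative to their within-imputation variance, which is why it can be high when few values are missing and low when many are. That makes it a more informative quantity to report than the raw percentage of missing cells.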
None of these decisions is cosmetic, which is exactly why the reviewer looks for them. A study that deletes more than half its cases without saying why they went missing, or that imputes with a model that ignores the mechanism, does not have an execution problem; it has an inference problem that no later sophistication can repair. The operating rule is to treat missingness as part of the model, not as an obstacle to remove before modeling. The justified mechanism decides the method; the method consistent with the mechanism, and not the percentage of data that survived, decides whether the estimate means what the study claims it means.