Risk-of-bias assessment — Glossary Aria Research

Extended definition

Risk-of-bias assessment is the structured examination of how much a study’s design and conduct may distort its results, moving the effect estimate away from the true value. It does not measure the quality of the reporting or the importance of the finding: it measures the study’s internal validity. The assessment is done by domains, each covering a known source of bias, and produces a judgment for each domain and an overall judgment for the study, not an arbitrary global score. For randomized trials, the standard tool is RoB 2, described by Sterne and colleagues (2019), which assesses five domains: the randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For non-randomized studies of interventions, Sterne and colleagues (2016) proposed ROBINS-I, which adds confounding and selection of participants as central domains, since these studies lack the protection of randomization. RoB 2 replaced the original Cochrane tool described by Higgins and colleagues (2011), and both current tools guide the assessor through signaling questions before a per-domain judgment.
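The domain-based structure can be sketched in a few lines of Python. This is a minimal illustration, not the official RoB 2 algorithm: the names Judgment, ROB2_DOMAINS, overall_judgment and flagged_domains are hypothetical, and the overall judgment is simplified to "the worst domain decides". The published guidance also lets assessors escalate to high risk when several domains raise some concerns, a step left to human judgment here.

```python
from enum import IntEnum
from typing import List, Mapping


class Judgment(IntEnum):
    """Ordered so that a larger value means a more serious concern."""
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


# The five RoB 2 domains for randomized trials.
ROB2_DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def overall_judgment(domains: Mapping[str, Judgment]) -> Judgment:
    """Simplified rule (an assumption of this sketch): the worst domain decides."""
    missing = [d for d in ROB2_DOMAINS if d not in domains]
    if missing:
        raise ValueError(f"unassessed domains: {missing}")
    return max(domains[d] for d in ROB2_DOMAINS)


def flagged_domains(domains: Mapping[str, Judgment]) -> List[str]:
    """Domains that were not judged low risk, kept visible for the reader."""
    return [d for d in ROB2_DOMAINS if domains[d] != Judgment.LOW]


# Hypothetical trial: sound in every domain except missing outcome data.
trial = {d: Judgment.LOW for d in ROB2_DOMAINS}
trial["missing outcome data"] = Judgment.HIGH

print(overall_judgment(trial).name)
print(flagged_domains(trial))
```

Keeping the per-domain record next to the overall judgment preserves the reason a study was rated as it was.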

When it applies

Risk-of-bias assessment applies within a systematic review, study by study, as a mandatory step before synthesizing results. The choice of tool follows the design: RoB 2 for randomized trials, ROBINS-I for non-randomized studies of interventions. It applies per outcome when risk varies with what is measured, not only per study. It applies as a direct input to GRADE: risk of bias is one of the domains that can downgrade the certainty of the body of evidence. It applies to the careful interpretation of a meta-analysis, where the influence of high-risk studies can be examined in a sensitivity analysis. And it applies to transparency, with the per-domain judgment making explicit why a study deserves more or less confidence.
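A sensitivity analysis of this kind can be sketched as follows. The three studies and their numbers are invented for illustration. The pooling model (fixed-effect, inverse-variance weighting) is an assumption of this sketch, since the entry does not name a model. The function name pooled_fixed_effect is hypothetical.

```python
import math


def pooled_fixed_effect(studies):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / s["se"] ** 2 for s in studies]
    total = sum(weights)
    estimate = sum(w * s["effect"] for w, s in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


# Invented data: effect sizes with standard errors and an overall
# risk-of-bias judgment per study.
studies = [
    {"id": "A", "effect": -0.30, "se": 0.10, "rob": "low"},
    {"id": "B", "effect": -0.25, "se": 0.12, "rob": "some concerns"},
    {"id": "C", "effect": -0.80, "se": 0.15, "rob": "high"},
]

# Primary analysis: every study.
all_est, all_se = pooled_fixed_effect(studies)

# Sensitivity analysis: drop studies judged at high risk of bias.
restricted = [s for s in studies if s["rob"] != "high"]
res_est, res_se = pooled_fixed_effect(restricted)

print(f"all studies:        {all_est:.3f} (SE {all_se:.3f})")
print(f"without high risk:  {res_est:.3f} (SE {res_se:.3f})")
```

In this made-up data set the pooled effect moves toward zero once the high-risk study is removed, which is the kind of shift a sensitivity analysis is meant to expose.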

When it does not apply

Risk-of-bias assessment does not apply as a measure of reporting completeness: a study can be well reported and still have high risk of bias, or well conducted but poorly reported. Confusing risk of bias with adherence to reporting guidelines is the most common error. It does not apply as an aggregated numeric score: summing domains into a single score, a practice the current tools abandoned, hides where the bias lies. No tool applies outside the design it was built for: using RoB 2 on an observational study ignores confounding, the main risk of that design. It does not apply as a judgment of a study’s external validity or relevance, which are separate dimensions. And it does not apply reliably without training and without duplication: the per-domain judgment is subjective enough to require two independent assessors.
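A small worked example shows what a summed score hides. The point values in POINTS are a hypothetical coding, invented here to mimic a score-based approach, and the two study profiles are made up.

```python
# Hypothetical numeric coding, used only to show what a summed score loses.
POINTS = {"low": 0, "some concerns": 1, "high": 2}

# Study X: one serious problem, in the randomization process.
study_x = {
    "randomization process": "high",
    "deviations from intended interventions": "low",
    "missing outcome data": "low",
    "measurement of the outcome": "low",
    "selection of the reported result": "low",
}

# Study Y: two milder problems, in different domains.
study_y = {
    "randomization process": "low",
    "deviations from intended interventions": "low",
    "missing outcome data": "some concerns",
    "measurement of the outcome": "some concerns",
    "selection of the reported result": "low",
}


def summed_score(profile):
    """Collapse a domain profile into one number (the discouraged practice)."""
    return sum(POINTS[judgment] for judgment in profile.values())


def problem_domains(profile):
    """Keep only the domains that were not judged low risk."""
    return {d: j for d, j in profile.items() if j != "low"}


print(summed_score(study_x), summed_score(study_y))
print(problem_domains(study_x))
print(problem_domains(study_y))
```

Both studies receive the same total, yet one has a single high-risk domain and the other has two domains with some concerns. Only the per-domain profile tells them apart.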

Applications by field

Systematic reviews of interventions: RoB 2 for trials and ROBINS-I for non-randomized studies, as a standard step.
Clinical guidelines: input to GRADE, where risk of bias downgrades the certainty of evidence.
Epidemiology: assessment of confounding and selection in observational studies.
Metascience: aggregate use to map the credibility of an entire literature.

Common pitfalls

The first pitfall is confusing risk of bias with reporting quality, treating a well-written study as if it were low-risk. The second is reducing the assessment to a numeric score, losing the information of which domain failed. The third is applying the wrong tool to the design, ignoring confounding in observational studies. The fourth is assessing without duplication, leaving a subjective judgment in the hands of a single reviewer. The fifth is stopping at the assessment without using it: rating the risk and then synthesizing the studies as if all were equally credible wastes the assessment and contaminates the meta-analysis with fragile evidence.
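For the fourth pitfall, duplicate assessment produces two sets of judgments that have to be compared. One common way to summarize their agreement is Cohen’s kappa. That statistic is not named in this entry and is an assumption of the sketch below, as are the function names and the example judgments.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def disagreements(items, rater_a, rater_b):
    """Items the two assessors judged differently, to be resolved by consensus."""
    return [i for i, a, b in zip(items, rater_a, rater_b) if a != b]


# Invented example: one domain judged across six studies by two assessors.
study_ids = ["S1", "S2", "S3", "S4", "S5", "S6"]
assessor_1 = ["low", "low", "high", "some concerns", "low", "high"]
assessor_2 = ["low", "some concerns", "high", "some concerns", "low", "low"]

print(round(cohens_kappa(assessor_1, assessor_2), 3))
print(disagreements(study_ids, assessor_1, assessor_2))
```

The list of disagreements is the practical output: those are the judgments the two assessors need to discuss or refer to a third reviewer.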