Extended definition
P-hacking and HARKing are two practices that inflate the false-positive rate in the literature by exploiting the researcher’s flexibility in how a study is analyzed and narrated. P-hacking is the practice, conscious or not, of trying different analyses, data exclusions, transformations, or data-collection stopping points until a result crosses the significance threshold. Simmons and colleagues (2011) named this flexibility researcher degrees of freedom and showed, through simulation and experiment, how it allows almost any hypothesis to be presented as statistically significant. HARKing, short for hypothesizing after the results are known, is the practice described by Kerr (1998): formulating a hypothesis after seeing the results and presenting it as if it had been predicted before data collection. The two are distinct but reinforce each other: p-hacking manufactures the significant result, and HARKing builds the story that justifies it. Head and colleagues (2015) used text-mining to show that p-hacking is widespread across fields, though its aggregate effect on meta-analytic conclusions appears, on average, to be modest.
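The inflation caused by a flexible stopping point can be illustrated with a small simulation. The sketch below is not the simulation reported by Simmons and colleagues; it is a minimal example under stated assumptions: two groups drawn from the same normal distribution (so every "significant" result is a false positive), a two-sided z-test with known unit variance, and a hypothetical researcher who tests at 20 observations per group and, if the result is not significant, adds 10 per group and tests again, up to 50. The function names and sample sizes are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # conventional significance threshold


def p_value(a, b):
    """Two-sided z-test for a difference in means, assuming known unit variance."""
    z = (fmean(a) - fmean(b)) / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def one_study(rng, peek, start=20, step=10, max_n=50):
    """Simulate one study under a true null; return True if it ends 'significant'."""
    a = [rng.gauss(0, 1) for _ in range(start)]
    b = [rng.gauss(0, 1) for _ in range(start)]
    while True:
        if p_value(a, b) < ALPHA:
            return True  # stop as soon as the threshold is crossed
        if not peek or len(a) >= max_n:
            return False  # fixed design, or no more data to add
        # Optional stopping: collect more data and test again.
        a += [rng.gauss(0, 1) for _ in range(step)]
        b += [rng.gauss(0, 1) for _ in range(step)]


def false_positive_rate(peek, n_sims=20000, seed=1):
    """Share of null studies that end up 'significant'."""
    rng = random.Random(seed)
    return sum(one_study(rng, peek) for _ in range(n_sims)) / n_sims


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(peek=False))
    print("optional stopping:", false_positive_rate(peek=True))
```

With a fixed sample size the rate should sit close to the nominal 5%; with repeated peeking it should come out clearly higher, even though each individual test is valid on its own. Optional stopping is only one researcher degree of freedom; combining it with others, such as a choice among outcomes or covariates, would be expected to inflate the rate further.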
When it applies
The concept applies as a critical lens when evaluating a study: it prompts the question of whether a significant result would survive pre-specified analyses or whether it emerged from a hunt for a low p-value. It applies to interpreting literatures with many positive and few negative results, a sign that degrees of freedom may have been exploited. It applies to the design of robust studies: recognizing p-hacking and HARKing is what motivates preregistration and registered reports, which fix analytic decisions before the data are seen, and multiverse analysis, which makes the range of defensible decisions explicit. It applies to peer review, as a criterion for distinguishing a genuinely confirmatory hypothesis from a disguised exploratory one. And it applies to teaching research integrity: naming the practices is the first step to avoiding them.
When it does not apply
The concept does not apply as an automatic charge of bad faith: much p-hacking and HARKing occurs without intent, through cognitive biases and incentives that reward positive results. It does not apply to honest exploratory research: exploring data and generating hypotheses is legitimate, as long as the work is labeled exploratory and not presented as a confirmatory test. It does not apply as a synonym for any flexible analysis; justified and pre-specified adjustments are not p-hacking. It does not apply as grounds for invalidating an entire field: Head and colleagues (2015) showed that, though widespread, p-hacking does not necessarily overturn the consensus drawn from meta-analyses. And it should not be applied without distinguishing the two phenomena: treating p-hacking and HARKing as the same erases the difference between manufacturing the number and rewriting the prediction.
Applications by field
- Psychology and social sciences: the origin of the debate and the focus of the replication crisis, where researcher degrees of freedom are plentiful.
- Biomedicine: trials and observational studies in which multiple outcomes open room for selecting the result.
- Ecology and evolution: the field of Head and colleagues’ study, with p-hacking detectable at scale by text-mining.
- Metascience and integrity: systemic evaluation of literatures and the design of safeguards such as preregistration.
Common pitfalls
The first pitfall is confusing legitimate exploration with fraud: generating hypotheses from data is valid if declared exploratory. The second is assuming p-hacking requires bad intent, when biases and incentives produce it without malice. The third is treating p-hacking and HARKing as identical, losing the distinction between manipulating the analysis and rewriting the hypothesis. The fourth is trusting that a single low p-value is strong evidence without knowing how many analyses were tried. The fifth is believing that more statistical rigor alone solves the problem, ignoring that the structural fix runs through preregistration, transparency of analytic decisions, and explicit separation between confirmatory and exploratory analyses.
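The fourth pitfall can be made concrete with a standard calculation. As a simplifying assumption, suppose that $k$ analyses are tried, each at threshold $\alpha$, that the null hypothesis is true in all of them, and that the tests are independent. The probability that at least one comes out significant is then:

```latex
P(\text{at least one } p < \alpha) = 1 - (1 - \alpha)^{k}
% Worked values for \alpha = 0.05:
%   k = 1  :  1 - 0.95^{1}  = 0.05
%   k = 5  :  1 - 0.95^{5}  \approx 0.23
%   k = 10 :  1 - 0.95^{10} \approx 0.40
```

In practice, alternative analyses of the same data set are correlated, so the inflation is usually smaller than this independent-test figure, but it still exceeds the nominal $\alpha$. A reported p-value of 0.04 therefore means something quite different depending on whether it was the only analysis run or the best of ten.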