DATA & STATISTICS

Bootstrap

Family of resampling-with-replacement methods that estimate the sampling distribution of an estimator from a single sample. Proposed by Efron (1979). Enables confidence intervals (CIs) and hypothesis tests without parametric normality assumptions.

Extended definition

Bootstrap is a family of resampling-with-replacement methods that estimate the sampling distribution of an estimator $\hat{\theta}$ from a single observed sample. The idea is to treat the sample as a “population” and simulate the sampling process repeatedly. The basic procedure:

$$\hat{\theta}^*_b = \hat{\theta}(X^*_b), \quad X^*_b \sim \text{sampling with replacement from } X, \quad b = 1, \ldots, B$$

where $X^*_b$ is the $b$-th bootstrap sample (drawn with replacement, with the same size $n$ as the original sample) and $\hat{\theta}^*_b$ is the estimator computed on that sample. Typically $B$ is large (1,000–10,000). The empirical distribution of $\{\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B\}$ approximates the sampling distribution of $\hat{\theta}$, enabling CI construction (percentile, BCa, basic) and hypothesis tests without parametric assumptions. Efron (1979) introduced the method as a reformulation of the jackknife; Efron and Tibshirani (1993) consolidated the canonical technical treatment in book form. Variants include the parametric bootstrap (samples drawn from a fitted parametric model), the block bootstrap (time series), and the residual bootstrap (regression).
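The basic procedure can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `bootstrap_percentile_ci`, the sample values, and the choice of the median as estimator are assumptions made for the example, and the interval shown is the simple percentile interval.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for estimator(sample) (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate: draw n observations with replacement, recompute the estimator.
    replicates = sorted(
        estimator(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the sorted replicates.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return replicates[lo_idx], replicates[hi_idx]


# Hypothetical right-skewed sample (illustrative values only).
data = [1.2, 0.8, 3.5, 2.1, 0.9, 7.4, 1.6, 2.8, 1.1, 4.9, 0.7, 2.3]
theta_hat = statistics.median(data)
ci_low, ci_high = bootstrap_percentile_ci(data, statistics.median)
print(f"median = {theta_hat}, 95% percentile CI = ({ci_low}, {ci_high})")
```

Fixing the seed makes the replicates reproducible; in practice the interval endpoints vary slightly from run to run, and that variation shrinks as `n_boot` grows.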

When it applies

Bootstrap applies when the sampling distribution of the estimator is hard to derive analytically: coefficients of complex models, robust statistics (median, quantiles), ML metrics, structural parameters in structural equation modeling (SEM), composite indicators. It is standard for median CIs, ratio CIs (whose sampling distribution is skewed), and SEM fit-measure CIs. In ML, bootstrap (and specifically the .632+ variant) is an alternative to cross-validation (CV) for estimating performance. In meta-analysis, bootstrap provides CIs for summary measures when parametric assumptions are doubtful. In small samples, the BCa-corrected percentile bootstrap is often preferable to a classical parametric CI.
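For readers who want to see what the BCa correction does to the percentile levels, the following is a sketch under stated assumptions: it uses the textbook formulas (bias correction $z_0$ from the share of replicates below the point estimate, acceleration $a$ from jackknife leave-one-out estimates), the helper name `bca_ci` and the sample values are hypothetical, and the sample mean stands in for whatever estimator is of interest.

```python
import random
from statistics import NormalDist, mean


def bca_ci(sample, estimator, n_boot=4000, alpha=0.05, seed=0):
    """BCa bootstrap CI (hypothetical helper; sample must be a list)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    reps = sorted(estimator(rng.choices(sample, k=n)) for _ in range(n_boot))
    nd = NormalDist()

    # Bias correction z0: based on the share of replicates below theta_hat.
    below = sum(r < theta_hat for r in reps)
    # Clamp away from 0 and 1 so the inverse normal CDF stays finite.
    prop = min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot)
    z0 = nd.inv_cdf(prop)

    # Acceleration a: skewness of the jackknife (leave-one-out) estimates.
    jack = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_level(level):
        # Shift the nominal level according to z0 and a.
        z = nd.inv_cdf(level)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def quantile(p):
        idx = min(max(int(p * n_boot), 0), n_boot - 1)
        return reps[idx]

    return quantile(adjusted_level(alpha / 2)), quantile(adjusted_level(1 - alpha / 2))


# Hypothetical right-skewed sample (illustrative values only).
data = [1.2, 0.8, 3.5, 2.1, 0.9, 7.4, 1.6, 2.8, 1.1, 4.9, 0.7, 2.3]
low, high = bca_ci(data, mean)
print(f"mean = {mean(data):.3f}, 95% BCa CI = ({low:.3f}, {high:.3f})")
```

When $z_0 = 0$ and $a = 0$ the adjusted levels reduce to $\alpha/2$ and $1 - \alpha/2$, so the result coincides with the plain percentile interval.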

When it does not apply

It does not apply to time series without adjustment: resampling observations independently destroys the autocorrelation structure, and the block bootstrap is the alternative. It does not apply to populations with extremely heavy tails, where the sample mean has no finite variance: bootstrap replicates of the mean are unstable. It does not apply as a substitute for increasing $n$: bootstrap does not create new information; it estimates the precision of the information already in the sample. In regression with a few influential points, bootstrap can underestimate uncertainty, so good practice is to examine outlier diagnostics before trusting the CI. It does not apply to parameters that the model does not identify: bootstrap inherits the identifiability limits of the underlying estimator.
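The time-series point can be illustrated with a small simulation. The sketch below assumes a moving (overlapping) block scheme, a simulated AR(1)-like series, and a block length of 25 chosen arbitrarily for the example; the function names are hypothetical. It compares the lag-1 autocorrelation of the original series with that of an independent resample and a block resample.

```python
import random


def moving_block_resample(series, block_len, rng):
    """One moving-block bootstrap resample with the same length as series."""
    n = len(series)
    n_starts = n - block_len + 1  # number of overlapping blocks available
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        out.extend(series[start:start + block_len])
    return out[:n]  # trim the last block so the length matches


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


# Hypothetical AR(1)-like series with coefficient 0.8, simulated for illustration.
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + rng.gauss(0, 1))

iid = rng.choices(series, k=len(series))            # independent resampling
blocked = moving_block_resample(series, 25, rng)    # block resampling

print("original:", round(lag1_autocorr(series), 3))
print("iid resample:", round(lag1_autocorr(iid), 3))
print("block resample:", round(lag1_autocorr(blocked), 3))
```

The independent resample should show autocorrelation near zero, while the block resample retains most of the original dependence, since only the joins between blocks break it.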

Applications by field

Health and biomedical sciences: bootstrap CIs for ratios, adjusted relative risks, summary measures in meta-analysis.

Econometrics: wild bootstrap for inference in models with unmodeled heteroscedasticity.

Applied ML: bagging (bootstrap aggregating) is the basis of Random Forest; bootstrap for predictive uncertainty.

Psychometrics: bootstrap CIs for SEM coefficients and fit indices (CFI, RMSEA).

Common pitfalls

The first pitfall is interpreting bootstrap results from a biased original sample: bootstrap estimates the precision of the estimate; it does not correct systematic bias. If the sample is non-representative, bootstrap inherits the bias. The second is using the standard percentile bootstrap with extremely asymmetric distributions, where BCa (bias-corrected and accelerated) is the appropriate correction. The third is a small $B$: $B = 100$ is insufficient for a CI; $B \geq 1000$ is the minimum, and $B \geq 10{,}000$ is recommended for tail precision. The fourth is confusing bootstrap with permutation: a permutation test evaluates the null hypothesis by relabeling observations, whereas bootstrap estimates the sampling distribution. The fifth is using bootstrap on grouped data (cluster sampling, repeated measures) without a cluster bootstrap: resampling must respect the dependence structure or the CI is invalid.
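The fifth pitfall can be made concrete with a small comparison. The sketch below assumes a hypothetical dataset of six clinics with four measurements each, where values within a clinic are much more alike than values across clinics; the names `naive_bootstrap_means` and `cluster_bootstrap_means` are illustrative. It contrasts the bootstrap standard error of the overall mean when individual observations are resampled with the standard error when whole clusters are resampled.

```python
import random
from statistics import mean, stdev

# Hypothetical clustered data (illustrative values only).
clusters = {
    "clinic_a": [5.1, 5.3, 4.9, 5.2],
    "clinic_b": [7.8, 8.1, 7.9, 8.0],
    "clinic_c": [3.2, 3.0, 3.1, 3.3],
    "clinic_d": [6.4, 6.6, 6.5, 6.3],
    "clinic_e": [9.0, 9.2, 8.9, 9.1],
    "clinic_f": [4.0, 4.2, 4.1, 3.9],
}


def naive_bootstrap_means(clusters, n_boot, rng):
    """Resample individual observations, ignoring cluster membership."""
    flat = [v for values in clusters.values() for v in values]
    return [mean(rng.choices(flat, k=len(flat))) for _ in range(n_boot)]


def cluster_bootstrap_means(clusters, n_boot, rng):
    """Resample whole clusters with replacement, keeping each cluster intact."""
    ids = list(clusters)
    reps = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))
        pooled = [v for cid in drawn for v in clusters[cid]]
        reps.append(mean(pooled))
    return reps


rng = random.Random(0)
naive_se = stdev(naive_bootstrap_means(clusters, 2000, rng))
cluster_se = stdev(cluster_bootstrap_means(clusters, 2000, rng))
print(f"naive SE = {naive_se:.3f}, cluster SE = {cluster_se:.3f}")
```

With data like these, the naive resampling treats 24 correlated values as 24 independent ones and reports a noticeably smaller standard error than the cluster version, which is the sense in which the resulting CI is invalid.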
