
The bootstrap: confidence intervals without the formula

Open a statistics textbook and you'll find a different confidence-interval formula for every situation: one for a mean, another for a median, another for a correlation, each carrying assumptions (normality, large samples) you may quietly be violating. The bootstrap replaces that whole drawer of formulas with a single, almost suspiciously simple idea, and it works for statistics that have no clean formula at all.

The one idea: your sample is your best guess at the population

You have one sample of data and you want to know how much a statistic (say, the mean) would wobble if you could collect new samples. But you can't collect new samples; you only have the one.

The bootstrap's move: treat your sample as if it were the population, and draw new samples from it, with replacement. Each "resample" is the same size as your data, but some original points appear twice or thrice and others not at all. Compute your statistic on each resample, do it thousands of times, and the spread of those values estimates how much the statistic would vary in reality. You simulate "collecting new data" by reusing the data you have.
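To make a single resample concrete, here is a minimal sketch. The ten values are the same illustrative numbers used in the worked example in this article; the seed is arbitrary, and the names idx, resample, and counts are chosen here purely for illustration.

```r
# Ten illustrative values (the same ones used in the worked example)
x <- c(4.1, 5.5, 3.8, 6.0, 5.2, 4.9, 7.1, 3.3, 5.8, 6.4)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Draw positions 1..n with replacement, then look up the values
idx <- sample(length(x), replace = TRUE)
resample <- x[idx]

# How many times each ORIGINAL point landed in this resample:
# 0 means it was left out, 2 or more means it was repeated
counts <- tabulate(idx, nbins = length(x))
print(rbind(value = x, times_drawn = counts))

mean(x)         # statistic on the original data
mean(resample)  # statistic on this one resample
```

Run it with a few different seeds: the row of counts changes each time, and so does the resample mean. That change from one resample to the next is the variation the bootstrap measures.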

In a few lines of R

bootstrap_ci <- function(x, stat = mean, B = 10000, conf = 0.95) {
  n <- length(x)
  # B resamples, each the same size as x, drawn WITH replacement
  boots <- replicate(B, stat(sample(x, n, replace = TRUE)))
  alpha <- (1 - conf) / 2
  quantile(boots, c(alpha, 1 - alpha))   # the central `conf` share of the resample stats (95% by default)
}

set.seed(1)
x <- c(4.1, 5.5, 3.8, 6.0, 5.2, 4.9, 7.1, 3.3, 5.8, 6.4)
bootstrap_ci(x)            # a 95% CI for the mean, no t-distribution in sight

That's the entire method. sample(x, n, replace = TRUE) is the resample; replicate(B, ...) does it ten thousand times; quantile(..., c(0.025, 0.975)) takes the middle 95% of the resulting statistics as the interval. The "percentile bootstrap" confidence interval is literally: the range that holds the central 95% of your resampled statistics.
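For a sense of scale, it can help to put the percentile interval next to the classical t interval on the same data. The sketch below inlines the resampling step rather than calling bootstrap_ci, so that it runs on its own; t.test is base R's standard t-test function, and the names ci_boot and ci_t are chosen here for illustration.

```r
x <- c(4.1, 5.5, 3.8, 6.0, 5.2, 4.9, 7.1, 3.3, 5.8, 6.4)

set.seed(1)
# Percentile bootstrap interval for the mean
boots <- replicate(10000, mean(sample(x, length(x), replace = TRUE)))
ci_boot <- quantile(boots, c(0.025, 0.975))

# Classical t interval for the mean (assumes roughly normal data)
ci_t <- t.test(x, conf.level = 0.95)$conf.int

ci_boot
ci_t

# Widths of the two intervals, for comparison
unname(diff(ci_boot))
diff(ci_t)
```

The two intervals should land in the same neighbourhood. On a sample this small the percentile interval will often come out somewhat narrower than the t interval, so agreement in location is the thing to look for, not identical endpoints.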

Three details that matter:

  • replace = TRUE is the whole trick. Sampling n points without replacement would just return your original data in a shuffled order every time, so the statistic would never change. Sampling with replacement creates the variation that mimics drawing fresh samples.
  • Each resample is the same size n. The amount of data you have determines how much a statistic wobbles, so the resamples must match it. A bigger original sample gives a tighter interval, exactly as it should.
  • No distributional assumption. We never assumed the data was normal. The interval comes from the data's own shape, which is why the bootstrap shines on skewed data and small (but not tiny) samples where formula-based intervals quietly fail.
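The first two details are easy to check by experiment. This is a rough sketch, not a proof: it compares the spread of resampled means with and without replacement, and then with a deliberately wrong resample size of 3 (a number picked only for illustration).

```r
x <- c(4.1, 5.5, 3.8, 6.0, 5.2, 4.9, 7.1, 3.3, 5.8, 6.4)
n <- length(x)
set.seed(1)

# WITHOUT replacement: each "resample" is x reshuffled, so the mean never moves
means_no_repl <- replicate(2000, mean(sample(x, n, replace = FALSE)))

# WITH replacement: the mean varies from resample to resample
means_repl <- replicate(2000, mean(sample(x, n, replace = TRUE)))

sd(means_no_repl)  # zero, up to floating-point rounding
sd(means_repl)     # clearly positive: the bootstrap estimate of the standard error

# Wrong resample size: only 3 points per resample exaggerates the wobble
means_small <- replicate(2000, mean(sample(x, 3, replace = TRUE)))
sd(means_small)    # noticeably larger than sd(means_repl)
```

The last line is the reason the resample size must equal n: a smaller resample answers the question "how much would the mean of 3 points vary?", which is not the question being asked.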

The magic: it works for any statistic

Here's the real payoff. Want a confidence interval for the median? There's no nice textbook formula. With the bootstrap, change one argument:

bootstrap_ci(x, stat = median)

The same goes for a trimmed mean, a ratio, the 90th percentile, or anything else you can compute from the data. (For a statistic of paired data, such as a correlation, resample whole rows so that each pair stays together.) If you can write a function that returns the statistic, the bootstrap gives you a confidence interval for it. One method, unlimited statistics. That generality is why it became a workhorse of modern applied statistics: you stop hunting for the right formula and just resample and look.
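The bootstrap_ci function shown earlier resamples a single vector, so a correlation needs one small change: resample row indices and apply them to both columns at once. The sketch below is one way to do that. The helper name bootstrap_ci_rows and the hours/score data are invented for illustration and are not part of the original example.

```r
# Hypothetical helper: bootstrap a statistic of a data frame by resampling ROWS
bootstrap_ci_rows <- function(df, stat, B = 5000, conf = 0.95) {
  n <- nrow(df)
  boots <- replicate(B, {
    idx <- sample(n, n, replace = TRUE)   # row positions, drawn with replacement
    stat(df[idx, , drop = FALSE])         # pairs stay together within each row
  })
  alpha <- (1 - conf) / 2
  # na.rm guards against the (very rare) resample with zero variance in a column
  quantile(boots, c(alpha, 1 - alpha), na.rm = TRUE)
}

# Made-up paired data, for illustration only
d <- data.frame(
  hours = c(1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.5, 9.0, 10.0),
  score = c(52, 55, 61, 58, 70, 68, 75, 79, 77, 88)
)

set.seed(1)
ci_cor <- bootstrap_ci_rows(d, function(df) cor(df$hours, df$score))
ci_cor   # percentile interval for the correlation
```

Resampling the two columns independently would break the pairing and push the resampled correlations toward zero, which is why the unit being resampled here is the row.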

When it doesn't work

Honesty matters. The bootstrap isn't universal:

  • Very small samples. If you have 5 data points, resampling 5 points can't conjure information that isn't there; the interval will be unreliable.
  • Extremes. It struggles with statistics that depend on the rarest values, like the maximum, because a resample can't produce values larger than your observed max.
  • It needs compute. It trades a formula for thousands of recomputations. That cost is trivial today, but it is exactly why the method became practical only once computers were cheap.
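The problem with extremes shows up in a few lines. This sketch bootstraps the maximum of the same ten illustrative values; the name boot_max and the count of 5000 resamples are arbitrary choices made here.

```r
x <- c(4.1, 5.5, 3.8, 6.0, 5.2, 4.9, 7.1, 3.3, 5.8, 6.4)
n <- length(x)
set.seed(1)

# Bootstrap distribution of the maximum
boot_max <- replicate(5000, max(sample(x, n, replace = TRUE)))

# No resample can exceed the largest value actually observed
all(boot_max <= max(x))

# Share of resamples whose maximum is exactly the observed maximum
mean(boot_max == max(x))

# Theoretical share: the chance the largest point is drawn at least once
1 - (1 - 1 / n)^n
```

Roughly two thirds of the resamples return exactly the observed maximum, and none go above it. The bootstrap distribution piles up on one value and has no room to express the possibility that the true maximum is larger, so a percentile interval built from it is not trustworthy.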

Why this is worth knowing

The bootstrap is a beautiful example of computation replacing cleverness. Instead of deriving a formula with restrictive assumptions, you let the computer simulate the sampling variation directly from your data. Once the idea clicks (resample with replacement, recompute, look at the spread), a huge swath of "which test do I use?" anxiety dissolves: for many questions, you can just bootstrap it.

Building these resampling and simulation methods yourself, rather than calling a black-box t.test, is exactly the spirit of the statistics in R track, where Monte Carlo and the bootstrap show up as tools you construct, not incantations you trust.