This is based on my conversation with Claude Sonnet 4.6


The problem

We want to draw samples from a target distribution $p(x)$, but cannot sample from it directly. The normalizing constant of $p$ may also be unknown.

Algorithm

Choose a proposal distribution $q(x)$ you can sample from, and find a scalar $M$ such that:

$$p(x) \le M\, q(x) \quad \text{for all } x.$$
Then repeat until acceptance:

  1. Draw a candidate $x \sim q$
  2. Draw $u \sim \mathrm{Uniform}(0, 1)$
  3. Accept $x$ if $u \le \dfrac{p(x)}{M\, q(x)}$, else reject

Accepted samples are distributed exactly according to $p$.
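The three steps above can be sketched in a few lines of NumPy. The target and proposal here are my own choice of example (a Beta(2, 2) target with a uniform proposal, for which the bound $M = 1.5$ is the density's peak), not something from the conversation:

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):
    # Beta(2, 2) density on [0, 1]: p(x) = 6 x (1 - x)
    return 6.0 * x * (1.0 - x)

# Proposal q = Uniform(0, 1), so q(x) = 1 on [0, 1]. The target peaks at
# p(0.5) = 1.5, so M = 1.5 satisfies p(x) <= M q(x) everywhere.
M = 1.5

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = rng.uniform()              # 1. draw candidate x ~ q
        u = rng.uniform()              # 2. draw u ~ Uniform(0, 1)
        if u <= target_pdf(x) / M:     # 3. accept if u <= p(x) / (M q(x))
            samples.append(x)
    return np.array(samples)

draws = rejection_sample(10_000)
print(draws.mean())  # Beta(2, 2) has mean 0.5
```

Any valid $M \ge 1.5$ would also work here, but a looser bound directly lowers the acceptance rate.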

Unnormalized densities work

Only an unnormalized density $\tilde p(x) \propto p(x)$ is needed: the unknown normalizing constant cancels in the acceptance ratio $\tilde p(x) / (M\, q(x))$, provided $M$ bounds $\tilde p$ rather than $p$. This is one of the main practical motivations for rejection sampling.
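A minimal illustration of the cancellation, using an example of my own choosing: sampling from Beta(2, 2) given only the unnormalized density $\tilde p(x) = x(1 - x)$, whose normalizer is never computed:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_tilde(x):
    # Unnormalized Beta(2, 2) density: x (1 - x). The normalizer (6)
    # is deliberately omitted and never needed below.
    return x * (1.0 - x)

# With q = Uniform(0, 1), p_tilde peaks at p_tilde(0.5) = 0.25,
# so M = 0.25 bounds the *unnormalized* density.
M = 0.25

def sample_unnormalized(n):
    out = []
    while len(out) < n:
        x, u = rng.uniform(), rng.uniform()
        if u <= p_tilde(x) / M:   # unknown normalizer cancels here
            out.append(x)
    return np.array(out)

print(sample_unnormalized(5_000).mean())  # mean of Beta(2, 2) is 0.5
```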

Intuition

Think of $M\, q(x)$ as a ceiling lying above $p(x)$ everywhere. Sampling a candidate $x \sim q$ and then $u$ picks a random height under the ceiling at that location. The acceptance coin is weighted by how much of that vertical bar lies below $p(x)$, so $x$ is kept with probability $p(x) / (M\, q(x))$.

The catch: curse of dimensionality

The acceptance rate is $1/M$ (for normalized densities). In high dimensions, even a slightly mismatched proposal causes $M$ to grow exponentially with dimension: if the acceptance rate per dimension is 90%, across 100 dimensions it is $0.9^{100} \approx 2.7 \times 10^{-5}$. Rejection sampling is practical only in low dimensions.

$M$ remains a scalar in the multivariate case: it is one global constant that must cover the entire target over $\mathbb{R}^d$.
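The exponential decay is easy to check empirically. The sketch below is my own illustration: a $d$-dimensional standard-normal target with a slightly wider normal proposal (per-coordinate std 1.1), for which the density ratio $p/q$ peaks at the origin and the tightest bound is $M = 1.1^d$:

```python
import numpy as np

rng = np.random.default_rng(2)

def acceptance_rate(d, n=100_000, sigma=1.1):
    # Target p: d-dim standard normal. Proposal q: d-dim normal, std sigma.
    # The ratio p/q is maximized at the origin, so the tightest bound is
    # M = sigma**d and the theoretical acceptance rate is sigma**(-d).
    x = rng.normal(0.0, sigma, size=(n, d))
    # log of p(x) / (M q(x)); the (2*pi)^(-d/2) factors and the sigma**d
    # normalizers cancel, leaving only the quadratic term.
    log_ratio = -0.5 * (1.0 - 1.0 / sigma**2) * np.sum(x**2, axis=1)
    u = rng.uniform(size=n)
    return np.mean(u <= np.exp(log_ratio))

for d in (1, 10, 50, 100):
    print(f"d={d:3d}  acceptance ~ {acceptance_rate(d):.2e}")
```

Even this mild 10% mismatch per coordinate drives the acceptance rate from roughly 0.9 at $d = 1$ toward zero by $d = 100$.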

Comparison with importance sampling

Both methods use the ratio $p(x)/q(x)$, but they answer different questions.

|  | Rejection sampling | Importance sampling |
|---|---|---|
| Output | Exact i.i.d. samples from $p$ | Weighted estimate of $\mathbb{E}_p[f(x)]$ |
| Requires the bound $M$? | Yes | No |
| Accepts/rejects? | Yes | No |
| High dimensions | Breaks down (rate $\to 0$) | Degrades gracefully |
| Failure mode | Exponential rejection | Weight variance blows up |

Importance sampling rewrites the expectation as:

$$\mathbb{E}_p[f(x)] = \mathbb{E}_q\!\left[f(x)\,\frac{p(x)}{q(x)}\right],$$

so you draw $x_i \sim q$ and average $f(x_i)$ with weights $w_i = p(x_i)/q(x_i)$. No rejection, no $M$. Its failure mode is variance blow-up when $q$ has lighter tails than $p$, causing a few samples to dominate with huge weights.
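A short sketch of this estimator, with example values of my own choosing: estimating $\mathbb{E}_p[x^2] = 1$ for a standard normal $p$ using a wider (heavier-tailed) normal proposal, so the weights stay well behaved:

```python
import numpy as np

rng = np.random.default_rng(3)

# Estimate E_p[f(x)] for p = N(0, 1) and f(x) = x**2 (true value 1),
# drawing from a wider proposal q = N(0, 2**2). No bound M, no rejection.
n = 50_000
x = rng.normal(0.0, 2.0, size=n)

def p_pdf(x):  # target density N(0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def q_pdf(x):  # proposal density N(0, 4)
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

w = p_pdf(x) / q_pdf(x)        # importance weights w_i = p(x_i) / q(x_i)
estimate = np.mean(w * x**2)   # weighted average, E_q[f(x) p(x)/q(x)]
print(estimate)                # close to E_p[x^2] = 1
```

Swapping the roles (a narrow $q$ under a wide $p$) is exactly the light-tails failure mode: a handful of tail draws get enormous weights and the variance explodes.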

Importance sampling is far more common in ML (variational inference, policy gradients, off-policy RL) because it scales better to high dimensions.