This is based on my conversation with Claude Sonnet 4.6.
The problem
We want to draw samples from a target distribution $p(x)$, but cannot sample from it directly. The normalizing constant of $p$ may also be unknown.
Algorithm
Choose a proposal distribution $q(x)$ you can sample from, and find a scalar $M$ such that:

$$p(x) \le M\,q(x) \quad \text{for all } x$$
Then repeat until acceptance:
- Draw candidate $x \sim q$
- Draw $u \sim \mathrm{Uniform}(0, 1)$
- Accept $x$ if $u \le \dfrac{p(x)}{M\,q(x)}$, else reject
Accepted samples are distributed exactly according to $p$.
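The loop above can be sketched in a few lines of NumPy. The target and proposal here are illustrative choices, not from the source: a Beta(2, 2) target with a Uniform(0, 1) proposal, for which the tight bound $M = 1.5$ is easy to compute by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # Target: Beta(2, 2) density, p(x) = 6 x (1 - x) on [0, 1].
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, M=1.5):
    # Proposal q = Uniform(0, 1), so q(x) = 1. The max of p is
    # 6 * 0.5 * 0.5 = 1.5 at x = 0.5, so M = 1.5 is the tight bound.
    samples = []
    while len(samples) < n:
        x = rng.uniform()      # candidate x ~ q
        u = rng.uniform()      # u ~ Uniform(0, 1)
        if u <= p(x) / M:      # accept with probability p(x) / (M q(x))
            samples.append(x)
    return np.array(samples)

xs = rejection_sample(10_000)
print(xs.mean())   # ≈ 0.5, the Beta(2, 2) mean
```

With the tight $M = 1.5$ the expected acceptance rate is $1/M = 2/3$; a looser bound wastes proposals but still yields exact samples.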
Unnormalized densities work
Only an unnormalized density $\tilde{p}(x) \propto p(x)$ is needed: choose $M$ so that $\tilde{p}(x) \le M\,q(x)$ and accept with probability $\tilde{p}(x)/(M\,q(x))$. The unknown normalizing constant never enters the acceptance test. This is one of the main practical motivations for rejection sampling.
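As a sketch of the unnormalized case (the target and proposal are my illustrative choices): take $\tilde{p}(x) = e^{-x^2/2}$, a standard normal with its constant dropped, and a standard Cauchy proposal. The ratio $\tilde{p}/q = \pi(1+x^2)e^{-x^2/2}$ peaks at $x = \pm 1$, giving $M = 2\pi e^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_tilde(x):
    # Unnormalized target: exp(-x^2 / 2), a standard normal
    # without its 1/sqrt(2*pi) constant.
    return np.exp(-0.5 * x**2)

def q_pdf(x):
    # Proposal: standard Cauchy, heavier-tailed than the target.
    return 1.0 / (np.pi * (1.0 + x**2))

# p_tilde(x) / q_pdf(x) = pi (1 + x^2) exp(-x^2 / 2) is maximized
# at x = ±1, giving M = 2 * pi * exp(-1/2).
M = 2.0 * np.pi * np.exp(-0.5)

def sample(n):
    out = []
    while len(out) < n:
        x = rng.standard_cauchy()
        u = rng.uniform()
        if u <= p_tilde(x) / (M * q_pdf(x)):
            out.append(x)
    return np.array(out)

xs = sample(20_000)
print(xs.mean(), xs.std())   # ≈ 0 and ≈ 1: standard normal, no Z needed
```

The normalizing constant $\sqrt{2\pi}$ appears nowhere in the code, yet the accepted samples are exactly standard normal.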
Intuition
Think of $M\,q(x)$ as a ceiling above $p(x)$. Sampling a candidate $x \sim q$ picks a random location under the ceiling. The acceptance coin is weighted by how much of the vertical bar at $x$ lies below $p$, so $x$ is kept with probability proportional to $p(x)$.
The catch — curse of dimensionality
The expected acceptance rate is $1/M$. In high dimensions, even a slightly mismatched proposal causes $M$ to grow exponentially: if the acceptance rate per dimension is 90%, across 100 independent dimensions it is $0.9^{100} \approx 2.7 \times 10^{-5}$. Rejection sampling is practical only in low dimensions.
$M$ remains a scalar in the multivariate case: it is one global constant that must cover the entire target over $\mathbb{R}^d$.
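A small numeric illustration of the exponential blow-up, under assumptions of my choosing: a $d$-dimensional standard normal target with an isotropic normal proposal of per-dimension std 1.1. Because the density ratio factorizes across dimensions, the tightest $M$ is the per-dimension supremum raised to the power $d$.

```python
import numpy as np

def tight_M(d, sigma=1.1):
    # Per-dimension ratio N(x; 0, 1) / N(x; 0, sigma^2)
    # = sigma * exp(-x^2/2 * (1 - 1/sigma^2)), which for sigma > 1
    # is maximized at x = 0 with value sigma. The d-dimensional
    # ratio factorizes, so the tight bound is M = sigma^d and the
    # acceptance rate 1/M decays like sigma^(-d).
    return sigma**d

for d in (1, 10, 100):
    print(d, 1.0 / tight_M(d))
```

Even this mild 10% mismatch per dimension drives the acceptance rate below $10^{-4}$ by $d = 100$.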
Comparison with Importance sampling corrections
Both methods use the ratio $p(x)/q(x)$, but answer different questions.
| | Rejection sampling | Importance sampling |
|---|---|---|
| Output | Exact iid samples from $p$ | Weighted estimate of $\mathbb{E}_p[f(x)]$ |
| Requires bound $M$? | Yes | No |
| Accepts/rejects? | Yes | No |
| High dimensions | Breaks down (acceptance rate → 0) | Degrades gracefully |
| Failure mode | Exponential rejection | Variance blows up |
Importance sampling rewrites the expectation as:

$$\mathbb{E}_{p}[f(x)] = \mathbb{E}_{q}\!\left[f(x)\,\frac{p(x)}{q(x)}\right]$$

so you draw $x_i \sim q$ and average with weights $w_i = p(x_i)/q(x_i)$. No rejection, no $M$. Its failure mode is variance blow-up when $q$ has lighter tails than $p$, causing a few samples to dominate with huge weights.
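A minimal sketch of the weighted average, with target and proposal chosen here for illustration: the same unnormalized standard normal as above and a wider $\mathcal{N}(0, 4)$ proposal. Self-normalizing the weights lets the unknown constant cancel, mirroring the unnormalized-density point made for rejection sampling.

```python
import numpy as np

rng = np.random.default_rng(2)

def p_tilde(x):
    # Unnormalized target: standard normal without its constant.
    return np.exp(-0.5 * x**2)

def q_pdf(x):
    # Proposal: normal with std 2, heavier-tailed than the target.
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))

x = rng.normal(0.0, 2.0, size=50_000)   # x_i ~ q
w = p_tilde(x) / q_pdf(x)               # unnormalized weights
w /= w.sum()                            # self-normalize: Z cancels

# Estimate E_p[x^2], which is 1 for a standard normal.
est = np.sum(w * x**2)
print(est)
```

Every draw contributes; nothing is thrown away. The price is that the estimate's variance, not its correctness, depends on how well $q$ matches $p$.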
Importance sampling is far more common in ML (variational inference, policy gradients, off-policy RL) due to better high-dimensional scaling.