\begin{algorithm}
\caption{Speculative Sampling (SpS) with Auto-Regressive Target and Draft Models}
\begin{algorithmic}
\STATE Given lookahead $K$ and minimum target sequence length $T$.
\STATE Given auto-regressive target model $q(\cdot|\cdot)$, and auto-regressive draft model $p(\cdot|\cdot)$, initial prompt sequence $x_0, \dots, x_t$.
\STATE Initialise $n \leftarrow t$.
\WHILE{$n < T$}
    \FOR{$t = 1$ \TO $K$}
        \STATE Sample draft auto-regressively $\tilde{x}_t \sim p(x|x_1, \dots, x_n, \tilde{x}_1, \dots, \tilde{x}_{t-1})$
    \ENDFOR
    \STATE In parallel, compute $K + 1$ sets of logits from drafts $\tilde{x}_1, \dots, \tilde{x}_K$:
    \STATE $q(x|x_1, \dots, x_n), q(x|x_1, \dots, x_n, \tilde{x}_1), \dots, q(x|x_1, \dots, x_n, \tilde{x}_1, \dots, \tilde{x}_K)$
    \FOR{$t = 1$ \TO $K$}
        \STATE Sample $r \sim U[0, 1]$ from a uniform distribution.
        \IF{$r < \min\left(1, \frac{q(\tilde{x}_t|x_1, \dots, x_{n+t-1})}{p(\tilde{x}_t|x_1, \dots, x_{n+t-1})}\right)$}
            \STATE Set $x_{n+t} \leftarrow \tilde{x}_t$ and $n \leftarrow n + 1$.
        \ELSE
            \STATE Sample $x_{n+t} \sim \left(q(x|x_1, \dots, x_{n+t-1}) - p(x|x_1, \dots, x_{n+t-1})\right)_+$, where $(\cdot)_+$ denotes $\max(0, \cdot)$ normalized to sum to one, and exit for loop.
        \ENDIF
    \ENDFOR
    \IF{all tokens $x_{n+1}, \dots, x_{n+K}$ are accepted}
        \STATE Sample extra token $x_{n+K+1} \sim q(x|x_1, \dots, x_{n+K})$ and set $n \leftarrow n + 1$.
    \ENDIF
\ENDWHILE
\end{algorithmic}
\end{algorithm}
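As a concrete sketch, one outer-loop iteration of the algorithm can be written in Python over a small discrete vocabulary. The helper signatures (`p_dist` and `q_dist` returning token-to-probability dicts) are illustrative assumptions standing in for real model calls, and the parallel logit computation is elided, since a toy dict lookup has no batching to exploit.

```python
import random

def speculative_step(prefix, p_dist, q_dist, K, rng):
    """One outer-loop iteration: draft K tokens from p, verify against q.

    p_dist(ctx) and q_dist(ctx) map a context tuple to a {token: prob}
    dict; they are toy stand-ins for the draft and target models.
    """
    # Sample K draft tokens auto-regressively from the cheap model p.
    drafts = []
    for _ in range(K):
        dist = p_dist(prefix + tuple(drafts))
        drafts.append(rng.choices(list(dist), list(dist.values()))[0])

    out = list(prefix)
    for t, x_t in enumerate(drafts):
        ctx = prefix + tuple(drafts[:t])
        q, p = q_dist(ctx), p_dist(ctx)
        if rng.random() < min(1.0, q[x_t] / p[x_t]):
            out.append(x_t)                      # draft token accepted
        else:
            # Rejected: resample from the normalized residual (q - p)_+.
            resid = {y: max(0.0, q[y] - p[y]) for y in q}
            out.append(rng.choices(list(resid), list(resid.values()))[0])
            return out                           # exit the for loop
    # All K drafts accepted: sample one free extra token from the target.
    extra = q_dist(tuple(out))
    out.append(rng.choices(list(extra), list(extra.values()))[0])
    return out
```

Each call appends between $1$ and $K + 1$ tokens, and averaged over many calls each emitted token is distributed exactly according to $q$, which is the point of the residual correction discussed below.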

This is a modified form of rejection sampling: a rejected draft token is not discarded and retried, but replaced by a sample from a corrected residual distribution.

Why Not Just Sample from $q$ on Rejection?

When a draft token is rejected, you already have the full target distribution $q(x|x_1, \dots, x_{n+t-1})$ from the parallel verification pass. Sampling a new token from $q$ is free; it costs no extra forward pass. So why bother with the correction?

Because sampling from $q$ directly gives the wrong marginal distribution. Writing $p(x)$ and $q(x)$ for the draft and target distributions at the current position, the marginal probability of outputting token $x$ is the sum of two paths, acceptance and rejection:

\[
P(x) = \underbrace{p(x)\min\left(1, \frac{q(x)}{p(x)}\right)}_{\text{accepted as draft}} + \underbrace{\Big(1 - \sum_{x'} \min\big(p(x'), q(x')\big)\Big)}_{P(\text{reject})} \, P_{\text{resample}}(x).
\]

If you set $P_{\text{resample}} = q$, this becomes $\min\big(p(x), q(x)\big) + \big(1 - \sum_{x'} \min(p(x'), q(x'))\big)\, q(x)$, which is not equal to $q(x)$ in general. You double-count: tokens where $p$ and $q$ both place mass get delivered through the acceptance path and again through the rejection path, skewing the output.
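A two-token numeric example (values assumed for illustration) makes the double counting concrete: resampling rejected tokens directly from $q$ inflates the marginal of the token both models favor.

```python
# Toy single-position distributions (assumed values): p drafts, q verifies.
p = {"a": 0.9, "b": 0.1}
q = {"a": 0.6, "b": 0.4}

# Acceptance path delivers p(x) * min(1, q(x)/p(x)) = min(p(x), q(x)).
accept = {x: min(p[x], q[x]) for x in p}
reject_prob = 1.0 - sum(accept.values())

# Naive rejection path: resample directly from q.
naive = {x: accept[x] + reject_prob * q[x] for x in p}
print(naive)   # "a" lands near 0.78, well above its target q["a"] = 0.6
```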

The correction fixes this by subtracting out exactly the mass already delivered via acceptance:

\[
P_{\text{resample}}(x) = \big(q(x) - p(x)\big)_+ = \frac{\max\big(0,\, q(x) - p(x)\big)}{\sum_{x'} \max\big(0,\, q(x') - p(x')\big)}.
\]

Intuition

Think of $q$ as a target you need to “fill.” The acceptance step already delivers $\min\big(p(x), q(x)\big)$ of mass for each token $x$. The corrected distribution $(q - p)_+$ contains only the unfilled residual: tokens where $q(x)$ exceeds $p(x)$, weighted by exactly the deficit $q(x) - p(x)$. The two paths tile $q$ with no overlap.

The normalizing constant $\sum_{x'} \max\big(0,\, q(x') - p(x')\big)$ equals the rejection probability $1 - \sum_{x'} \min\big(p(x'), q(x')\big)$ by conservation of probability ($\sum_{x'} q(x') = \sum_{x'} p(x') = 1$, and $q - \min(p, q) = \max(0, q - p)$ pointwise), so the algebra closes:

\[
P(x) = \min\big(p(x), q(x)\big) + \max\big(0,\, q(x) - p(x)\big) = q(x).
\]

This is the unique correction that makes the output distribution exactly $q$ without requiring any additional model calls.
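Repeating the same style of toy check (assumed two-token distributions) with the residual correction in place recovers $q$ to floating-point precision:

```python
# Toy single-position distributions (assumed values): p drafts, q verifies.
p = {"a": 0.9, "b": 0.1}
q = {"a": 0.6, "b": 0.4}

accept = {x: min(p[x], q[x]) for x in p}       # mass delivered on acceptance
reject_prob = 1.0 - sum(accept.values())

# Corrected rejection path: normalized residual (q - p)_+.
resid = {x: max(0.0, q[x] - p[x]) for x in p}
z = sum(resid.values())
corrected = {x: accept[x] + reject_prob * resid[x] / z for x in p}

assert abs(z - reject_prob) < 1e-12            # conservation of probability
assert all(abs(corrected[x] - q[x]) < 1e-12 for x in q)  # output is exactly q
```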