Related: Flow Matching, Score Matching.

Vanilla Guidance

We will let the model take another input, the prompt $y$, during training. Instead of sampling only the output $z \sim p_{\text{data}}$, we sample both $z$ and $y$ jointly, $(z, y) \sim p_{\text{data}}(z, y)$. For example, when training the model we always provide the text “dog” with a dog image, “cat” with a cat image, and so on, so the model learns a conditional vector field $u_t(x \mid y)$.
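To make this concrete, here is a minimal sketch of conditional training in PyTorch, assuming a simple linear (rectified-flow style) probability path, integer class labels as the prompt, and a toy MLP. The network, shapes, and helper names are illustrative assumptions, not from the original text.

```python
import torch
import torch.nn as nn

class CondVectorField(nn.Module):
    """Toy conditional vector field u_t(x | y); the prompt y is an integer label."""
    def __init__(self, x_dim=2, y_embed_dim=8, num_classes=10, hidden=128):
        super().__init__()
        self.y_embed = nn.Embedding(num_classes, y_embed_dim)
        self.net = nn.Sequential(
            nn.Linear(x_dim + 1 + y_embed_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x, t, y):
        # Condition simply by concatenating the prompt embedding to the input.
        h = torch.cat([x, t, self.y_embed(y)], dim=-1)
        return self.net(h)

def training_step(model, opt, z, y):
    """One conditional flow-matching step on a paired batch (z, y)."""
    x0 = torch.randn_like(z)              # noise sample
    t = torch.rand(z.shape[0], 1)         # random time in [0, 1]
    xt = (1 - t) * x0 + t * z             # point on the linear path from noise to data
    target = z - x0                       # velocity target for this path
    loss = ((model(xt, t, y) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At sampling time we integrate the learned $u_t(x \mid y)$ from noise to data exactly as in the unconditional case, just with the prompt held fixed.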

Classifier Guidance

With some Bayes' rule we can separate the guided score $\nabla \log p_t(x \mid y)$ into an unguided part and a guidance part. For example, with Gaussian probability paths we can convert the vector field to its score representation,

$$u_t(x \mid y) = a_t\, x + b_t\, \nabla \log p_t(x \mid y),$$

where the coefficients $a_t$ and $b_t$ depend only on the noise schedule.

Next, realize that $p_t(x \mid y)$ is a conditional density. Hence, we can use Bayes' rule to rewrite the guided score as

$$\nabla \log p_t(x \mid y) = \nabla \log \frac{p_t(y \mid x)\, p_t(x)}{p_t(y)} = \nabla \log p_t(x) + \nabla \log p_t(y \mid x),$$

where we used that the gradient is taken with respect to the variable $x$, so that $\nabla \log p_t(y) = 0$. We may thus rewrite

$$u_t(x \mid y) = \big(a_t\, x + b_t\, \nabla \log p_t(x)\big) + b_t\, \nabla \log p_t(y \mid x) = u_t(x) + b_t\, \nabla \log p_t(y \mid x).$$

Notice the shape of the above equation: the guided vector field is the unguided vector field plus a gradient of the likelihood of the guidance variable $y$. As people observed that their images did not fit their prompts well enough, it was a natural idea to scale up the contribution of the $\nabla \log p_t(y \mid x)$ term, yielding

$$\tilde{u}_t(x \mid y) = u_t(x) + w\, b_t\, \nabla \log p_t(y \mid x)$$

with a guidance scale $w > 1$.

Well, where do we get the $\nabla \log p_t(y \mid x)$ part? From another model, a classifier that predicts $y$ from the noisy input $x$, thus the name.
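In code, the guidance term is just an autograd gradient through that separately trained noisy classifier. Below is a sketch under the notation above; the interfaces `u_model(x, t)` and `classifier(x, t)`, the coefficient `b_t`, and the scale `w` are assumptions for illustration.

```python
import torch

def classifier_guided_velocity(u_model, classifier, x, t, y, w=3.0, b_t=1.0):
    """Guided velocity: unguided u_t(x) plus w * b_t * grad_x log p_t(y | x).

    Assumed interfaces: u_model(x, t) returns the unconditional vector field,
    classifier(x, t) returns class logits for noisy inputs.
    """
    x = x.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x, t), dim=-1)
    log_p_y = log_probs.gather(1, y.view(-1, 1)).sum()
    grad_log_p_y = torch.autograd.grad(log_p_y, x)[0]   # grad_x log p_t(y | x)
    return u_model(x, t).detach() + w * b_t * grad_log_p_y
```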

Classifier-Free Guidance

Well, now you know why we emphasize classifier-free here: we want to drop that extra classifier. Surprise! We use Bayes' rule again, and we get back $\nabla \log p_t(x \mid y)$ and $\nabla \log p_t(x)$, scores our generator already knows.

How can we double-dip on Bayes' rule to get the classifier term out of a generative model, i.e., reuse our current generator to do two jobs? Here it goes:

$$\nabla \log p_t(y \mid x) = \nabla \log p_t(x \mid y) - \nabla \log p_t(x),$$

so the guided vector field becomes

$$\tilde{u}_t(x \mid y) = u_t(x) + w\,\big(u_t(x \mid y) - u_t(x)\big) = (1 - w)\, u_t(x) + w\, u_t(x \mid y).$$

Our model can produce both $u_t(x \mid y)$ and $u_t(x)$, since it can treat the unconditional case as conditioning on an empty label: $u_t(x) = u_t(x \mid \varnothing)$. We'll hack our labels so that, with some probability during training, we replace $y$ with this empty token $\varnothing$.
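Here is a minimal sketch of both halves (label dropout during training, and the guided combination at sampling time), assuming integer labels with one extra index reserved for the empty token $\varnothing$; the constants and function names are illustrative, and the conditional model is assumed to take that extra index in its embedding table.

```python
import torch

NULL_LABEL = 10          # reserved index standing in for the empty label ∅
P_DROP = 0.1             # probability of replacing y with ∅ during training

def drop_labels(y):
    """Randomly replace labels with the null token so the same model
    learns both u_t(x | y) and u_t(x) = u_t(x | ∅)."""
    mask = torch.rand(y.shape) < P_DROP
    return torch.where(mask, torch.full_like(y, NULL_LABEL), y)

def cfg_velocity(model, x, t, y, w=3.0):
    """Classifier-free guided velocity (1 - w) * u_t(x) + w * u_t(x | y)."""
    null_y = torch.full_like(y, NULL_LABEL)
    u_uncond = model(x, t, null_y)    # u_t(x) = u_t(x | ∅)
    u_cond = model(x, t, y)           # u_t(x | y)
    return (1 - w) * u_uncond + w * u_cond
```

Setting $w = 1$ recovers vanilla guidance, while $w > 1$ extrapolates away from the unconditional prediction, which is exactly the scaled-up guidance term from before, now without any classifier.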