Flow Matching

Source

Everything here comes from the course material for Introduction to Flow Matching and Diffusion Models (MIT Computer Science Class 6.S184: Generative AI with Stochastic Differential Equations).

Class website: https://diffusion.csail.mit.edu/2026/index.html
Class notes: flow_matching_diffusion_notes.pdf. I did the 2025 version of the class, but the notes are from 2026. I think the only new material is the CTMC models.
I appreciate that the course releases all slides and recordings, along with labs and solutions.

Treat the class notes as the ground truth; here I’ll just list some brief ideas for a quick overview.

The goal is to go from $x \sim p_{init}$ to $z \sim p_{data}$. There are several ways to do that. If we try to predict the output directly, we get a GAN. The problem is that training is unstable because the reward is sparse.

So imagine, for example, that we want to turn a pile of dirt into a mountain: someone just telling you whether you hit or missed is not enough. Flow matching is basically saying, “Why don’t we provide guidance along the way?” so the whole thing is easier to fit.

We could say that flow matching converts a generation problem into a supervised learning problem: you are supervised on a given velocity path that we know would work.

There is another critical reason why flow matching is so great: it only needs several trajectories that could work. It doesn’t require a single, definitive trajectory; it can simply generate a multi-hypothesis output because of a very neat mathematical property of the loss, stated after the setup below.

A flow model is then described by the ODE

\[
\begin{aligned}
X_{0} &\sim p_{init} && \text{(random initialization)} \\
\frac{d}{dt} X_{t} &= u_{t}^{\theta}(X_{t}) && \text{(ODE)}
\end{aligned}
\]

Our goal is to make the endpoint $X_{1}$ of the trajectory have distribution $p_{data}$ , i.e.

\[
X_{1} \sim p_{data} \Leftrightarrow \psi_{1}^{\theta}(X_{0}) \sim p_{data}
\]

The neat property is that the conditional flow matching loss and the marginal flow matching loss differ only by a constant (w.r.t. $\theta$), so they have the same minimizer.
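For reference, the two losses can be written side by side. This is a sketch in the notation used here, where $u_t^{\text{target}}(x)$ denotes the marginal vector field, $u_t^{\text{target}}(x \mid z)$ the conditional one, and $C$ is a constant that does not depend on $\theta$ (the names $\mathcal{L}_{\text{FM}}$, $\mathcal{L}_{\text{CFM}}$ and $C$ are labels chosen for this sketch; see the class notes for the proof).

```latex
\begin{align*}
\mathcal{L}_{\text{FM}}(\theta)
  &= \mathbb{E}_{t \sim \text{Unif}_{[0,1]},\; z \sim p_{\text{data}},\; x \sim p_t(\cdot \mid z)}
     \left[ \left\| u_t^\theta(x) - u_t^{\text{target}}(x) \right\|^2 \right] \\
\mathcal{L}_{\text{CFM}}(\theta)
  &= \mathbb{E}_{t \sim \text{Unif}_{[0,1]},\; z \sim p_{\text{data}},\; x \sim p_t(\cdot \mid z)}
     \left[ \left\| u_t^\theta(x) - u_t^{\text{target}}(x \mid z) \right\|^2 \right] \\
\mathcal{L}_{\text{FM}}(\theta)
  &= \mathcal{L}_{\text{CFM}}(\theta) + C
  \quad \Longrightarrow \quad
  \nabla_\theta \mathcal{L}_{\text{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CFM}}(\theta)
\end{align*}
```

The practical point is that the marginal target involves an integral over the whole data distribution, while the conditional target has a closed form once a path is chosen, so we train on the conditional loss.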

So in practice we generate the training data ourselves, as in supervised learning. Here that just means sampling:

Your data point $z$
Your time step $t$
Your noise $\epsilon$ (Gaussian noise or something)
Calculate $x$ based on these, e.g. $x = t z + (1 - t) \epsilon$.

We just need to make sure the probability path we choose converges to $z$ at the end ($t = 1$) and starts from the initial sample $x_{0} \sim p_{init}$ at the beginning ($t = 0$).
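The sampling steps above can be sketched in a few lines of Python. This is a minimal 1-D illustration, assuming the Gaussian CondOT path $x = t z + (1 - t) \epsilon$ with target velocity $z - \epsilon$; the function name `sample_training_pair` and the toy dataset are hypothetical, not from the course code.

```python
import random

def sample_training_pair(dataset, rng):
    """Draw one (x, t, target, z) tuple for the Gaussian CondOT path (1-D toy case)."""
    z = rng.choice(dataset)        # data example z ~ p_data
    t = rng.random()               # time t ~ Unif[0, 1)
    eps = rng.gauss(0.0, 1.0)      # noise eps ~ N(0, 1)
    x = t * z + (1.0 - t) * eps    # point on the path between noise and data
    target = z - eps               # conditional velocity for this path
    return x, t, target, z

rng = random.Random(0)             # fixed seed so the sketch is reproducible
dataset = [-2.0, 0.5, 3.0]         # toy 1-D dataset (assumption for illustration)
x, t, target, z = sample_training_pair(dataset, rng)
print(f"z={z}, t={t:.3f}, x={x:.3f}, target={target:.3f}")
```

At $t = 0$ the sample is pure noise and as $t \to 1$ it approaches $z$, which is exactly the boundary condition the path has to satisfy.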

We can derive the target vector field $u_{t}$ easily once we choose a path. For example, let $p_{t}(\cdot \mid z) = \mathcal{N}(\alpha_{t} z, \beta_{t}^{2} I_{d})$ for noise schedulers $\alpha_{t}, \beta_{t}$. Let $\dot{\alpha}_{t} = \partial_{t} \alpha_{t}$ and $\dot{\beta}_{t} = \partial_{t} \beta_{t}$ denote the respective time derivatives of $\alpha_{t}$ and $\beta_{t}$. The conditional Gaussian vector field is given by

\[
u_{t}^{target}(x \mid z) = \left( \dot{\alpha}_{t} - \frac{\dot{\beta}_{t}}{\beta_{t}} \alpha_{t} \right) z + \frac{\dot{\beta}_{t}}{\beta_{t}} x
\]

This can be derived on demand. The conditional flow model is $\psi_{t}^{target}(\epsilon \mid z) = \alpha_{t} z + \beta_{t} \epsilon$. Differentiate with respect to $t$ to get $\dot{\alpha}_{t} z + \dot{\beta}_{t} \epsilon$, then substitute $\epsilon = (x - \alpha_{t} z) / \beta_{t}$, which is determined once $x$ and $z$ are sampled. Or you can simply leave it as $\dot{\alpha}_{t} z + \dot{\beta}_{t} \epsilon$.
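As a worked instance, here is the same computation for the Gaussian CondOT schedulers $\alpha_t = t$ and $\beta_t = 1 - t$ (so $\dot{\alpha}_t = 1$ and $\dot{\beta}_t = -1$). It is only a sketch to check the general formula against the simple target $z - \epsilon$.

```latex
\begin{align*}
u_t^{\text{target}}(x \mid z)
  &= \left( \dot{\alpha}_t - \frac{\dot{\beta}_t}{\beta_t} \alpha_t \right) z
     + \frac{\dot{\beta}_t}{\beta_t} x \\
  &= \left( 1 + \frac{t}{1 - t} \right) z - \frac{1}{1 - t} x \\
  &= \frac{z - x}{1 - t} \\
  &= \frac{z - \bigl( t z + (1 - t) \epsilon \bigr)}{1 - t}
     \qquad \text{(substituting } x = t z + (1 - t) \epsilon \text{)} \\
  &= z - \epsilon
\end{align*}
```

Both forms agree: written in terms of $(x, z)$ the target is $(z - x)/(1 - t)$, and written in terms of $(z, \epsilon)$ it is $z - \epsilon$.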

\begin{algorithm}
\caption{Flow Matching Training Procedure (for Gaussian CondOT path $p_t(x|z) = \mathcal{N}(tz, (1 - t)^2 I_d)$)}
\begin{algorithmic}
\REQUIRE A dataset of samples $z \sim p_{\text{data}}$, neural network $u_t^\theta$
\FOR{each mini-batch of data}
    \STATE Sample a data example $z$ from the dataset.
    \STATE Sample a random time $t \sim \text{Unif}_{[0,1]}$.
    \STATE Sample noise $\epsilon \sim \mathcal{N}(0, I_d)$
    \STATE Set
    \[
    x = tz + (1 - t)\epsilon \quad \quad \text{(General case: } x \sim p_t(\cdot | z)\text{)}
    \]
    \STATE Compute loss
    \[
    \mathcal{L}(\theta) = \|u_t^\theta(x) - (z - \epsilon)\|^2 \quad \quad \text{(General case: } = \|u_t^\theta(x) - u_t^{\text{target}}(x|z)\|^2\text{)}
    \]
    \STATE Update $\theta \leftarrow \text{grad\_update}(\mathcal{L}(\theta))$.
\ENDFOR
\end{algorithmic}
\end{algorithm}
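The training procedure above can be sketched in plain Python. This is a toy 1-D version under loud assumptions: the "network" is a hypothetical three-parameter linear model $u^\theta_t(x) = w x + v t + b$ standing in for a real neural network, the optimizer is hand-written mini-batch SGD, and the dataset, learning rate, and step count are arbitrary choices for illustration.

```python
import random

def sample_batch(dataset, batch_size, rng):
    """Sample (x, t, target) triples for the CondOT path x = t*z + (1 - t)*eps."""
    batch = []
    for _ in range(batch_size):
        z = rng.choice(dataset)        # z ~ p_data
        t = rng.random()               # t ~ Unif[0, 1)
        eps = rng.gauss(0.0, 1.0)      # eps ~ N(0, 1)
        x = t * z + (1.0 - t) * eps
        batch.append((x, t, z - eps))  # target velocity is z - eps
    return batch

def model(theta, x, t):
    """Hypothetical stand-in for the network: u_theta(x, t) = w*x + v*t + b."""
    w, v, b = theta
    return w * x + v * t + b

def cfm_loss(theta, batch):
    """Mean squared error between predicted and target velocities."""
    return sum((model(theta, x, t) - y) ** 2 for x, t, y in batch) / len(batch)

def train(dataset, steps=2000, batch_size=32, lr=0.02, seed=0):
    """Mini-batch SGD on the conditional flow matching loss."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        batch = sample_batch(dataset, batch_size, rng)
        grad = [0.0, 0.0, 0.0]
        for x, t, y in batch:
            err = model(theta, x, t) - y
            # Gradient of err^2 with respect to (w, v, b) is 2 * err * (x, t, 1).
            for k, feat in enumerate((x, t, 1.0)):
                grad[k] += 2.0 * err * feat / batch_size
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta

dataset = [4.0, 5.0, 6.0]                                # toy 1-D data (assumption)
eval_batch = sample_batch(dataset, 2000, random.Random(123))
loss_before = cfm_loss([0.0, 0.0, 0.0], eval_batch)      # untrained model
theta = train(dataset)
loss_after = cfm_loss(theta, eval_batch)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

The loss does not go to zero even for a perfect model, because the target $z - \epsilon$ is random given $(x, t)$; what matters is that the minimizer matches the marginal vector field.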

\begin{algorithm}
\caption{Sampling from a Flow Model with Euler method}
\begin{algorithmic}
\REQUIRE Neural network vector field $u_t^\theta$, number of steps $n$
\STATE Set $t = 0$
\STATE Set step size $h = \frac{1}{n}$
\STATE Draw a sample $X_0 \sim p_{\text{init}}$ \COMMENT{Random initialization!}
\FOR{$i = 1, \dots, n$}
    \STATE $X_{t+h} = X_t + h u_t^\theta(X_t)$
    \STATE Update $t \leftarrow t + h$
\ENDFOR
\RETURN $X_1$ \COMMENT{Return final point}
\end{algorithmic}
\end{algorithm}
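The Euler sampler above can be sketched as follows. The function `euler_sample` is a hypothetical helper, and instead of a trained network the example plugs in a known vector field: for a point-mass data distribution at $z^\ast$ and the CondOT schedulers $\alpha_t = t$, $\beta_t = 1 - t$, the conditional Gaussian vector field reduces to $(z^\ast - x)/(1 - t)$, so the sampler should land on $z^\ast$.

```python
import random

def euler_sample(vector_field, x0, n):
    """Integrate dX/dt = u_t(X) from t = 0 to t = 1 with n Euler steps."""
    h = 1.0 / n                      # step size
    x = x0
    for i in range(n):               # n steps at t = 0, h, ..., 1 - h
        t = i * h
        x = x + h * vector_field(x, t)
    return x

z_star = 3.0                         # location of the toy point-mass "dataset"

def point_mass_field(x, t):
    """Known CondOT vector field when p_data is a point mass at z_star."""
    return (z_star - x) / (1.0 - t)  # t never reaches 1 inside the loop

x0 = random.Random(0).gauss(0.0, 1.0)   # X_0 ~ p_init = N(0, 1)
x1 = euler_sample(point_mass_field, x0, n=10)
print(f"x0={x0:.3f} -> x1={x1:.3f}")
```

Note that the loop takes exactly $n$ steps of size $h = 1/n$, so the last evaluation of the vector field is at $t = 1 - h$ and the returned point corresponds to $t = 1$.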

Yanda's Random Notes
