What Is Flow Matching? Definition & How It Works

What is flow matching?

Standard diffusion models generate outputs through hundreds of noisy, curved sampling steps. Flow matching cuts that path straight. It is one of the primary reasons LTX-2 generates at 1/5 to 1/10 the compute cost of earlier video generation models.

Definition

Flow matching is a framework for training generative models that learns a vector field: a function that assigns a direction to every point in space, guiding a simple noise distribution toward a complex data distribution. Instead of defining a stochastic noising process like diffusion, flow matching defines a deterministic trajectory, or "flow," between noise and data.

The result is a continuous normalizing flow trained by regressing directly on the vector field, rather than with the score-matching objectives used in traditional diffusion.

How flow matching differs from diffusion

Both diffusion models and flow matching start with noise and end with data. The difference is in the path.

Diffusion models follow a noising process defined by a stochastic differential equation (SDE). At each timestep, the transition is stochastic, introducing variance. This produces curved, indirect paths from noise to data, requiring many denoising steps to produce coherent outputs.

Flow matching defines an ordinary differential equation (ODE) instead. Trajectories are deterministic and, when designed well, nearly straight. Straight paths mean the model covers the distance from noise to data in fewer steps. Fewer steps mean less compute at inference time.

How flow matching works

At training time, for each data sample x, a noise sample z is drawn from a Gaussian distribution. A straight-line path x_t = (1 - t)z + tx is defined between them, and the model is trained to predict the vector field, the direction of travel along that path, at each point in time. For a straight line, that direction is simply x - z.

This is a regression problem. The loss is the mean squared error between the predicted vector field and the ground-truth direction at sampled timesteps. No score estimation, no ELBO objectives. Just: predict the right direction.
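The training step above can be sketched in a few lines of NumPy. This is an illustrative toy, not LTX-2's implementation: the "model" is a stand-in function, the data is 2-D, and all names here (`fm_training_step`, `oracle`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_training_step(model_pred, x, z, t):
    """Flow matching regression loss for a straight-line path.

    Path:   x_t = (1 - t) * z + t * x
    Target: u_t = x - z   (constant direction along a straight line)
    Loss:   mean squared error between predicted velocity and u_t.
    """
    x_t = (1.0 - t)[:, None] * z + t[:, None] * x   # point on the path at time t
    u_t = x - z                                     # ground-truth direction
    v_pred = model_pred(x_t, t)                     # model's velocity estimate
    return np.mean((v_pred - u_t) ** 2)

# Toy setup: data from a shifted Gaussian, noise from a standard Gaussian.
x = rng.normal(loc=3.0, size=(64, 2))   # data batch
z = rng.normal(size=(64, 2))            # noise batch
t = rng.uniform(size=64)                # random timesteps in [0, 1]

# An oracle that already outputs the true constant field x - z gets zero
# loss; this is what training drives the real network toward.
oracle = lambda x_t, t: x - z
print(fm_training_step(oracle, x, z, t))  # → 0.0
```

In a real model, `model_pred` is a neural network and the loss is minimized by gradient descent over many batches; the regression structure itself is exactly this simple.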

At inference, you start from random noise, evaluate the trained vector field, and follow it step by step toward a clean sample. Fewer steps are needed because the path is straighter.
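In code, following the learned field is just an ODE solve, and Euler integration is the simplest solver. The sketch below uses an assumed toy velocity field (a constant drift standing in for a trained network) to show why straight paths need so few steps:

```python
import numpy as np

def sample(velocity_field, z, n_steps=8):
    """Generate a sample by Euler-integrating the learned ODE from t=0 to t=1."""
    x = z.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_field(x, t)  # one Euler step along the flow
    return x

# Toy velocity field: a constant drift toward a shifted distribution
# (a real model would be a trained network evaluated at each step).
drift = np.array([3.0, -1.0])
field = lambda x, t: drift

rng = np.random.default_rng(1)
z = rng.normal(size=(4, 2))
x = sample(field, z)

# Because this field is exactly constant (perfectly straight paths), a
# single Euler step lands at the same place as eight:
print(np.allclose(x, z + drift))            # → True
print(np.allclose(sample(field, z, 1), x))  # → True
```

Real learned fields are only approximately straight, so a handful of steps is typically used rather than one, but the same step-count economics apply.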

Conditional Flow Matching (CFM) extends this to conditional distributions: the per-sample vector field is conditioned on a target sample x, and the marginal field the model learns emerges in expectation over these conditionals. This is what makes training tractable at scale.

Types and variants

Conditional Flow Matching (CFM) is the standard formulation, introduced in parallel by Lipman et al. (2022) and Albergo & Vanden-Eijnden (2022).

Rectified Flow (Liu et al., 2022) explicitly trains flows along straight lines between noise and data. Stable Diffusion 3 and FLUX are built on Rectified Flow.

Optimal Transport CFM (OT-CFM) matches noise to data in a way that minimizes transport cost, producing even straighter paths and improving training efficiency.

Stochastic Interpolants (Albergo, Boffi, and Vanden-Eijnden, 2023) generalize flow matching and diffusion under a common framework.
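To make the OT-CFM idea concrete: instead of pairing each data point with an independent noise draw, noise and data are matched to minimize total transport cost. In 1-D this reduces to sorting both batches (a standard property of 1-D optimal transport under squared cost); higher dimensions use assignment solvers or Sinkhorn iterations. A toy sketch with hypothetical helper names:

```python
import numpy as np

def independent_pairing_cost(x, z):
    """Mean squared distance when noise is paired with data at random."""
    return np.mean((x - z) ** 2)

def ot_pairing_cost_1d(x, z):
    """In 1-D, the optimal-transport pairing under squared cost simply
    matches the sorted noise batch to the sorted data batch."""
    return np.mean((np.sort(x) - np.sort(z)) ** 2)

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, size=256)  # data batch (1-D toy)
z = rng.normal(size=256)           # noise batch

# OT pairing never costs more than random pairing, so the straight-line
# paths it induces are shorter and straighter on average.
print(ot_pairing_cost_1d(x, z) <= independent_pairing_cost(x, z))  # → True
```

Shorter, non-crossing paths are what translate into fewer integration steps at inference, which is the efficiency gain OT-CFM targets.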

A brief history

Flow matching builds on continuous normalizing flows (CNFs) from Chen et al. (2018), which used neural ODEs to model generative processes but were expensive to train. The key insight was that you do not need to simulate the ODE during training. You can train the vector field directly using regression, making the approach as scalable as standard diffusion.

The foundational flow matching paper appeared on arXiv in October 2022. By 2023 it had been adopted in several state-of-the-art image models. By 2024 it had become the dominant training objective for new high-performance generative models across modalities.

Why flow matching matters for video generation

Video is high-dimensional. A single clip contains thousands of times more information than a still image. Every inference step costs real compute, and those costs multiply across frames, across resolutions, and across the number of generations a production pipeline runs per day.

Flow matching reduces those costs structurally. Straighter sampling trajectories mean a lower NFE (number of function evaluations) at inference. This is not a quality compromise. It is an architectural improvement. Models trained with flow matching reach comparable or better output quality in fewer steps than DDPM-trained equivalents.

How LTX-2 uses flow matching

LTX-2 is a diffusion transformer trained with a flow matching objective. This is a key part of why it operates at 1/5 to 1/10 the compute cost of comparable models.

At inference, the sampling steps parameter controls how many ODE integration steps the model takes. Because the learned paths are near-straight, LTX-2 produces strong results at step counts that would yield degraded outputs from a traditional DDPM. You can configure this directly in the LTX-2 API, where sampling parameters are fully exposed.