Reinforce Adjoint Matching: Scaling Diffusion RL

June 30, 2026
Andreas Bergmeister, TU Munich
Microsoft Research New England Generative Modeling & Sampling Seminar

Diffusion and flow-matching models scale because pretraining is supervised regression: a clean sample is noised analytically, and a model regresses against a closed-form target. RL post-training aligns the model with a reward. In image generation, this makes samples compose objects correctly, render text legibly, and match human preferences. Existing methods rely on costly SDE rollouts, reward gradients, or surrogate losses, sacrificing pretraining’s regression structure. We show that the structure extends to RL post-training. Under KL-regularized reward maximization, the optimal generative process tilts the clean-endpoint distribution towards samples with higher reward and leaves the noising law unchanged. Combining this with the adjoint-matching optimality condition and a REINFORCE identity, we derive Reinforce Adjoint Matching (RAM): a consistency loss that corrects the pretraining target with the reward. At each step, we draw a clean endpoint from the current model, evaluate its reward, noise it as in pretraining, and regress. No SDE rollouts, backward adjoint sweeps, or reward gradients are required. Like the pretraining objective, RAM is simple and scales. On Stable Diffusion 3.5 Medium, RAM achieves the highest reward on composability, text rendering, and human preference, reaching Flow-GRPO’s peak reward in up to 50× fewer training steps.
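
The reward tilt mentioned in the abstract can be illustrated on a toy discrete distribution. The sketch below is not RAM itself. It only checks the standard closed form for KL-regularized reward maximization, p*(x) ∝ p_ref(x) · exp(r(x)/β), on three outcomes. The names `tilt`, `objective`, `p_ref`, `rewards`, and `beta` are illustrative assumptions and do not come from the talk.

```python
import math

def tilt(p_ref, rewards, beta):
    """Return p*(x) proportional to p_ref(x) * exp(r(x) / beta)."""
    logits = [math.log(p) + r / beta for p, r in zip(p_ref, rewards)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

def objective(p, p_ref, rewards, beta):
    """Expected reward minus beta * KL(p || p_ref)."""
    expected_reward = sum(pi * r for pi, r in zip(p, rewards))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p_ref) if pi > 0)
    return expected_reward - beta * kl

# Hypothetical toy setup: three outcomes with increasing reward.
p_ref = [0.5, 0.3, 0.2]
rewards = [0.0, 1.0, 2.0]
beta = 1.0

p_star = tilt(p_ref, rewards, beta)
print(p_star)
print(objective(p_ref, p_ref, rewards, beta), objective(p_star, p_ref, rewards, beta))
```

With these toy numbers, probability mass moves from the lowest-reward outcome to the highest-reward one, and the objective at `p_star` equals `beta * log(Z)`, where `Z` is the sum of `p_ref(x) * exp(r(x) / beta)` over the outcomes.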

Speaker bio

Andreas Bergmeister is a PhD student at TU Munich. He received his Bachelor’s and Master’s degrees in Computer Science from ETH Zurich. His research focuses on generative models, diffusion models in particular.

- Andreas Bergmeister, Doctoral Researcher, TU Munich
Research Area
- Artificial intelligence
Research Lab
- Microsoft Research Lab - New England
Event
- Microsoft Research New England Generative Modeling & Sampling Seminar

Series: MSR New England Generative Modeling & Sampling Seminar

Reinforce Adjoint Matching: Scaling Diffusion RL
June 30, 2026
Andreas Bergmeister
Inferring Unobserved Trajectories from Multiple Temporal Snapshots
June 23, 2026
Yunyi Shen & Carles Domingo-Enrich
Rare event analysis via stochastic optimal control
June 16, 2026
Yuanqi Du & Carles Domingo-Enrich
Constrained Generative AI for Materials Inverse Design
June 2, 2026
Mouyang Cheng
De novo Generation for Molecular Structure Elucidation from Mass Spectrometry
May 26, 2026
Runzhong Wang
Designing Dynamic Measure Transport for Sampling
May 19, 2026
Aimee Maurais
Generative AI for High-Stakes Decision-Making with Applications in One Health
May 12, 2026
Lingkai Kong
Physics and information theory of generative diffusion
May 5, 2026
Luca Ambrogioni
Where the Score Lives: What Wavelets Reveal About Diffusion Models
April 28, 2026
Emma Finn
Matching features, not tokens: Energy-based fine-tuning of language models
April 14, 2026
Mujin Kwun & Carles Domingo-Enrich
Data-Driven Discovery and Verification of Singularities in Nonlinear Partial Differential Equations
April 9, 2026
Yixuan Wang
Tractable Mapping Entropy and Generative Backmapping via Split-Flows
April 7, 2026
Tristan Bereau
Generative Models for Molecular Dynamics Across Timescales
March 31, 2026
Michael Plainer, Winfried Ripken & Gregor Lied
A Unified Approach to Analysis and Design of Denoising Markov Models
March 24, 2026
Yinuo Ren
Q-learning with Flow-Matching Policies
March 17, 2026
Qiyang (Colin) Li
Extending measure dynamics beyond generative modeling
March 10, 2026
Jiequn Han
Wavefunction Flows: Efficient Quantum Simulation of Continuous Flow Models
March 3, 2026
David Layden
A non-Markovian approach to diffusion-based sampling
February 24, 2026
Lorenz Richter
Blind denoising diffusion models and the blessings of dimensionality
February 17, 2026
Aram-Alexandre Pooladian
Meta Flow Maps
February 3, 2026
Peter Potaptchik
