Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

Xingtong Ge; Yi Zhang; Yushi Huang; Dailan He; Xiahong Wang; Bingqi Ma; Guanglu Song; Yu Liu; Jun Zhang

arXiv:2604.03118·cs.CV·April 6, 2026

TL;DR

Salt is a distillation method for fast, real-time video generation at very low inference budgets (e.g., 2-4 NFEs). It regularizes how consecutive denoising updates compose and trains with awareness of the KV cache, improving output quality at those budgets.

Contribution

The paper proposes Self-Consistent Distribution Matching Distillation (SC-DMD) and Cache-Distribution-Aware training to improve low-NFE video generation quality across multiple backbones.

Findings

01 Improved video quality at low inference budgets across multiple backbones.

02 Effective regularization of denoising composition to prevent drift.

03 Compatibility with diverse cache memory mechanisms.

Abstract

Distilling video generation models to extremely low inference budgets (e.g., 2-4 NFEs) is crucial for real-time deployment, yet remains challenging. Trajectory-style consistency distillation often becomes conservative under complex video dynamics, yielding an over-smoothed appearance and weak motion. Distribution matching distillation (DMD) can recover sharp, mode-seeking samples, but its local training signals do not explicitly regularize how denoising updates compose across timesteps, making composed rollouts prone to drift. To overcome this challenge, we propose Self-Consistent Distribution Matching Distillation (SC-DMD), which explicitly regularizes the endpoint-consistent composition of consecutive denoising updates. For real-time autoregressive video generation, we further treat the KV cache as a quality-parameterized condition and propose Cache-Distribution-Aware training. This…
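
The idea of endpoint-consistent composition can be illustrated with a toy example. The sketch below is not the paper's SC-DMD objective, which the truncated abstract does not spell out. It assumes a linear interpolation path x_t = (1 - t) * x0 + t * eps, and the names `step`, `composition_gap`, `oracle`, and `shrink` are hypothetical. It only measures how far the endpoint of two composed denoising updates (t -> s -> 0) lands from the endpoint of a single update (t -> 0).

```python
# Toy illustration only: hypothetical names, assumed linear noise path.
# This is NOT the SC-DMD loss from the paper.

def step(predict_x0, x_t, t, s):
    """Move a sample from noise level t to a lower level s (0 <= s < t <= 1).

    Assumes the path x_t = (1 - t) * x0 + t * eps.
    """
    x0_hat = predict_x0(x_t, t)
    if s == 0.0:
        return x0_hat
    # Noise implied by the current sample and the predicted clean sample.
    eps_hat = [(x - (1.0 - t) * x0) / t for x, x0 in zip(x_t, x0_hat)]
    # Re-noise the prediction to level s along the same path.
    return [(1.0 - s) * x0 + s * e for x0, e in zip(x0_hat, eps_hat)]


def composition_gap(predict_x0, x_t, t, s):
    """Mean squared gap between the one-step endpoint (t -> 0) and the
    endpoint reached by composing two updates (t -> s -> 0)."""
    direct = step(predict_x0, x_t, t, 0.0)
    x_s = step(predict_x0, x_t, t, s)
    composed = step(predict_x0, x_s, s, 0.0)
    return sum((d - c) ** 2 for d, c in zip(direct, composed)) / len(direct)


def oracle(x, t):
    """A predictor that always returns the same clean sample."""
    return [1.0, -2.0]


def shrink(x, t):
    """A predictor whose output depends on the noise level it is queried at."""
    return [(1.0 - t) * v for v in x]


x_t = [2.0, -4.0]
gap_oracle = composition_gap(oracle, x_t, t=0.8, s=0.4)
gap_shrink = composition_gap(shrink, x_t, t=0.8, s=0.4)
```

In this toy setting the `oracle` predictor composes without any gap, while `shrink` lands on a different endpoint depending on how many updates are taken. A regularizer in the spirit of the abstract would penalize that kind of gap during distillation.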

Code & Models

Repositories

XingtongGe/Salt (GitHub)
