Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance

Naifu Xue; Zhaoyang Jia; Jiahao Li; Bin Li; Zihan Zheng; Yuan Zhang; Yan Lu

arXiv:2512.07480·cs.CV·December 9, 2025

Open Access

TL;DR

S2VC is a single-step diffusion-based video codec that uses semantic and temporal guidance to deliver high perceptual quality and significant bitrate savings at low bitrates, with lower sampling cost than multi-step diffusion codecs.

Contribution

The paper presents S2VC, a novel single-step diffusion video codec with semantic and temporal guidance, enabling efficient, realistic low-bitrate video reconstruction.

Findings

01. 52.73% bitrate saving over prior methods
02. State-of-the-art perceptual quality at low bitrates
03. Effective semantic and temporal conditioning improves realism
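To make the first finding concrete, the arithmetic behind a percentage bitrate saving can be sketched as below. This is only an illustration under stated assumptions: the page does not say which baseline or quality metric the 52.73% figure is measured against, so the sketch assumes it means "same perceptual quality at a lower bitrate", and the 1000 kbps reference value and the function name `bitrate_at_matched_quality` are hypothetical.

```python
def bitrate_at_matched_quality(reference_kbps: float, saving_percent: float) -> float:
    """Bitrate left after a percentage saving relative to a reference codec.

    Assumes the saving is quoted at matched quality, which the source page
    does not spell out.
    """
    if not 0.0 <= saving_percent < 100.0:
        raise ValueError("saving_percent must be in [0, 100)")
    return reference_kbps * (1.0 - saving_percent / 100.0)


# Hypothetical reference bitrate; not a number reported by the paper.
reference_kbps = 1000.0
s2vc_kbps = bitrate_at_matched_quality(reference_kbps, 52.73)
print(f"reference: {reference_kbps:.1f} kbps, after 52.73% saving: {s2vc_kbps:.1f} kbps")
```

Read this way, a 52.73% saving means the codec would need a little under half the reference bitrate for comparable quality.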

Abstract

While traditional and neural video codecs (NVCs) have achieved remarkable rate-distortion performance, improving perceptual quality at low bitrates remains challenging. Some NVCs incorporate perceptual or adversarial objectives but still suffer from artifacts due to limited generation capacity, whereas others leverage pretrained diffusion models to improve quality at the cost of heavy sampling complexity. To overcome these challenges, we propose S2VC, a Single-Step diffusion-based Video Codec that integrates a conditional coding framework with an efficient single-step diffusion generator, enabling realistic reconstruction at low bitrates with reduced sampling cost. Recognizing the importance of semantic conditioning in single-step diffusion, we introduce Contextual Semantic Guidance to extract frame-adaptive semantics from buffered features. It replaces text captions with efficient,…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Image and Video Quality Assessment