Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V

Meiqi Wu; Bingze Song; Ruimin Lin; Chen Zhu; Xiaokun Feng; Jiahong Wu; Xiangxiang Chu; Kaiqi Huang

arXiv:2601.20504 · cs.CV · January 29, 2026

Open Access

TL;DR

This paper introduces Latent Temporal Discrepancy (LTD), a motion prior that adaptively weights the training loss according to frame-to-frame variation in the latent space, improving dynamic fidelity in text-to-video (T2V) generation models.

Contribution

The paper proposes LTD as a novel motion prior that guides loss weighting in diffusion models, enhancing their ability to generate high-quality dynamic videos.

Findings

01

Outperforms baselines by 3.31% on VBench

02

Outperforms baselines by 3.58% on VMBench

03

Improves motion quality in dynamic video generation

Abstract

Video generation models have achieved notable progress in static scenarios, yet their performance in motion video generation remains limited, with quality degrading under drastic dynamic changes. This is because noise disrupts temporal coherence and makes dynamic regions harder to learn. Unfortunately, existing diffusion models apply the same static loss to all scenarios, constraining their ability to capture complex dynamics. To address this issue, we introduce Latent Temporal Discrepancy (LTD) as a motion prior to guide loss weighting. LTD measures frame-to-frame variation in the latent space, assigning larger penalties to regions with higher discrepancy while maintaining regular optimization for stable regions. This motion-aware strategy stabilizes training and enables the model to better reconstruct high-frequency dynamics. Extensive experiments on the general benchmark…
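
The loss-weighting idea in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the latent layout (T, C, H, W), computing the discrepancy on clean latents (since the abstract notes that noise disrupts temporal coherence), the channel-averaged absolute difference, the per-clip max normalization, and the weight form 1 + alpha * d. The function names and the alpha parameter are hypothetical.

```python
import numpy as np


def latent_temporal_discrepancy(z0: np.ndarray) -> np.ndarray:
    """Hypothetical LTD map: per-location change between consecutive latent frames.

    z0: clean video latents, assumed shape (T, C, H, W) with T >= 2.
    Returns an array of shape (T, H, W).
    """
    if z0.ndim != 4 or z0.shape[0] < 2:
        raise ValueError("expected latents of shape (T, C, H, W) with T >= 2")
    # |z_t - z_{t-1}| averaged over channels -> shape (T-1, H, W).
    diff = np.abs(z0[1:] - z0[:-1]).mean(axis=1)
    # Frame 0 has no predecessor; reuse the first difference map so the result has T frames.
    return np.concatenate([diff[:1], diff], axis=0)


def ltd_weights(z0: np.ndarray, alpha: float = 1.0, eps: float = 1e-8) -> np.ndarray:
    """Assumed weighting: 1 for static locations, up to 1 + alpha for the most dynamic ones."""
    d = latent_temporal_discrepancy(z0)
    # Normalize by the clip's maximum discrepancy (assumption); eps guards fully static clips.
    d_norm = d / np.maximum(d.max(), eps)
    return 1.0 + alpha * d_norm


def ltd_weighted_mse(pred: np.ndarray, target: np.ndarray, z0: np.ndarray,
                     alpha: float = 1.0) -> float:
    """Diffusion regression loss (e.g. noise or velocity target) reweighted by the LTD map."""
    w = ltd_weights(z0, alpha)     # (T, H, W)
    sq_err = (pred - target) ** 2  # (T, C, H, W)
    # Broadcast the per-frame, per-location weights across the channel axis.
    return float((w[:, None] * sq_err).mean())


if __name__ == "__main__":
    z0 = np.zeros((4, 2, 3, 3))
    z0[2:, :, 0, 0] = 5.0  # one latent location changes between frames 1 and 2
    w = ltd_weights(z0, alpha=2.0)
    print(w[2, 0, 0], w[0, 1, 1])  # weight at the moving location vs. a static one
    rng = np.random.default_rng(0)
    pred, target = rng.normal(size=z0.shape), rng.normal(size=z0.shape)
    print(ltd_weighted_mse(pred, target, z0))
```

In this sketch, a clip with no motion yields uniform weights of 1, so the loss reduces to plain MSE, which mirrors the abstract's point that stable regions keep regular optimization. In an actual training loop the weights would presumably be treated as a fixed prior (no gradient through them) rather than a learned quantity.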

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition