InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO

Xueji Fang; Liyuan Ma; Zhiyang Chen; Mingyuan Zhou; Guo-jun Qi

arXiv:2505.17574 · cs.CV · May 26, 2025

TL;DR

InfLVG is an inference-time framework for long, coherent video generation. It dynamically selects the most relevant context tokens, which improves cross-scene consistency and semantic fidelity without requiring additional long-form video data.
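The selection step described above can be pictured as keeping a fixed number of the highest-scoring tokens from the generation history. The Python sketch below is illustrative only. The relevance scores, the `select_context` helper, and the fixed `budget` are hypothetical assumptions. InfLVG's actual policy learns how to rank context, and its details are not given on this page.

```python
from typing import List, Sequence


def select_context(scores: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` highest-scoring history tokens,
    restored to their original (temporal) order.

    `scores` stands in for hypothetical per-token relevance scores that a
    selection policy would produce; ties are broken by earlier position.
    """
    if budget <= 0:
        return []
    # Rank positions by descending score; sorted() is stable, so equal
    # scores keep their earlier-first order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    kept = ranked[:budget]
    # Keep retained tokens in generation order so temporal structure survives.
    return sorted(kept)


# Usage: 8 history tokens with made-up scores, keep the 3 most relevant.
history_scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.4]
print(select_context(history_scores, budget=3))  # [1, 3, 5]
```

Returning the kept indices in generation order rather than score order is a design choice of this sketch. It keeps the retained context temporally ordered no matter how the tokens were ranked.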

Contribution

We propose InfLVG, an inference-time framework that uses a learnable context selection policy, optimized with GRPO, to extend autoregressive text-to-video models to long videos while maintaining quality and consistency.
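GRPO does not use a learned value function. Instead, it scores several rollouts sampled for the same input and normalizes each reward against the group's mean and standard deviation. The sketch below covers only that advantage computation. The reward values are made up, `group_relative_advantages` is a hypothetical helper, and the reward that InfLVG actually optimizes is not described on this page.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own group (GRPO-style).

    All rewards are assumed to come from G rollouts sampled for the same
    prompt/context, so no critic network is needed to form a baseline.
    """
    mean = statistics.fmean(rewards)
    # Population std; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four rollouts of a hypothetical selection policy with made-up rewards.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in adv])  # [1.414, -1.414, 0.0, 0.0]
```

These advantages would then weight the policy-gradient update for each rollout. The clipped surrogate objective and any KL penalty are left out of this sketch.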

Findings

01. Extends video length by up to 9 times.
02. Maintains strong cross-scene consistency.
03. Achieves high semantic fidelity across scenes.

Abstract

Recent advances in text-to-video generation, particularly with autoregressive models, have enabled the synthesis of high-quality videos depicting individual scenes. However, extending these models to generate long, cross-scene videos remains a significant challenge. As the context length grows during autoregressive decoding, computational costs rise sharply, and the model's ability to maintain consistency and adhere to evolving textual prompts deteriorates. We introduce InfLVG, an inference-time framework that enables coherent long video generation without requiring additional long-form video data. InfLVG leverages a learnable context selection policy, optimized via Group Relative Policy Optimization (GRPO), to dynamically identify and retain the most semantically relevant context throughout the generation process. Instead of accumulating the entire generation history, the policy ranks…
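The cost of accumulating the full history can be shown by counting attention query-key pairs when a video is generated chunk by chunk. The sketch below is a generic back-of-the-envelope model and does not describe InfLVG's architecture. The chunk size, the context budget, and the `attention_pairs` helper are hypothetical. For simplicity, attention within a chunk is treated as bidirectional.

```python
from typing import Optional


def attention_pairs(num_chunks: int, chunk_tokens: int, budget: Optional[int] = None) -> int:
    """Count query-key pairs scored while generating `num_chunks` chunks.

    Each new chunk's queries attend to the retained context plus the chunk
    itself. budget=None keeps the full history; an int caps the number of
    retained history tokens (a hypothetical fixed context budget).
    """
    total = 0
    for t in range(num_chunks):
        history = t * chunk_tokens  # tokens generated before chunk t
        context = history if budget is None else min(history, budget)
        total += chunk_tokens * (context + chunk_tokens)
    return total


# Usage: 100-token chunks, full history vs. a 200-token context budget.
print(attention_pairs(10, 100), attention_pairs(10, 100, budget=200))  # 550000 270000
print(attention_pairs(20, 100), attention_pairs(20, 100, budget=200))  # 2100000 570000
```

Doubling the number of chunks roughly quadruples the full-history cost (550,000 to 2,100,000 pairs). The budgeted cost only roughly doubles (270,000 to 570,000). This is why it pays to retain a bounded context.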

Code & Models

Repositories

maple-aigc/inflvg (official)

Taxonomy

Methods: Sparse Evolutionary Training