LibraGen: Playing a Balance Game in Subject-Driven Video Generation

Jiahao Zhu; Shanshan Lao; Lijie Liu; Gen Li; Tianhao Qi; Wei Han; Bingchuan Li; Fangfang Liu; Zhuowei Chen; Tianxiang Ma; Qian HE; Yi Zhou; Xiaohua Xie

arXiv:2603.13506·cs.CV·March 18, 2026

Open Access

TL;DR

LibraGen treats subject-driven (subject-to-video, S2V) generation as a balance between a foundation model's intrinsic priors and its newly added S2V capability. It pursues that balance through an emphasis on data quality, post-training tuning, and dynamic guidance at inference, and reports better results than existing S2V models.

Contribution

The paper presents LibraGen, a framework that balances the strengths of a video generation foundation model against its S2V capability via a quality-focused data pipeline, post-training tuning, and dynamic inference control.

Findings

01 Outperforms existing S2V models while training on limited data
02 Strikes an effective balance between motion coherence and prompt alignment
03 Reports qualitative and quantitative results superior to existing S2V models

Abstract

With the advancement of video generation foundation models (VGFMs), customized generation, particularly subject-to-video (S2V), has attracted growing attention. However, a key challenge lies in balancing the intrinsic priors of a VGFM, such as motion coherence, visual aesthetics, and prompt alignment, with its newly derived S2V capability. Existing methods often neglect this balance by enhancing one aspect at the expense of others. To address this, we propose LibraGen, a novel framework that views extending foundation models for S2V generation as a balance game between intrinsic VGFM strengths and S2V capability. Specifically, guided by the core philosophy of "Raising the Fulcrum, Tuning to Balance," we identify data quality as the fulcrum and advocate a quality-over-quantity approach. We construct a hybrid pipeline that combines automated and manual data filtering to improve overall…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis