TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation

Yuanzhi Liang; Xuan'er Wu; Yirui Liu; Yijie Fang; Yizhen Fan; Ke Hao; Rui Li; Ruiying Liu; Ziqi Ni; Peng Yu; Yanbo Wang; Haibin Huang; Qizhen Weng; Chi Zhang; Xuelong Li

arXiv:2602.07595·cs.CV·February 10, 2026

Open Access

TL;DR

TeleBoost is a post-training framework that improves the fidelity, controllability, and robustness of video generators by combining supervised policy shaping, reward-driven reinforcement learning, and preference-based refinement in a single stability-constrained optimization stack.

Contribution

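The report presents a unified, staged optimization framework for post-training video generation models. It is designed around practical constraints (high rollout cost, temporally compounding failure modes, and heterogeneous, uncertain feedback) and aims to improve stability and performance.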

Findings

01. Improved perceptual fidelity and temporal coherence.

02. Enhanced controllability and adherence to prompts.

03. Stable and scalable post-training pipeline.

Abstract

Post-training is the decisive step for converting a pretrained video generator into a production-oriented model that is instruction-following, controllable, and robust over long temporal horizons. This report presents a systematic post-training framework that organizes supervised policy shaping, reward-driven reinforcement learning, and preference-based refinement into a single stability-constrained optimization stack. The framework is designed around practical video-generation constraints, including high rollout cost, temporally compounding failure modes, and feedback that is heterogeneous, uncertain, and often weakly discriminative. By treating optimization as a staged, diagnostic-driven process rather than a collection of isolated tricks, the report summarizes a cohesive recipe for improving perceptual fidelity, temporal coherence, and prompt adherence while preserving the…

Taxonomy

Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Human Motion and Animation