TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models
Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, Yuki Mitsufuji

TL;DR
TITAN-Guide introduces an efficient, memory-friendly method for guiding text-to-video diffusion models during inference, improving control and performance without extensive fine-tuning or high memory costs.
Contribution
The paper proposes TITAN-Guide, a novel inference-time alignment method that optimizes diffusion latents efficiently without backpropagation, enhancing control and reducing memory usage in text-to-video diffusion.
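To illustrate what optimizing a latent without backpropagation can look like, here is a minimal sketch. It is not the paper's algorithm: it assumes a toy quadratic loss (`toy_guidance_loss`, a hypothetical stand-in for a guidance signal from an off-the-shelf model) and swaps in a generic random-direction gradient estimate that uses only forward evaluations of the loss. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def toy_guidance_loss(latent, target):
    # Hypothetical stand-in for a guidance loss; not from the paper.
    return 0.5 * float(np.sum((latent - target) ** 2))

def directional_grad_estimate(loss_fn, latent, rng, num_dirs=8, eps=1e-3):
    # Estimate the gradient from slopes along random directions, using only
    # forward evaluations of loss_fn (no backward pass, no stored activations).
    estimate = np.zeros_like(latent)
    for _ in range(num_dirs):
        v = rng.standard_normal(latent.shape)
        # Central finite difference approximates the directional derivative along v.
        slope = (loss_fn(latent + eps * v) - loss_fn(latent - eps * v)) / (2 * eps)
        estimate += slope * v
    return estimate / num_dirs

def optimize_latent(latent, target, steps=200, lr=0.05, seed=0):
    # Plain gradient descent on the latent, driven by the forward-only estimate.
    rng = np.random.default_rng(seed)
    loss_fn = lambda z: toy_guidance_loss(z, target)
    for _ in range(steps):
        latent = latent - lr * directional_grad_estimate(loss_fn, latent, rng)
    return latent

if __name__ == "__main__":
    target = np.ones(16)
    start = np.zeros(16)
    final = optimize_latent(start, target)
    print(toy_guidance_loss(start, target), toy_guidance_loss(final, target))
```

The memory argument in this sketch is that each update needs only loss evaluations, so no computation graph has to be kept for a backward pass. The price is a noisier gradient estimate, which is why several random directions are averaged per step.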
Findings
Efficient memory management during latent optimization.
Significant performance improvements in guided T2V tasks.
Outperforms existing guidance methods in benchmarks.
Abstract
Despite recent progress, conditional diffusion models still require heavy supervised fine-tuning to perform control on a category of tasks. Training-free conditioning via guidance with off-the-shelf models is a favorable alternative that avoids further fine-tuning of the base model. However, existing training-free guidance frameworks either have heavy memory requirements or offer sub-optimal control due to rough estimation. These shortcomings limit their applicability to diffusion models that require intense computation, such as Text-to-Video (T2V) diffusion models. In this work, we propose Taming Inference Time Alignment for Guided Text-to-Video Diffusion Model, called TITAN-Guide, which overcomes memory space issues and provides better control in the guidance process than its counterparts. In particular, we develop an efficient method for optimizing…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
