TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models

Christian Simon; Masato Ishii; Akio Hayakawa; Zhi Zhong; Shusuke Takahashi; Takashi Shibuya; Yuki Mitsufuji

arXiv:2508.00289 · cs.CV · August 4, 2025

Open Access

TL;DR

TITAN-Guide introduces an efficient, memory-friendly method for guiding text-to-video diffusion models during inference, improving control and performance without extensive fine-tuning or high memory costs.

Contribution

The paper proposes TITAN-Guide, a novel inference-time alignment method that optimizes diffusion latents efficiently without backpropagation, enhancing control and reducing memory usage in text-to-video diffusion.
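This summary does not say how the backpropagation-free optimization works, so the following is only a generic illustration of the idea, not the paper's algorithm. It is a minimal sketch, assuming a toy quadratic loss in place of a real guidance model, that estimates a latent's gradient from loss evaluations along random directions. The names `guidance_loss`, `directional_gradient`, and `optimize_latent` are hypothetical.

```python
import random


def guidance_loss(latent, target):
    """Hypothetical stand-in for the loss an off-the-shelf guidance model would give."""
    return sum((z - t) ** 2 for z, t in zip(latent, target))


def directional_gradient(loss_fn, latent, rng, num_directions=4, eps=1e-3):
    """Estimate the gradient from loss evaluations only (no backward pass).

    For each random Gaussian direction v, a central difference gives the
    slope of the loss along v; averaging slope * v over directions yields
    an estimate whose expectation is the true gradient.
    """
    dim = len(latent)
    estimate = [0.0] * dim
    for _ in range(num_directions):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [z + eps * v for z, v in zip(latent, direction)]
        minus = [z - eps * v for z, v in zip(latent, direction)]
        slope = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
        for i in range(dim):
            estimate[i] += slope * direction[i] / num_directions
    return estimate


def optimize_latent(latent, target, steps=200, lr=0.05, seed=0):
    """Move the latent toward lower guidance loss using the estimate above."""
    rng = random.Random(seed)  # seeded so the run is reproducible

    def loss_fn(z):
        return guidance_loss(z, target)

    for _ in range(steps):
        grad = directional_gradient(loss_fn, latent, rng)
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent


if __name__ == "__main__":
    start = [0.0] * 8
    target = [1.0, -2.0, 0.5, 3.0, -1.0, 0.0, 2.0, -0.5]
    result = optimize_latent(start, target)
    print(guidance_loss(start, target), guidance_loss(result, target))
```

Because only forward evaluations of the loss are needed, no activations have to be stored for a backward pass, which is the kind of memory saving the summary describes. The price is a noisier gradient estimate, which the sketch offsets by averaging over several directions.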

Findings

01. Efficient memory management during latent optimization.

02. Significant performance improvements in guided T2V tasks.

03. Outperforms existing guidance methods in benchmarks.

Abstract

Recent conditional diffusion models still require heavy supervised fine-tuning to perform control on a category of tasks. Training-free conditioning via guidance with off-the-shelf models is a favorable alternative that avoids further fine-tuning of the base model. However, existing training-free guidance frameworks either have heavy memory requirements or offer sub-optimal control due to rough estimation. These shortcomings limit their applicability to diffusion models that require intense computation, such as Text-to-Video (T2V) diffusion models. In this work, we propose Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models, called TITAN-Guide, which overcomes memory issues and provides better control in the guidance process than its counterparts. In particular, we develop an efficient method for optimizing…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning