Free$^2$Guide: Training-Free Text-to-Video Alignment using Image LVLM
Jaemin Kim, Bryan Sangwoo Kim, Jong Chul Ye

TL;DR
Free$^2$Guide introduces a training-free, gradient-free framework that leverages large vision-language models to improve text-to-video alignment in diffusion-based video synthesis, without requiring differentiable reward functions.
Contribution
It applies path integral control principles so that black-box large vision-language models (LVLMs) can serve as reward models for text-video alignment, bypassing the need for training or differentiable rewards.
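As a rough illustration of what gradient-free guidance with a black-box reward can look like, the Python sketch below draws several candidate next states at each denoising step and keeps the one the reward scores highest. It is a toy best-of-N selection loop under stated assumptions, not the paper's exact algorithm: `guided_sampling`, `toy_denoise_step`, and `toy_reward` are hypothetical names, the sample is a short list of floats rather than a video latent, and the reward stands in for an LVLM score.

```python
import random
from typing import Callable, List

Sample = List[float]


def guided_sampling(
    denoise_step: Callable[[Sample, int, random.Random], Sample],
    reward: Callable[[Sample], float],
    x_init: Sample,
    num_steps: int,
    num_candidates: int,
    seed: int = 0,
) -> Sample:
    """Best-of-N guidance: the reward is only called, never differentiated."""
    rng = random.Random(seed)
    x = x_init
    for t in range(num_steps, 0, -1):
        # Draw several stochastic proposals for the next (less noisy) state.
        candidates = [denoise_step(x, t, rng) for _ in range(num_candidates)]
        # Keep the proposal that the black-box reward scores highest.
        x = max(candidates, key=reward)
    return x


def toy_denoise_step(x: Sample, t: int, rng: random.Random) -> Sample:
    # Hypothetical stand-in for one reverse-diffusion step: shrink and add noise.
    return [0.9 * xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]


def toy_reward(x: Sample) -> float:
    # Hypothetical stand-in for an LVLM alignment score: closeness to 1.0.
    return -sum((xi - 1.0) ** 2 for xi in x)


if __name__ == "__main__":
    result = guided_sampling(toy_denoise_step, toy_reward, [0.0, 0.0], 20, 8)
    print(result, toy_reward(result))
```

Because the loop only compares reward values, any scorer that returns a number can be plugged in, including one that is not differentiable.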
Findings
Significantly improves text-to-video alignment quality.
Supports ensembling of multiple reward models.
Operates with minimal computational overhead.
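This summary does not say how multiple reward models are combined. One simple possibility, shown here purely as an assumption, is a weighted sum of black-box scores. The name `ensemble_reward` and the weighting scheme are hypothetical, and in practice the individual scores would likely need to be brought onto a comparable scale first.

```python
from typing import Any, Callable, Sequence

RewardFn = Callable[[Any], float]


def ensemble_reward(rewards: Sequence[RewardFn], weights: Sequence[float]) -> RewardFn:
    """Combine several black-box reward functions into one weighted score."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")

    def combined(sample: Any) -> float:
        # Each reward is queried independently; no gradients are required.
        return sum(w * r(sample) for r, w in zip(rewards, weights))

    return combined
```

The combined function has the same call signature as a single reward, so it can replace one wherever a single scorer is expected.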
Abstract
Diffusion models have achieved impressive results in generative tasks for text-to-video (T2V) synthesis. However, achieving accurate text alignment in T2V generation remains challenging due to the complex temporal dependencies across frames. Existing reinforcement learning (RL)-based approaches to enhance text alignment often require differentiable reward functions trained for videos, hindering their scalability and applicability. In this paper, we propose Free$^2$Guide, a novel gradient-free and training-free framework for aligning generated videos with text prompts. Specifically, leveraging principles from path integral control, Free$^2$Guide approximates guidance for diffusion models using non-differentiable reward functions, thereby enabling the integration of powerful black-box Large Vision-Language Models (LVLMs) as reward models. To enable image-trained LVLMs to assess…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Vision and Imaging
Methods: Diffusion
