MotiF: Making Text Count in Image Animation with Motion Focal Loss
Shijie Wang, Samaneh Azadi, Rohit Girdhar, Saketh Rambhatla, Chen Sun, Xi Yin

TL;DR
MotiF introduces a motion focal loss that emphasizes motion regions in text-guided image animation, significantly enhancing text alignment and motion accuracy in generated videos.
Contribution
The paper proposes MotiF, a novel motion focal loss using optical flow to improve text-guided video generation, and introduces TI2V Bench for robust evaluation.
Findings
MotiF outperforms nine open-source models, achieving a 72% preference rate in human evaluations.
The motion focal loss improves alignment with text prompts and motion realism.
TI2V Bench provides a new dataset for comprehensive TI2V model assessment.
Abstract
Text-Image-to-Video (TI2V) generation, also referred to as text-guided image animation, aims to generate a video from an image following a text description. Most existing methods struggle to generate videos that align well with the text prompts, particularly when motion is specified. To overcome this limitation, we introduce MotiF, a simple yet effective approach that directs the model's learning to the regions with more motion, thereby improving text alignment and motion generation. We use optical flow to generate a motion heatmap and weight the loss according to the intensity of the motion. This modified objective leads to noticeable improvements and complements existing methods that use motion priors as model inputs. Additionally, due to the lack of a diverse benchmark for evaluating TI2V generation, we propose TI2V Bench, a dataset consisting of 320 image-text pairs…
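The abstract describes the core idea only at a high level: optical flow produces a motion heatmap, and the training loss is weighted by motion intensity. The sketch below shows one way such a motion-weighted loss could look in NumPy. It is not the authors' implementation. The per-clip min-max normalization of flow magnitude, the average pooling down to latent resolution, the helper names `motion_heatmap`, `avg_pool_to` and `motion_focal_loss`, and the weight `lam` that adds the focal term on top of a plain MSE denoising loss are all assumptions made for illustration. The optical flow is assumed to be precomputed by some off-the-shelf flow estimator.

```python
import numpy as np


def motion_heatmap(flow: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Map optical flow of shape (T, H, W, 2) to a heatmap in [0, 1] of shape (T, H, W).

    Assumption: per-pixel flow magnitude, min-max normalized over the whole clip.
    """
    magnitude = np.linalg.norm(flow, axis=-1)  # (T, H, W)
    lo, hi = magnitude.min(), magnitude.max()
    # eps keeps a motionless clip (all magnitudes equal) from dividing by zero.
    return (magnitude - lo) / (hi - lo + eps)


def avg_pool_to(heatmap: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: average-pool (T, H, W) by an integer factor,
    e.g. to match the spatial size of a latent video."""
    t, h, w = heatmap.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    blocks = heatmap.reshape(t, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))  # (T, H // factor, W // factor)


def motion_focal_loss(pred: np.ndarray, target: np.ndarray,
                      heatmap: np.ndarray, lam: float = 1.0) -> float:
    """Assumed form: plain MSE plus a heatmap-weighted MSE.

    pred, target: (T, C, h, w) predicted and true noise; heatmap: (T, h, w).
    """
    sq_err = (pred - target) ** 2
    base = sq_err.mean()
    # Broadcast the heatmap over channels so errors in moving regions weigh more.
    weighted = (sq_err * heatmap[:, None, :, :]).mean()
    return float(base + lam * weighted)


if __name__ == "__main__":
    # Toy clip: 2 frames of 8x8 flow, motion only in the top-left quadrant.
    flow = np.zeros((2, 8, 8, 2))
    flow[:, :4, :4, 0] = 3.0
    hm = avg_pool_to(motion_heatmap(flow), 4)  # (2, 2, 2)

    target = np.zeros((2, 4, 2, 2))
    pred_moving = target.copy()
    pred_moving[:, :, 0, 0] = 1.0  # same-size error in the moving cell
    pred_static = target.copy()
    pred_static[:, :, 1, 1] = 1.0  # same-size error in a static cell

    print(motion_focal_loss(pred_moving, target, hm))
    print(motion_focal_loss(pred_static, target, hm))
```

Because the focal term multiplies the squared error by the heatmap, an error of the same size costs more in a moving region than in a static one, which matches the intent stated in the abstract. Keeping the unweighted term as well (again an assumed design choice) prevents static regions from being ignored entirely.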
Taxonomy
Topics: Human Motion and Animation · Handwritten Text Recognition Techniques · Video Analysis and Summarization
Methods: Heatmap · ALIGN
