Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak

TL;DR
This paper introduces a mask-based motion trajectory representation for image-to-video generation, significantly improving motion accuracy and temporal coherence in multi-object scenarios by decomposing the task into intermediate representation generation and video synthesis.
Contribution
The paper proposes a novel two-stage framework with a mask-based motion trajectory as an intermediate representation, enhancing motion realism and consistency in image-to-video generation.
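To make the two-stage decomposition concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the names `MaskTrajectory`, `generate_mask_trajectory`, and `render_from_masks` are hypothetical, the first stage is replaced by constant-velocity shifting of binary masks, and the second stage is replaced by a trivial renderer that paints object indices. In the paper both stages are learned generative models; the sketch only shows the data flow, in which per-object masks over time are produced first and the frames are then produced conditioned on them.

```python
from dataclasses import dataclass
from typing import Dict, List

Mask = List[List[int]]  # H x W binary grid, 1 where the object is present


@dataclass
class MaskTrajectory:
    # frames[t][label] is the binary mask of object `label` at frame t,
    # so the structure carries both semantics (label) and motion (mask over time).
    frames: List[Dict[str, Mask]]


def shift_mask(mask: Mask, dx: int) -> Mask:
    """Shift a mask horizontally by dx pixels (positive = right), zero-filling."""
    width = len(mask[0])
    return [
        [row[x - dx] if 0 <= x - dx < width else 0 for x in range(width)]
        for row in mask
    ]


def generate_mask_trajectory(
    first_masks: Dict[str, Mask], velocities: Dict[str, int], num_frames: int
) -> MaskTrajectory:
    """Stage 1 stand-in (assumption): constant-velocity motion per object.

    A learned model conditioned on the image and text would go here.
    """
    frames = [
        {
            label: shift_mask(mask, velocities.get(label, 0) * t)
            for label, mask in first_masks.items()
        }
        for t in range(num_frames)
    ]
    return MaskTrajectory(frames)


def render_from_masks(trajectory: MaskTrajectory) -> List[List[List[int]]]:
    """Stage 2 stand-in (assumption): paint each object's index into the frame.

    A video generator conditioned on the trajectory would go here.
    Objects are painted in sorted label order; later labels overwrite earlier ones.
    """
    video = []
    for masks in trajectory.frames:
        labels = sorted(masks)
        height, width = len(masks[labels[0]]), len(masks[labels[0]][0])
        frame = [[0] * width for _ in range(height)]
        for index, label in enumerate(labels, start=1):
            for y in range(height):
                for x in range(width):
                    if masks[label][y][x]:
                        frame[y][x] = index
        video.append(frame)
    return video


# Two objects on a 1 x 4 canvas moving toward each other.
first_masks = {"ball": [[1, 0, 0, 0]], "cat": [[0, 0, 0, 1]]}
trajectory = generate_mask_trajectory(first_masks, {"ball": 1, "cat": -1}, num_frames=3)
video = render_from_masks(trajectory)
```

The point of the split is that the second stage never has to infer which object moves where: that information is already fixed in the trajectory it is conditioned on.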
Findings
Achieves state-of-the-art results in temporal coherence and motion realism.
Introduces a new benchmark for multi-object image-to-video generation.
Demonstrates superior performance on challenging benchmarks.
Abstract
We consider the task of Image-to-Video (I2V) generation, which involves transforming static images into realistic video sequences based on a textual description. While recent advancements produce photorealistic outputs, they frequently struggle to create videos with accurate and consistent object motion, especially in multi-object scenarios. To address these limitations, we propose a two-stage compositional framework that decomposes I2V generation into: (i) An explicit intermediate representation generation stage, followed by (ii) A video generation stage that is conditioned on this representation. Our key innovation is the introduction of a mask-based motion trajectory as an intermediate representation, that captures both semantic object information and motion, enabling an expressive but compact representation of motion and semantics. To incorporate the learned representation in the…
Taxonomy
Topics: Augmented Reality Applications
Methods: Softmax · Attention Is All You Need
