AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Hao Luo, Xincheng Shuai, Henghui Ding

TL;DR
AnyI2V is a training-free framework that animates diverse conditional images with user-defined motion, enabling flexible, high-quality video generation with various input modalities and editing capabilities.
Contribution
The paper introduces a training-free method for animating any conditional image under motion control. The method supports multiple input modalities and enables style transfer and editing.
Findings
Achieves superior performance in motion-controlled video generation
Supports diverse input modalities including meshes and point clouds
Enables style transfer and editing via LoRA and text prompts
Abstract
Recent advancements in video generation, particularly in diffusion models, have driven notable progress in text-to-video (T2V) and image-to-video (I2V) synthesis. However, challenges remain in effectively integrating dynamic motion signals and flexible spatial constraints. Existing T2V methods typically rely on text prompts, which inherently lack precise control over the spatial layout of generated content. In contrast, I2V methods are limited by their dependence on real images, which restricts the editability of the synthesized content. Although some methods incorporate ControlNet to introduce image-based conditioning, they often lack explicit motion control and require computationally expensive training. To address these limitations, we propose AnyI2V, a training-free framework that animates any conditional image with user-defined motion trajectories. AnyI2V supports a broader range…
Taxonomy
Methods: Diffusion
