VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu, Hao Luo, Xi Chen, Sihui Ji, Xiang Bai, Hengshuang Zhao

TL;DR
VideoAnydoor is a zero-shot framework for high-fidelity video object insertion that preserves appearance details and offers precise motion control through a pixel warper and a novel training strategy.
Contribution
It introduces a novel pixel warper and training approach enabling detailed appearance preservation and fine-grained motion control in video object insertion.
Findings
Outperforms existing methods in insertion quality
Supports various downstream applications without fine-tuning
Enables precise motion manipulation with key-point trajectories
Abstract
Despite significant advancements in video generation, inserting a given object into videos remains a challenging task. The difficulty lies in preserving the appearance details of the reference object and accurately modeling coherent motions at the same time. In this paper, we propose VideoAnydoor, a zero-shot video object insertion framework with high-fidelity detail preservation and precise motion control. Starting from a text-to-video model, we utilize an ID extractor to inject the global identity and leverage a box sequence to control the overall motion. To preserve the detailed appearance and meanwhile support fine-grained motion control, we design a pixel warper. It takes the reference image with arbitrary key-points and the corresponding key-point trajectories as inputs. It warps the pixel details according to the trajectories and fuses the warped features with the diffusion…
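To make the two control signals named in the abstract concrete, the sketch below builds a toy box sequence (one box per frame) and a set of key-point trajectories (one point per key-point per frame). It illustrates the input format only and is not the authors' implementation: the function names, the linear interpolation, the normalized key-point coordinates, and the choice to derive trajectories from the boxes are all assumptions made for this example. In the paper's setting the trajectories may be arbitrary.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels
Point = Tuple[float, float]              # (x, y)


def interpolate_boxes(start: Box, end: Box, num_frames: int) -> List[Box]:
    """Hypothetical helper: linearly interpolate a box, one box per frame."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    boxes = []
    for t in range(num_frames):
        a = t / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        boxes.append(tuple((1 - a) * s + a * e for s, e in zip(start, end)))
    return boxes


def keypoint_trajectories(keypoints: List[Point],
                          boxes: List[Box]) -> List[List[Point]]:
    """Hypothetical helper: place key-points, given in normalized reference
    coordinates in [0, 1] x [0, 1], inside each frame's box.

    Returns one trajectory (a list of per-frame points) per key-point.
    """
    trajectories = []
    for u, v in keypoints:
        trajectory = []
        for x0, y0, x1, y1 in boxes:
            trajectory.append((x0 + u * (x1 - x0), y0 + v * (y1 - y0)))
        trajectories.append(trajectory)
    return trajectories


# Toy input: a 10x10 box sliding 20 pixels to the right over 3 frames,
# with a single key-point at the centre of the reference object.
boxes = interpolate_boxes((0, 0, 10, 10), (20, 0, 30, 10), num_frames=3)
trajectories = keypoint_trajectories([(0.5, 0.5)], boxes)
```

Here `boxes` plays the role of the coarse motion signal and `trajectories` the fine-grained one. A model such as the one described would consume these alongside the reference image, and that part is outside the scope of this sketch.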
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Coding and Compression Technologies
Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net · Diffusion
