Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

Yue Ma; Yingqing He; Hongfa Wang; Andong Wang; Chenyang Qi; Chengfei Cai; Xiu Li; Zhifeng Li; Heung-Yeung Shum; Wei Liu; Qifeng Chen

arXiv:2403.08268·cs.CV·March 14, 2024·1 cites

TL;DR

Follow-Your-Click introduces a user-friendly image-to-video generation framework that allows precise local object control using simple clicks and short prompts, improving quality and controllability over existing methods.

Contribution

The paper presents the first-frame masking strategy, a motion-augmented module with a short prompt dataset, and flow-based motion control, enhancing local control and generation quality in image animation.

Findings

01

Outperforms 7 baselines on 8 metrics

02

Achieves better control and quality than previous methods

03

Enables simple user interaction for precise local animation

Abstract

Despite recent advances in image-to-video generation, better controllability and local animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to provide redundant detailed descriptions of frame contents. These two issues hinder the practical utilization of current I2V tools. In this paper, we propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click (for specifying what to move) and a short motion prompt (for specifying how to move). Technically, we propose the first-frame masking strategy, which significantly improves the video generation quality, and a motion-augmented module equipped…

Code & Models

Repositories

mayuelala/followyourclick (Official)

Taxonomy

Topics: Video Analysis and Summarization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Attentive Walk-Aggregating Graph Neural Network