VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation
Zhiqiang Yuan, Jiapei Zhang, Ying Deng, Yeshuang Zhu, Jie Zhou, Jinchao Zhang

TL;DR
This paper introduces VSD2M, the largest dataset for animated sticker generation, and proposes a new Spatial Temporal Interaction layer to enhance video generation methods for creating animated stickers from text prompts.
Contribution
The paper constructs the first large-scale vision-language sticker dataset VSD2M and develops a novel STI layer to improve animated sticker generation performance.
Findings
VSD2M contains two million static and animated stickers.
The STI layer improves information utilization in video generation.
Baseline models trained on VSD2M show promising results.
Abstract
As a common form of communication on social media, stickers are popular with internet users for their ability to convey emotions in a vivid, cute, and interesting way. People prefer to find an appropriate sticker through retrieval rather than creation, because creating a sticker is time-consuming and relies on rule-based creative tools with limited capabilities. Nowadays, advanced text-to-video algorithms have spawned numerous general video generation systems that allow users to customize high-quality, photo-realistic videos by providing only simple text prompts. However, creating customized animated stickers, which have lower frame rates and more abstract semantics than videos, is greatly hindered by difficulties in data acquisition and incomplete benchmarks. To facilitate the exploration of researchers in the animated sticker generation (ASG) field, we firstly…
Taxonomy
Topics: Interactive and Immersive Displays · Augmented Reality Applications · Human Motion and Animation
