U-Sticker: A Large-Scale Multi-Domain User Sticker Dataset for Retrieval and Personalization
Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Qinglang Guo, Min Zhang

TL;DR
U-Sticker is the largest publicly available sticker dataset to date. It spans multiple domains and supports research in sticker retrieval, personalization, and user behavior modeling with temporal and cross-domain data.
Contribution
We introduce U-Sticker, a large-scale, multi-domain dataset capturing temporal and user-specific sticker interactions for improved personalization research.
Findings
Demonstrated practical applications in user behavior modeling.
Showcased effectiveness in personalized sticker recommendation.
Highlighted potential for advancing conversational studies.
Abstract
Instant messaging with text and stickers has become a widely adopted communication medium, enabling efficient expression of user semantics and emotions. As stickers are increasingly used to convey information and feelings, sticker retrieval and recommendation have emerged as important areas of research. However, a major limitation of the existing literature is the lack of datasets capturing temporal and user-specific sticker interactions, which has hindered progress in user modeling and sticker personalization. To address this, we introduce User-Sticker (U-Sticker), a dataset that includes temporal information and anonymized user IDs across conversations. It is the largest publicly available sticker dataset to date, containing 22K unique users, 370K stickers, and 8.3M messages. The raw data were collected from 67 conversations on a popular messaging platform over 720 hours of crawling. All text and…
