U-Sticker: A Large-Scale Multi-Domain User Sticker Dataset for Retrieval and Personalization

Heng Er Metilda Chee; Jiayin Wang; Zhiqiang Guo; Weizhi Ma; Qinglang Guo; Min Zhang

arXiv:2502.19108 · cs.IR · July 11, 2025

TL;DR

U-Sticker is the largest publicly available sticker dataset to date (22K unique users, 370K stickers, 8.3M messages). It spans multiple domains and records temporal, per-user interactions, supporting research in sticker retrieval, personalization, and user behavior modeling.

Contribution

We introduce U-Sticker, a large-scale, multi-domain dataset capturing temporal and user-specific sticker interactions for improved personalization research.

Findings

01. Demonstrated practical applications in user behavior modeling.

02. Showcased effectiveness in personalized sticker recommendation.

03. Highlighted potential for advancing conversational studies.

Abstract

Instant messaging with text and stickers has become a widely adopted communication medium, enabling efficient expression of user semantics and emotions. With the increased use of stickers to convey information and feelings, sticker retrieval and recommendation have emerged as an important area of research. However, a major limitation of the existing literature has been the lack of datasets capturing temporal and user-specific sticker interactions, which has hindered progress in user modeling and sticker personalization. To address this, we introduce User-Sticker (U-Sticker), a dataset that includes temporal information and anonymized user IDs across conversations. It is the largest publicly available sticker dataset to date, containing 22K unique users, 370K stickers, and 8.3M messages. The raw data was collected from 67 conversations on a popular messaging platform over 720 hours of crawling. All text and…
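To give a rough sense of the scale quoted in the abstract, the short Python sketch below derives a few back-of-the-envelope ratios from the rounded headline figures (22K users, 8.3M messages, 67 conversations, 720 crawl hours). The function name `dataset_scale` and its keys are hypothetical, introduced only for illustration. The ratios are approximate averages computed from the rounded counts, not statistics reported by the authors.

```python
# Rough scale estimates from the rounded figures quoted in the abstract.
# `dataset_scale` is a hypothetical helper, not part of any U-Sticker tooling.

def dataset_scale(users, messages, conversations, crawl_hours):
    """Return simple averages derived from headline dataset counts."""
    return {
        "messages_per_user": messages / users,
        "messages_per_conversation": messages / conversations,
        "crawl_days": crawl_hours / 24,
    }

scale = dataset_scale(
    users=22_000,          # "22K unique users"
    messages=8_300_000,    # "8.3M messages"
    conversations=67,      # "67 conversations"
    crawl_hours=720,       # "720 hours of crawling"
)

for name, value in scale.items():
    print(f"{name}: {value:,.1f}")
```

These are plain averages: actual activity is likely to be unevenly distributed across users and conversations, and the crawl duration need not match the time span covered by the messages.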

Code & Models

Datasets

metchee/u-sticker
dataset · 896 dl
