Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training
Zhou Zhang, Dongzeng Tan, Jiaan Wang, Yilong Chen, Jiarong Xu

TL;DR
This paper introduces a novel self-supervised graph pre-training framework that models the interactions between posts, words, and emojis to enhance social media text understanding.
Contribution
It constructs a heterogeneous graph with posts, words, and emojis and proposes a pre-training method with contrastive and link reconstruction tasks for better emoji-text representation.
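The two pre-training objectives named above can be illustrated with a minimal sketch. The summary does not give the paper's loss definitions, so this uses a generic InfoNCE-style contrastive loss and a dot-product edge scorer with binary cross-entropy as stand-ins. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss (an assumed stand-in for the contrastive task).

    positives[i] is the positive view of anchors[i]; every other row of
    `positives` serves as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

def link_reconstruction_loss(u, v, label):
    """Binary cross-entropy on a dot-product edge score (assumed scorer).

    label is 1 if the edge (u, v) exists in the graph and 0 otherwise.
    """
    score = sum(a * b for a, b in zip(u, v))
    prob = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

# Toy embeddings: two node views that agree, then the same views mismatched.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup, the contrastive loss is lower when each anchor is paired with its matching view, and the link loss is lower when a high-scoring pair is labeled as a true edge. How the two terms are weighted against each other is not stated in this summary.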
Findings
Significant improvements on Xiaohongshu and Twitter datasets
Effective modeling of emoji-text interactions
Outperforms previous baseline methods
Abstract
Emojis have gained immense popularity on social platforms, serving as a common means to supplement or replace text. However, existing data mining approaches generally either ignore emojis completely or simply treat them as ordinary Unicode characters, which may limit the model's ability to grasp the rich semantic information in emojis and the interaction between emojis and texts. Thus, it is necessary to unleash the power of emojis in social media data mining. To this end, we first construct a heterogeneous graph consisting of three types of nodes, i.e., post, word, and emoji nodes, to improve the representation of different elements in posts. The edges are also well-defined to model how these three elements interact with each other. To facilitate the sharing of information among post, word, and emoji nodes, we propose a graph pre-training framework for text and emoji co-modeling, which contains two…
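As a rough illustration of the three-node-type graph described in the abstract, the sketch below builds typed edge lists from raw posts. The abstract does not list the edge definitions, so the three edge types used here (post-word, post-emoji, and word-emoji co-occurrence) are assumptions. The emoji regular expression is likewise a coarse approximation covering common pictograph ranges, not a full Unicode emoji matcher.

```python
import re
from collections import defaultdict

# Coarse emoji matcher over common pictograph blocks (an assumption; it does
# not handle multi-codepoint sequences such as flags or skin-tone modifiers).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[A-Za-z']+")

def build_hetero_graph(posts):
    """Return a dict mapping (source type, relation, target type) to edges.

    The edge types are hypothetical: the source only says edges model how
    posts, words, and emojis interact.
    """
    edges = defaultdict(set)
    for post_id, text in enumerate(posts):
        words = [w.lower() for w in WORD_RE.findall(text)]
        emojis = EMOJI_RE.findall(text)
        for word in words:
            edges[("post", "contains", "word")].add((post_id, word))
        for emoji in emojis:
            edges[("post", "contains", "emoji")].add((post_id, emoji))
        # Link every word to every emoji appearing in the same post.
        for word in words:
            for emoji in emojis:
                edges[("word", "cooccurs", "emoji")].add((word, emoji))
    return {edge_type: sorted(pairs) for edge_type, pairs in edges.items()}

posts = ["great coffee ☕", "I love this 😍 coffee"]
graph = build_hetero_graph(posts)
```

Because "coffee" appears in both posts, it ends up connected to both emojis, which is the kind of shared structure a graph encoder could use to pass information between emoji and word nodes.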
Taxonomy
Topics: Digital Communication and Language · Natural Language Processing Techniques
Methods: Contrastive Learning
