Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training

Zhou Zhang; Dongzeng Tan; Jiaan Wang; Yilong Chen; Jiarong Xu

arXiv:2409.14552 · cs.CL · September 27, 2024

TL;DR

This paper introduces a novel self-supervised graph pre-training framework that models the interactions between posts, words, and emojis to enhance social media text understanding.

Contribution

It constructs a heterogeneous graph with posts, words, and emojis and proposes a pre-training method with contrastive and link reconstruction tasks for better emoji-text representation.
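The two pre-training tasks named above can be illustrated with a minimal sketch. It assumes an InfoNCE-style contrastive loss over cosine similarities and a binary cross-entropy link reconstruction loss over dot-product scores. These are common choices for such tasks and are not taken from the paper; the function names, the `temperature` value, and the plain-list embeddings are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style contrastive loss for one anchor embedding.

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives. (Assumed form, not the paper's.)
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def link_reconstruction_loss(u, v, label):
    """Binary cross-entropy on sigmoid(dot(u, v)).

    label is 1.0 for an edge that exists in the graph and 0.0 for a
    sampled non-edge. (Assumed scoring function, not the paper's.)
    """
    score = dot(u, v)
    return softplus(score) - label * score


if __name__ == "__main__":
    anchor, pos, neg = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
    print("contrastive loss:", info_nce(anchor, pos, [neg]))
    print("link loss (true edge):", link_reconstruction_loss(anchor, pos, 1.0))
```

In a full pipeline the embeddings would come from a graph encoder over the post, word, and emoji nodes; here they are hand-written vectors so the two losses can be checked in isolation.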

Findings

01. Significant improvements on the Xiaohongshu and Twitter datasets

02. Effective modeling of emoji-text interactions

03. Outperforms previous baseline methods

Abstract

Emojis have gained immense popularity on social platforms, serving as a common means to supplement or replace text. However, existing data mining approaches generally either ignore emojis completely or simply treat them as ordinary Unicode characters, which may limit the model's ability to grasp the rich semantic information in emojis and the interaction between emojis and texts. Thus, it is necessary to release the emoji's power in social media data mining. To this end, we first construct a heterogeneous graph consisting of three types of nodes, i.e., post, word, and emoji nodes, to improve the representation of different elements in posts. The edges are also well-defined to model how these three elements interact with each other. To facilitate the sharing of information among post, word, and emoji nodes, we propose a graph pre-training framework for text and emoji co-modeling, which contains two…
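The heterogeneous graph described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' construction. The three edge types (post-word, post-emoji, and word-emoji co-occurrence within a post) are guesses at what "well-defined" edges could look like, and `is_emoji` is a hypothetical helper that checks only two rough Unicode ranges.

```python
from collections import defaultdict


def is_emoji(ch):
    """Rough emoji test on two Unicode ranges (an assumption, not exhaustive)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def build_graph(posts):
    """Build typed node sets and typed edge sets from a list of post strings.

    Returns (nodes, edges), where nodes maps a node type to a set of ids
    and edges maps a (source type, target type) pair to a set of id pairs.
    """
    nodes = {"post": set(), "word": set(), "emoji": set()}
    edges = defaultdict(set)
    for post_id, text in enumerate(posts):
        nodes["post"].add(post_id)
        emojis = {ch for ch in text if is_emoji(ch)}
        # Replace emojis with spaces so they do not stick to adjacent words.
        cleaned = "".join(" " if is_emoji(ch) else ch for ch in text)
        words = {w.lower() for w in cleaned.split()}
        nodes["word"] |= words
        nodes["emoji"] |= emojis
        for w in words:
            edges[("post", "word")].add((post_id, w))
        for e in emojis:
            edges[("post", "emoji")].add((post_id, e))
        # Assumed edge type: a word and an emoji appearing in the same post.
        for w in words:
            for e in emojis:
                edges[("word", "emoji")].add((w, e))
    return nodes, edges


if __name__ == "__main__":
    sample_posts = ["great coffee \u2615", "coffee time \U0001F600\u2615"]
    nodes, edges = build_graph(sample_posts)
    for edge_type, pairs in edges.items():
        print(edge_type, len(pairs))
```

Keeping edges in a dictionary keyed by (source type, target type) mirrors how heterogeneous graph libraries usually organize typed edges, so the structure could later be handed to a graph encoder with little reshaping.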

Code & Models

Repositories

ginkoeric/self-supervised-graph-pre-training-for-emoji
PyTorch · Official

Videos

Unleashing the Power of Emojis in Texts via Self-supervised Graph Pre-Training · Underline

Taxonomy

Topics: Digital Communication and Language · Natural Language Processing Techniques

Methods: Contrastive Learning