TL;DR
This paper develops and evaluates emoji embedding models based on semantic descriptions to measure emoji similarity, creating a new dataset and demonstrating improved performance in sentiment analysis tasks.
Contribution
The paper introduces a semantic similarity measure for emoji, built on embedding models trained over emoji descriptions, sense labels, and sense definitions, and creates the EmoSim508 dataset of 508 human-annotated emoji pairs for evaluation.
Findings
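The emoji embeddings outperform previous models on a sentiment analysis task
The EmoSim508 dataset provides a new benchmark of 508 emoji pairs with human-annotated similarity scores
Emoji similarity models learned from semantic descriptions can support text processing tasks such as sentiment analysis and search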
Abstract
Emoji have grown to become one of the most important forms of communication on the web. With their widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing, since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels, and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation…
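As a rough illustration of the idea in the abstract, the sketch below builds an emoji vector from the words of its description and compares two emoji by cosine similarity. It is a minimal sketch under stated assumptions, not the paper's implementation: averaging word vectors is one common way to embed a short description and is assumed here, and the three-dimensional word vectors, the emoji names, and the helper names (`embed`, `cosine`, `emoji_similarity`) are all hypothetical. Real word vectors would come from a model trained on a large corpus such as the Twitter or Google News data the abstract mentions.

```python
import math

# Hypothetical toy word vectors (3 dimensions) standing in for a trained
# word embedding model. The values are made up for illustration only.
WORD_VECTORS = {
    "face": [0.9, 0.1, 0.0],
    "smiling": [0.8, 0.3, 0.1],
    "grinning": [0.7, 0.4, 0.1],
    "red": [0.1, 0.2, 0.9],
    "heart": [0.2, 0.1, 0.8],
}

# Hypothetical, shortened emoji descriptions keyed by an illustrative name.
EMOJI_DESCRIPTIONS = {
    "grinning_face": "grinning face",
    "smiling_face": "smiling face",
    "red_heart": "red heart",
}


def embed(description):
    """Average the vectors of the known words in a description (assumed scheme)."""
    words = [w for w in description.lower().split() if w in WORD_VECTORS]
    if not words:
        raise ValueError("no known words in description: %r" % description)
    dim = len(WORD_VECTORS[words[0]])
    return [sum(WORD_VECTORS[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def emoji_similarity(name_a, name_b):
    """Similarity of two emoji, computed from their description embeddings."""
    return cosine(embed(EMOJI_DESCRIPTIONS[name_a]), embed(EMOJI_DESCRIPTIONS[name_b]))


if __name__ == "__main__":
    print(emoji_similarity("grinning_face", "smiling_face"))
    print(emoji_similarity("grinning_face", "red_heart"))
```

With these toy vectors the two face emoji score as more similar to each other than either does to the heart. Model scores of this kind are what a dataset such as EmoSim508 would be compared against, pair by pair, using its human-annotated similarity ratings.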
