Assessing Emoji Use in Modern Text Processing Tools

Abu Awal Md Shoeb; Gerard de Melo

arXiv:2101.00430 · cs.CL · January 5, 2021

TL;DR

This paper evaluates how well current NLP and text processing tools handle emojis in digital communication, revealing significant shortcomings in their ability to process emoji-containing text accurately.

Contribution

It assesses prominent NLP and text processing tools on test sets of tweets containing emojis, covering tokenization, part-of-speech tagging, and sentiment analysis, and highlights where emoji support needs improvement.

Findings

01

Many tools struggle with emoji tokenization

02

Part-of-speech tagging accuracy drops with emojis

03

Sentiment analysis is often unreliable on emoji-laden text
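
To illustrate why emoji tokenization (finding 01) is harder than it looks, here is a minimal Python sketch. It is not the paper's test suite or any tool's actual tokenizer. The function names (`is_emoji_base`, `tokenize`) and the code point ranges are assumptions chosen for illustration, and the ranges only roughly approximate the emoji blocks in Unicode. The sketch keeps skin-tone modifiers, variation selectors, and zero-width-joiner sequences attached to their base emoji, where splitting per code point would break one visible emoji into several tokens.

```python
ZWJ = "\u200d"   # zero width joiner, glues emojis into one glyph
VS16 = "\ufe0f"  # variation selector-16, requests emoji presentation


def is_modifier(ch):
    # Fitzpatrick skin-tone modifiers occupy U+1F3FB..U+1F3FF.
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def is_emoji_base(ch):
    # Assumption: a rough approximation of the main emoji blocks,
    # not the full Unicode emoji property.
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def tokenize(text):
    """Split on whitespace, then separate emoji sequences from words."""
    tokens = []
    for chunk in text.split():
        word = ""
        i = 0
        while i < len(chunk):
            ch = chunk[i]
            if is_emoji_base(ch):
                if word:  # flush the word collected so far
                    tokens.append(word)
                    word = ""
                j = i + 1
                # Extend the emoji token over modifiers and ZWJ links.
                while j < len(chunk):
                    nxt = chunk[j]
                    if nxt == VS16 or is_modifier(nxt):
                        j += 1
                    elif (nxt == ZWJ and j + 1 < len(chunk)
                          and is_emoji_base(chunk[j + 1])):
                        j += 2
                    else:
                        break
                tokens.append(chunk[i:j])
                i = j
            else:
                word += ch
                i += 1
        if word:
            tokens.append(word)
    return tokens


thumbs_up = "\U0001F44D\U0001F3FD"            # thumbs up + medium skin tone
technologist = "\U0001F469\u200d\U0001F4BB"   # woman + ZWJ + laptop

# Splitting per code point turns one visible emoji into three pieces.
print(len(list(technologist)))
print(tokenize("Great job" + thumbs_up + " team " + technologist + "!"))
```

With this grouping rule, an emoji attached directly to a word is split off as its own token, while a multi-code-point emoji stays whole. A production tokenizer would more likely rely on full grapheme cluster segmentation than on hand-written ranges.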

Abstract

Emojis have become ubiquitous in digital communication, due to their visual appeal as well as their ability to vividly convey human emotion, among other factors. The growing prominence of emojis in social media and other instant messaging also leads to an increased need for systems and tools to operate on text containing emojis. In this study, we assess this support by considering test sets of tweets with emojis, based on which we perform a series of experiments investigating the ability of prominent NLP and text processing tools to adequately process them. In particular, we consider tokenization, part-of-speech tagging, as well as sentiment analysis. Our findings show that many tools still have notable shortcomings when operating on text containing emojis.

Code & Models

Repositories

abushoeb/Emoji-Test-Suite