TL;DR
This paper evaluates how well current NLP and text processing tools handle emojis in digital communication, revealing significant shortcomings in their ability to process emoji-containing text accurately.
Contribution
It assesses how prominent NLP and text processing tools perform on emoji-rich tweets across three tasks (tokenization, part-of-speech tagging, and sentiment analysis) and identifies where emoji support still needs improvement.
Findings
Many tools struggle with emoji tokenization
Part-of-speech tagging accuracy drops with emojis
Sentiment analysis is often unreliable on emoji-laden text
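To make the tokenization finding concrete, here is a minimal Python sketch of one way emoji can trip up a simple tokenizer. It is an illustration only, not the paper's test setup or any specific tool's behavior. The function names `naive_tokenize` and `emoji_aware_tokenize` are hypothetical, and the grouping rule (join code points linked by a zero-width joiner, a variation selector, or a skin-tone modifier) is a simplified assumption that covers only part of what Unicode's grapheme cluster rules specify.

```python
import re

ZWJ = "\u200d"    # zero-width joiner, links emoji into one visible glyph
VS16 = "\ufe0f"   # variation selector-16, requests emoji presentation

TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def is_skin_tone(tok):
    # Fitzpatrick skin-tone modifiers occupy U+1F3FB..U+1F3FF.
    return len(tok) == 1 and 0x1F3FB <= ord(tok) <= 0x1F3FF


def naive_tokenize(text):
    # Words stay whole; every other non-space code point is its own token.
    return TOKEN_RE.findall(text)


def emoji_aware_tokenize(text):
    # Same regex, but code points that belong to one emoji sequence are
    # merged back together (simplified assumption, not full UAX #29).
    tokens = []
    prev_end = -1
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        adjacent = match.start() == prev_end  # no whitespace in between
        glue = bool(tokens) and adjacent and (
            tok in (ZWJ, VS16)
            or is_skin_tone(tok)
            or tokens[-1].endswith(ZWJ)
        )
        if glue:
            tokens[-1] += tok
        else:
            tokens.append(tok)
        prev_end = match.end()
    return tokens


THUMBS_UP_MEDIUM = "\U0001F44D\U0001F3FD"                    # thumbs up + skin tone
FAMILY = "\U0001F468\u200d\U0001F469\u200d\U0001F467"        # man ZWJ woman ZWJ girl

text = f"love it {THUMBS_UP_MEDIUM} {FAMILY}!"
print(len(naive_tokenize(text)))      # the two emoji are split into 7 pieces
print(emoji_aware_tokenize(text))     # each emoji survives as one token
```

In this sketch the naive tokenizer breaks the two visible emoji into seven separate tokens, including bare joiner characters, which is the kind of fragmentation that can degrade downstream tagging and sentiment scoring.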
Abstract
Emojis have become ubiquitous in digital communication, due to their visual appeal as well as their ability to vividly convey human emotion, among other factors. The growing prominence of emojis in social media and other instant messaging also leads to an increased need for systems and tools to operate on text containing emojis. In this study, we assess this support by considering test sets of tweets with emojis, based on which we perform a series of experiments investigating the ability of prominent NLP and text processing tools to adequately process them. In particular, we consider tokenization, part-of-speech tagging, as well as sentiment analysis. Our findings show that many tools still have notable shortcomings when operating on text containing emojis.
