Emoji Retrieval from Gibberish or Garbled Social Media Text: A Novel Methodology and A Case Study
Shuqi Cui, Nirmalya Thakur, and Audrey Poon

TL;DR
This paper introduces a novel three-step reverse-engineering method for retrieving emojis from garbled social media text, so that emoji usage can still be analyzed in data that conventional preprocessing would discard.
Contribution
The paper presents a new methodology specifically designed to recover emojis from garbled social media text, addressing a gap in existing preprocessing techniques.
Findings
Retrieved 157,748 emojis from 76,914 Tweets in a dataset of 509,248 Tweets about the Mpox outbreak
Readability metrics such as Flesch Reading Ease and Flesch-Kincaid Grade Level improved after retrieval, indicating more readable and coherent text
Analyzed emoji usage patterns and frequency in social media data
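
For reference, the two readability metrics named in the abstract have standard published formulas, sketched below in Python. This is only an illustration of the formulas themselves: the function names are hypothetical, the word, sentence, and syllable counts are assumed to be supplied by the caller, and nothing here reflects how the authors computed their scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; lower scores mean easier text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short passage (assumed, not taken from the paper).
    counts = {"words": 100, "sentences": 5, "syllables": 150}
    print(flesch_reading_ease(**counts))
    print(flesch_kincaid_grade(**counts))
```

Both formulas depend on syllables per word, so long runs of garbled characters can shift the scores depending on how a tool tokenizes and counts them.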
Abstract
Emojis are widely used across social media platforms but are often lost in noisy or garbled text, posing challenges for data analysis and machine learning. Conventional preprocessing approaches recommend removing such text, risking the loss of emojis and their contextual meaning. This paper proposes a three-step reverse-engineering methodology to retrieve emojis from garbled text in social media posts. The methodology also identifies reasons for the generation of such text during social media data mining. To evaluate its effectiveness, the approach was applied to 509,248 Tweets about the Mpox outbreak, a dataset referenced in about 30 prior works that failed to retrieve emojis from garbled text. Our method retrieved 157,748 emojis from 76,914 Tweets. Improvements in text readability and coherence were demonstrated through metrics such as Flesch Reading Ease, Flesch-Kincaid Grade Level,…
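
The summary above does not spell out the three steps, so the following is not the authors' method. It is a minimal Python sketch of one well-known way emojis become garbled: UTF-8 bytes are decoded with a single-byte codec such as Windows-1252, and the damage can often be reversed by re-encoding with that codec and decoding as UTF-8. The function name `recover_mojibake` and the choice of `cp1252` as the wrong codec are assumptions made for illustration.

```python
def recover_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Try to undo text that was UTF-8 but got decoded with `wrong_codec`.

    Assumption: the garbling is a single wrong decode step. If the round
    trip fails, the input is returned unchanged rather than guessed at.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    original = "Stay safe \U0001F637"  # hypothetical post ending in a mask emoji
    # Simulate the garbling: UTF-8 bytes read as Windows-1252.
    garbled = original.encode("utf-8").decode("cp1252")
    print(repr(garbled))
    print(recover_mojibake(garbled) == original)
```

Returning the input unchanged on failure keeps the sketch conservative: text that was never garbled, or that was garbled in some other way, passes through untouched instead of being corrupted further.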
Taxonomy
Topics: Natural Language Processing Techniques · Digital Communication and Language · Translation Studies and Practices
