FeelsGoodMan: Inferring Semantics of Twitch Neologisms
Pavel Dolin, Luc d'Hauthuille, Andrea Vattani

TL;DR
This paper addresses the challenge of understanding Twitch emotes by establishing a new sentiment analysis baseline and proposing an unsupervised embedding-based method to infer emote semantics, improving NLP performance on Twitch chat data.
Contribution
It introduces a new sentiment analysis benchmark for Twitch data and an unsupervised framework using word embeddings and k-NN to infer emote meanings and enhance classifiers.
Findings
Outperforms previous supervised sentiment benchmarks by 7.9 percentage points.
Enables auto-generation of emote dictionaries with near-supervised accuracy.
Improves sentiment classification by incorporating emote semantics from unannotated data.
Abstract
Twitch chats pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were used in the week studied. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in their frequencies, it becomes impossible to maintain an updated manually-labeled dataset. Our paper makes a twofold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9 percentage points. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate a pseudo-dictionary of emotes and we show that we can nearly match the…
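The general idea of combining word embeddings with k-NN to label out-of-vocabulary emotes can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors, the tiny sentiment lexicon, the function name `infer_sentiment`, and the choice of cosine similarity with an unweighted average over k = 2 neighbors are all assumptions made for illustration. In practice the embeddings would be trained on unannotated chat logs.

```python
import numpy as np

# Hypothetical toy embeddings; real ones would be learned from chat data.
EMBEDDINGS = {
    "great": np.array([0.9, 0.1, 0.0]),
    "love": np.array([0.8, 0.2, 0.1]),
    "awful": np.array([-0.9, 0.1, 0.0]),
    "hate": np.array([-0.8, 0.2, 0.1]),
    "FeelsGoodMan": np.array([0.85, 0.15, 0.05]),
    "FeelsBadMan": np.array([-0.85, 0.15, 0.05]),
}

# Hypothetical sentiment lexicon covering ordinary words only (no emotes).
LEXICON = {"great": 1.0, "love": 1.0, "awful": -1.0, "hate": -1.0}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def infer_sentiment(token, k=2):
    """Average the lexicon scores of the k lexicon words nearest to `token`."""
    query = EMBEDDINGS[token]
    sims = [(cosine(query, EMBEDDINGS[word]), word) for word in LEXICON]
    nearest = sorted(sims, reverse=True)[:k]
    return sum(LEXICON[word] for _, word in nearest) / k


# Auto-generated pseudo-dictionary for emotes absent from the lexicon.
pseudo_dictionary = {
    emote: infer_sentiment(emote) for emote in ("FeelsGoodMan", "FeelsBadMan")
}
print(pseudo_dictionary)
```

A dictionary produced this way could then be fed to an existing lexicon-based or learned sentiment classifier as extra vocabulary, which is the kind of enrichment the abstract describes.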
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Digital Communication and Language · Topic Modeling
Methods: k-Nearest Neighbors
