TL;DR
This paper proposes an intrinsic evaluation metric for informal-domain word representations based on the proximity of spelling variants, with variant data collected from UrbanDictionary, to test whether such representations can replace explicit text normalization.
Contribution
It introduces a novel evaluation metric for informal word representations and a method to collect spelling variant datasets from UrbanDictionary.
Findings
Proposed a metric for evaluating informal word representations.
Collected a dataset of spelling variants from UrbanDictionary.
Framed spelling-variant proximity as a test of whether informal word embeddings can stand in for explicit text normalization as a preprocessing step.
Abstract
Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might be computed and how a spelling variant dataset can be collected using UrbanDictionary.
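The abstract does not spell out how spelling-variant proximity is scored, so the following is only a minimal sketch of one way it might be computed. It assumes that embeddings are available as a plain word-to-vector mapping, that variant pairs have already been collected, and that proximity is summarized as the mean reciprocal rank of each variant among the cosine neighbors of its canonical form. The function names, the choice of mean reciprocal rank, and the toy vectors are all illustrative assumptions, not the authors' method or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def variant_rank(embeddings, canonical, variant):
    """Rank of `variant` among the neighbors of `canonical` (1 = nearest)."""
    target = cosine(embeddings[canonical], embeddings[variant])
    rank = 1
    for word, vec in embeddings.items():
        if word in (canonical, variant):
            continue
        # Every other word that is strictly closer pushes the variant down.
        if cosine(embeddings[canonical], vec) > target:
            rank += 1
    return rank


def mean_reciprocal_rank(embeddings, pairs):
    """Average 1/rank over the pairs whose words both have embeddings."""
    covered = [(c, v) for c, v in pairs if c in embeddings and v in embeddings]
    if not covered:
        return 0.0
    total = sum(1.0 / variant_rank(embeddings, c, v) for c, v in covered)
    return total / len(covered)


# Hypothetical toy vectors and variant pairs, for illustration only.
toy_embeddings = {
    "tomorrow": [1.0, 0.1],
    "tmrw": [0.9, 0.2],
    "you": [0.2, 1.0],
    "u": [0.3, 0.9],
    "banana": [-1.0, 0.5],
}
toy_pairs = [("tomorrow", "tmrw"), ("you", "u")]

score = mean_reciprocal_rank(toy_embeddings, toy_pairs)
print(score)
```

A score near 1 would mean that spelling variants sit next to their canonical forms in the embedding space, which is the behavior one would hope for if the representation is to make explicit normalization unnecessary. Pairs with out-of-vocabulary words are skipped here; whether to penalize them instead is a design choice this sketch leaves open.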
