Evaluating Informal-Domain Word Representations With UrbanDictionary

Naomi Saphra; Adam Lopez

arXiv:1606.08270·cs.CL·June 28, 2016

TL;DR

This paper proposes an intrinsic evaluation metric for informal-domain word representations based on the proximity of spelling variants, with variant pairs collected from UrbanDictionary. The aim is to test whether such representations can make explicit text normalization unnecessary as a preprocessing step.

Contribution

It introduces a novel evaluation metric for informal word representations and a method to collect spelling variant datasets from UrbanDictionary.

Findings

01 Proposed a metric for evaluating informal word representations, based on the proximity of spelling variants.

02 Described how a dataset of spelling variants can be collected from UrbanDictionary.

03 Framed spelling-variant proximity as a test of whether informal word representations can stand in for explicit text normalization.

Abstract

Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might be computed and how a spelling variant dataset can be collected using UrbanDictionary.
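The abstract does not spell out how the proximity of spelling variants would be scored, so the following is only a minimal sketch under stated assumptions: cosine similarity as the proximity measure, and mean reciprocal rank (MRR) of the standard spelling among the variant's neighbors as the summary score. The function names, the toy vectors, and the variant pairs are all hypothetical and are not taken from the paper or its dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def variant_rank(embeddings, variant, standard):
    """1-based rank of `standard` among all other words, ordered by similarity to `variant`."""
    query = embeddings[variant]
    target_sim = cosine(query, embeddings[standard])
    rank = 1
    for word, vec in embeddings.items():
        if word in (variant, standard):
            continue
        # Every word strictly closer to the variant pushes the standard form down one place.
        if cosine(query, vec) > target_sim:
            rank += 1
    return rank

def mean_reciprocal_rank(embeddings, pairs):
    """Average 1/rank over (variant, standard) pairs; pairs with a missing word are skipped."""
    scored = [(v, s) for v, s in pairs if v in embeddings and s in embeddings]
    if not scored:
        return 0.0
    return sum(1.0 / variant_rank(embeddings, v, s) for v, s in scored) / len(scored)

# Toy 2-dimensional vectors, invented purely for illustration.
toy_embeddings = {
    "tonight": [1.0, 0.0],
    "2nite": [0.9, 0.1],
    "banana": [0.0, 1.0],
    "because": [0.5, 0.5],
    "cuz": [0.1, 0.9],
}
toy_pairs = [("2nite", "tonight"), ("cuz", "because")]

score = mean_reciprocal_rank(toy_embeddings, toy_pairs)
print(score)
```

In this toy setup "2nite" sits next to "tonight", while "cuz" is closer to "banana" than to "because", so the score falls below 1. A representation that placed every variant nearest its standard spelling would score 1.0 under this assumed formulation.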

Code & Models

Repositories

nsaphra/urbandic-scraper (Official)
