Structured abbreviation expansion in context

Kyle Gorman, Christo Kirov, Brian Roark, and Richard Sproat

arXiv:2110.01140 · cs.CL · October 5, 2021

TL;DR

This paper addresses the challenge of reversing ad hoc abbreviations in informal text to recover their full forms, introducing a new dataset and baseline methods for abbreviation expansion.

Contribution

The paper presents a large open-source dataset of ad hoc abbreviations and develops two strong baseline models for context-aware abbreviation expansion.

Findings

1. Created a large, open-source dataset of ad hoc abbreviations
2. Analyzed abbreviation strategies in informal communication
3. Developed two strong baseline models for abbreviation expansion

Abstract

Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages. We consider the task of reversing these abbreviations in context to recover normalized, expanded versions of abbreviated messages. The problem is related to, but distinct from, spelling correction, in that ad hoc abbreviations are intentional and may involve substantial differences from the original words. Ad hoc abbreviations are productively generated on-the-fly, so they cannot be resolved solely by dictionary lookup. We generate a large, open-source data set of ad hoc abbreviations. This data is used to study abbreviation strategies and to develop two strong baselines for abbreviation expansion.
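To illustrate why dictionary lookup alone falls short, here is a minimal sketch. It assumes, purely for illustration, that an abbreviation is formed by deleting characters from a word while keeping its first letter; the abstract does not state this, and the function names and toy vocabulary are hypothetical rather than taken from the paper or its baselines.

```python
def is_subsequence(abbrev: str, word: str) -> bool:
    """Return True if abbrev can be obtained from word by deleting characters."""
    remaining = iter(word)
    # Each `ch in remaining` consumes the iterator up to the first match,
    # so characters of abbrev must appear in word in the same order.
    return all(ch in remaining for ch in abbrev)


def candidate_expansions(abbrev: str, vocabulary: list[str]) -> list[str]:
    """List vocabulary words that keep abbrev's first letter and contain it as a subsequence.

    Assumption (hypothetical, for illustration only): abbreviations are
    produced by character deletion and preserve the first letter.
    """
    return [
        word
        for word in vocabulary
        if word and abbrev
        and word[0] == abbrev[0]
        and is_subsequence(abbrev, word)
    ]


# Toy vocabulary, not drawn from the paper's data set.
vocabulary = ["message", "massage", "mostly", "meeting", "tomorrow"]
print(candidate_expansions("msg", vocabulary))  # → ['message', 'massage']
```

Even in this toy setting, "msg" matches more than one word, so choosing between candidates depends on the surrounding message rather than on the abbreviation alone.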

Taxonomy

Methods: High-Order Consensuses