Structured abbreviation expansion in context
Kyle Gorman, Christo Kirov, Brian Roark, and Richard Sproat

TL;DR
This paper addresses the challenge of reversing ad hoc abbreviations in informal text to recover their full forms, introducing a new dataset and baseline methods for abbreviation expansion.
Contribution
It presents a large open-source dataset of ad hoc abbreviations and develops two strong baseline models for context-aware abbreviation expansion.
Findings
Created a comprehensive dataset of ad hoc abbreviations
Analyzed abbreviation strategies in informal communication
Developed effective baseline models for abbreviation expansion
Abstract
Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages. We consider the task of reversing these abbreviations in context to recover normalized, expanded versions of abbreviated messages. The problem is related to, but distinct from, spelling correction, in that ad hoc abbreviations are intentional and may involve substantial differences from the original words. Ad hoc abbreviations are productively generated on-the-fly, so they cannot be resolved solely by dictionary lookup. We generate a large, open-source data set of ad hoc abbreviations. This data is used to study abbreviation strategies and to develop two strong baselines for abbreviation expansion.
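To make concrete why a fixed dictionary is not enough and why context matters, the following is a minimal illustrative sketch. It is not one of the paper's baselines. It assumes, purely for illustration, that an abbreviation keeps the first letter of the full word and deletes some of the remaining characters, so that the abbreviation is a subsequence of its expansion. The function names and the toy vocabulary are hypothetical.

```python
from typing import List


def is_subsequence(abbr: str, word: str) -> bool:
    """Return True if the characters of abbr appear in word in the same order."""
    remaining = iter(word)
    # `ch in remaining` consumes the iterator up to and including the match,
    # so later characters must be found after earlier ones.
    return all(ch in remaining for ch in abbr)


def candidate_expansions(abbr: str, vocabulary: List[str]) -> List[str]:
    """List vocabulary words that abbr could abbreviate under the assumed
    deletion-only, first-letter-preserving strategy (an illustrative assumption)."""
    abbr = abbr.lower()
    if not abbr:
        return []
    return [
        word
        for word in vocabulary
        if word
        and word[0].lower() == abbr[0]
        and is_subsequence(abbr, word.lower())
    ]


# Hypothetical toy vocabulary, not drawn from the paper's data set.
vocabulary = ["tomorrow", "tomato", "together", "message", "massage", "meeting"]

print(candidate_expansions("tmrw", vocabulary))  # ['tomorrow']
print(candidate_expansions("msg", vocabulary))   # ['message', 'massage']
```

In this toy example "msg" matches both "message" and "massage", so candidate generation alone cannot pick the expansion; the surrounding words would be needed to choose between them. Abbreviations that substitute or add characters would fall outside this deletion-only assumption entirely.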
Taxonomy
Methods: High-Order Consensuses
