What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation
Amir Pouran Ben Veyseh, Franck Dernoncourt, Quan Hung Tran, Thien Huu Nguyen

TL;DR
This paper introduces large, manually annotated datasets for acronym identification and disambiguation in the scientific domain, addressing the limitations of earlier datasets, and proposes a new deep learning model that outperforms prior models on the disambiguation dataset.
Contribution
The paper creates the first large-scale, manually annotated datasets for acronym identification (AI) and acronym disambiguation (AD) in the scientific domain, and proposes a novel deep learning model for AD that uses the syntactic structure of the sentence.
Findings
Existing models lag behind human performance on the new datasets.
The proposed model outperforms state-of-the-art models on the AD dataset.
The new datasets support further research and model development in acronym understanding.
Abstract
Acronyms are the short forms of phrases that facilitate conveying lengthy sentences in documents and serve as one of the mainstays of writing. Due to their importance, identifying acronyms and corresponding phrases (i.e., acronym identification (AI)) and finding the correct meaning of each acronym (i.e., acronym disambiguation (AD)) are crucial for text understanding. Despite the recent progress on this task, there are some limitations in the existing datasets which hinder further improvement. More specifically, limited size of manually annotated AI datasets or noises in the automatically created acronym identification datasets obstruct designing advanced high-performing acronym identification models. Moreover, the existing datasets are mostly limited to the medical domain and ignore other domains. In order to address these two limitations, we first create a manually annotated large AI…
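To make the acronym identification task concrete, the following is a minimal rule-based sketch in Python. It is an illustration only and not the paper's method: the function name `find_acronym_pairs` and the initial-letter heuristic are assumptions introduced here, and a heuristic this simple misses many real cases (acronyms that skip words, use inner letters, or appear without parentheses).

```python
import re


def find_acronym_pairs(sentence):
    """Return (acronym, long_form) pairs using a simple initial-letter heuristic.

    Hypothetical helper for illustration; not the model described in the paper.
    """
    pairs = []
    # Look for an all-caps token of two or more letters in parentheses, e.g. "(AD)".
    for match in re.finditer(r"\(([A-Z]{2,})\)", sentence):
        acronym = match.group(1)
        # Collect the alphabetic words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z]+", sentence[:match.start()])
        # Take as many preceding words as the acronym has letters.
        candidate = words[-len(acronym):]
        # Accept the candidate only if its initials spell the acronym.
        if len(candidate) == len(acronym) and all(
            word[0].upper() == letter for word, letter in zip(candidate, acronym)
        ):
            pairs.append((acronym, " ".join(candidate)))
    return pairs


if __name__ == "__main__":
    text = "We study acronym identification (AI) and acronym disambiguation (AD) in text."
    for acronym, long_form in find_acronym_pairs(text):
        print(acronym, "->", long_form)
```

Acronym disambiguation is the complementary step: given an acronym that has several known long forms, choose the one that fits the surrounding context. That step needs a dictionary of candidate meanings and a context model, so it is not covered by this sketch.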
