A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction

Yi Zhang; Xu Sun

arXiv:1712.06289 · cs.CL · December 19, 2017

TL;DR

This paper introduces a new Chinese dataset that includes negative full forms for abbreviation prediction, addressing a gap in existing corpora and enabling better model training for general abbreviation tasks.

Contribution

The paper creates and releases a Chinese abbreviation dataset with negative full forms, facilitating research on abbreviation prediction including non-abbreviable expressions.

Findings

01. Evaluated multiple models on the dataset

02. The dataset supports the study of negative full forms, i.e. full forms with no valid abbreviation

03. Provides baseline results for future research

Abstract

Abbreviation is a common phenomenon across languages, especially in Chinese. In most cases, if an expression can be abbreviated, its abbreviation is used more often than its fully expanded form, since people tend to convey information as concisely as possible. For various language processing tasks, abbreviations are an obstacle to improving performance, as the textual form of an abbreviation does not express useful information unless it is expanded to its full form. Abbreviation prediction means associating fully expanded forms with their abbreviations. However, due to deficiencies in existing abbreviation corpora, this task is limited in current studies, especially considering that general abbreviation prediction should also include full-form expressions that do not have valid abbreviations, namely negative full forms (NFFs). Corpora incorporating negative full forms for…
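
To make the task concrete, here is a minimal Python sketch that assumes abbreviation prediction is framed as character-level keep/skip labeling, a common framing in the Chinese abbreviation literature but not confirmed by this page for this dataset. The helpers `keep_skip_labels` and `decode`, and the convention that an NFF is labeled all-skip (decoding to "no abbreviation"), are hypothetical; they do not describe the file format of lancopku/Chinese-abbreviation-dataset.

```python
from typing import List, Optional


def keep_skip_labels(full_form: str, abbreviation: Optional[str]) -> List[str]:
    """Label each character of full_form as 'K' (kept) or 'S' (skipped).

    Hypothetical encoding: the abbreviation is assumed to be a subsequence of
    the full form's characters, aligned greedily left to right. A negative
    full form (abbreviation is None) is labeled entirely 'S'.
    """
    if abbreviation is None:
        return ["S"] * len(full_form)
    labels: List[str] = []
    j = 0  # index of the next abbreviation character to match
    for ch in full_form:
        if j < len(abbreviation) and ch == abbreviation[j]:
            labels.append("K")
            j += 1
        else:
            labels.append("S")
    if j != len(abbreviation):
        raise ValueError(f"{abbreviation!r} is not a subsequence of {full_form!r}")
    return labels


def decode(full_form: str, labels: List[str]) -> Optional[str]:
    """Rebuild the abbreviation from predicted labels; all-'S' means NFF (None)."""
    kept = "".join(ch for ch, lab in zip(full_form, labels) if lab == "K")
    return kept or None
```

With this encoding a single sequence labeler covers both cases: 北京大学 (Peking University) maps to 北大 via K S K S, while an expression with no valid abbreviation (the illustrative 今天天气, not taken from the dataset) is labeled all S and decodes to None.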

Code & Models

Repositories

lancopku/Chinese-abbreviation-dataset (official)

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Topic Modeling