A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Yi Zhang, Xu Sun

TL;DR
This paper introduces a new Chinese dataset that includes negative full forms for abbreviation prediction, addressing a gap in existing corpora and enabling better model training for general abbreviation tasks.
Contribution
The paper creates and releases a Chinese abbreviation dataset with negative full forms, facilitating research on abbreviation prediction including non-abbreviable expressions.
Findings
Evaluated multiple models on the dataset
Dataset supports the study of negative full forms, a gap in existing abbreviation corpora
Baseline results provided for future research
Abstract
Abbreviation is a common phenomenon across languages, especially in Chinese. In most cases, if an expression can be abbreviated, its abbreviation is used more often than its fully expanded form, since people tend to convey information in the most concise way. For various language processing tasks, abbreviations are an obstacle to improving performance, as the textual form of an abbreviation does not express useful information unless it is expanded to the full form. Abbreviation prediction means associating fully expanded forms with their abbreviations. However, due to the deficiency of abbreviation corpora, this task is limited in current studies, especially considering that general abbreviation prediction should also include full-form expressions that do not have valid abbreviations, namely the negative full forms (NFFs). Corpora incorporating negative full forms for…
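To make the task concrete, the sketch below shows one possible way to represent dataset entries, including negative full forms, and to turn each entry into per-character keep/skip labels. The excerpt above does not specify the dataset's file format or the labeling scheme, so the `Entry` class, the `keep_labels` helper, and the example entries are all assumptions made for illustration, not the authors' format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical record: a full form and its abbreviation, if any."""
    full_form: str
    abbreviation: Optional[str]  # None marks a negative full form (NFF)


def keep_labels(entry: Entry) -> List[int]:
    """Return one label per character of the full form:
    1 if the character is kept in the abbreviation, 0 otherwise.
    An NFF yields all zeros. Characters are matched greedily, left to right,
    which assumes the abbreviation is a subsequence of the full form."""
    labels = [0] * len(entry.full_form)
    if entry.abbreviation is None:
        return labels
    pos = 0
    for ch in entry.abbreviation:
        idx = entry.full_form.find(ch, pos)
        if idx == -1:
            raise ValueError("abbreviation is not a subsequence of the full form")
        labels[idx] = 1
        pos = idx + 1
    return labels


# Illustrative entries, not taken from the dataset.
positive = Entry("北京大学", "北大")   # Peking University and its common short form
negative = Entry("葡萄", None)         # "grape", used here as an assumed NFF

print(keep_labels(positive))
print(keep_labels(negative))
```

Under this framing, a model that handles NFFs must be able to predict "no abbreviation" for an input, rather than always producing some shortened string.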
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Topic Modeling
