A Zipf's Law-based Text Generation Approach for Addressing Imbalance in Entity Extraction
Zhenhua Wang, Ming Ren, Dong Gao, Zhuang Li

TL;DR
This paper introduces a Zipf's Law-based text generation method to address data imbalance in entity extraction, improving extraction accuracy by supplementing rare entities in technical documents.
Contribution
It proposes a novel approach that leverages Zipf's Law to classify and generate sentences, enhancing the detection of rare entities in datasets.
Findings
Experimental results show improved entity extraction accuracy.
The method effectively mitigates data imbalance issues.
Zipf's Law is well suited to separating common words from rare ones, and by extension common entities from rare ones.
Abstract
Entity extraction is critical to intelligent applications across diverse domains, but its effectiveness is limited by data imbalance. This paper views the problem quantitatively: some entities are common while others are scarce, and this is reflected in the measurable distribution of words. Zipf's Law is well suited to this view. To move from words to entities, words in the documents are first classified as common or rare. Sentences are then classified as common or rare and processed by text generation models accordingly. Rare entities in the generated sentences are labeled using human-designed rules and added to the raw dataset as a supplement, thereby mitigating the imbalance problem. The study presents…
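The word and sentence classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the common/rare boundary is chosen, so the `rare_max_count` threshold, the rule that a sentence is rare if it contains any rare word, and all function names here are assumptions made for illustration. The text generation and rule-based labeling steps are left out.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_words(sentences, rare_max_count=1):
    """Rank words by frequency and split them into common and rare sets.

    rare_max_count is a hypothetical cutoff: words occurring at most this
    many times are treated as rare. The paper may use a different criterion.
    """
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    ranked = counts.most_common()  # rank 1 is the most frequent word
    rare = {word for word, count in counts.items() if count <= rare_max_count}
    common = set(counts) - rare
    return ranked, common, rare


def classify_sentences(sentences, rare_words):
    """Label a sentence rare if it contains any rare word (assumed rule)."""
    common_sentences, rare_sentences = [], []
    for s in sentences:
        if any(tok in rare_words for tok in tokenize(s)):
            rare_sentences.append(s)
        else:
            common_sentences.append(s)
    return common_sentences, rare_sentences


# Toy corpus, invented for this example.
sentences = ["the pump failed", "the valve failed", "the pump leaked"]
ranked, common, rare = classify_words(sentences)
common_sentences, rare_sentences = classify_sentences(sentences, rare)
```

In this toy corpus, "valve" and "leaked" each occur once and are labeled rare, so the two sentences containing them would be the ones passed to a generation model to produce additional rare-entity examples.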
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Text and Document Classification Technologies
