CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling
Qi Zhang, Yonghong Song, Pengcheng Guo, Yangyang Hui

TL;DR
This paper introduces CRMSP, a semi-supervised method for Key Information Extraction that rebalances pseudo-labels and clusters tail features, significantly improving performance on benchmark datasets.
Contribution
The paper presents a novel semi-supervised approach with class-rebalancing and semantic pseudo-labeling modules, addressing tail class challenges in KIE.
Findings
Achieves state-of-the-art results on three benchmarks.
Improves F1-score by 3.24% on the CORD dataset.
Effectively balances tail class pseudo-labels.
Abstract
There is a growing demand in the field of KIE (Key Information Extraction) to apply semi-supervised learning (SSL) to save manpower and costs, as training on document data with fully supervised methods requires labor-intensive manual annotation. The main challenges of applying SSL to KIE are (1) underestimation of the confidence of tail classes in the long-tailed distribution and (2) difficulty in achieving intra-class compactness and inter-class separability of tail features. To address these challenges, we propose a novel semi-supervised approach for KIE with Class-Rebalancing and Merged Semantic Pseudo-Labeling (CRMSP). Firstly, the Class-Rebalancing Pseudo-Labeling (CRP) module introduces a reweighting factor to rebalance pseudo-labels, increasing attention to tail classes. Secondly, we propose the Merged Semantic Pseudo-Labeling (MSP) module to cluster tail features of unlabeled data…
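The abstract does not give the CRP reweighting formula, so the following is only a rough illustration of the general idea of rebalancing pseudo-labels toward tail classes, not the authors' method. The function name `rebalanced_pseudo_labels`, the parameters `base_threshold` and `alpha`, and the specific rule (a frequency ratio raised to a power) are all assumptions made for this sketch.

```python
from collections import Counter


def rebalanced_pseudo_labels(probs, base_threshold=0.95, alpha=0.5):
    """Hypothetical class-rebalanced pseudo-label selection (not CRMSP's formula).

    probs: list of per-sample class-probability lists from a model.
    Returns a list of (sample_index, pseudo_label, loss_weight) tuples.
    """
    # Tentative hard labels and their confidences.
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    confs = [p[k] for p, k in zip(probs, preds)]

    # Estimate class frequency from the tentative pseudo-labels.
    counts = Counter(preds)
    max_count = max(counts.values())

    selected = []
    for i, (k, c) in enumerate(zip(preds, confs)):
        # Frequency ratio in (0, 1]; equals 1 for the most frequent class.
        ratio = counts[k] / max_count
        # Assumed rule: rarer classes get a lower confidence threshold
        # and a larger loss weight, so they are not crowded out by head classes.
        threshold = base_threshold * ratio ** alpha
        weight = ratio ** -alpha
        if c >= threshold:
            selected.append((i, k, weight))
    return selected


if __name__ == "__main__":
    # Five samples predicted as head class 0, one as tail class 1.
    probs = [
        [0.97, 0.03],
        [0.96, 0.04],
        [0.99, 0.01],
        [0.98, 0.02],
        [0.30, 0.70],
        [0.90, 0.10],
    ]
    for index, label, weight in rebalanced_pseudo_labels(probs):
        print(index, label, round(weight, 3))
```

With a fixed threshold of 0.95, the tail-class sample (confidence 0.70) would be discarded. Under the assumed rule it is kept with a larger weight, while the low-confidence head-class sample is still rejected.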
