A Hybrid Method for Low-Resource Named Entity Recognition
Do Minh Duc, Quan Xuan Truong, Viet Tran Hong, Le Hoang Anh, Mac Thi Minh Tra, Nguyen Van Thuy, Le Hai Ha, Vinh Nguyen Van

TL;DR
This paper introduces a hybrid neurosymbolic framework that combines rule-based and deep learning methods with LLM-based data augmentation to improve Vietnamese NER in low-resource domains, reaching up to 90% F1 on Customer Service NER.
Contribution
The paper presents a scalable LLM-based data augmentation strategy and a hybrid pipeline that handles label complexity and resource scarcity in Vietnamese NER.
Findings
Achieved up to 90% F1 score in Customer Service NER
Improved NER performance across five domain-specific datasets
Demonstrated effectiveness of hybrid approach in low-resource settings
Abstract
Named Entity Recognition (NER) is a critical component of Natural Language Processing with diverse applications in information extraction and conversational AI. However, NER in specific domains for low-resource languages faces challenges such as limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core idea involves a two-stage pipeline: first, a rule-based component reduces label complexity by grouping relational and special categories; second, pre-trained language models are fine-tuned for high-precision extraction. A post-processing module is then utilized to restore fine-grained labels, preserving expressiveness for application-level usability. To mitigate data scarcity, a scalable data augmentation strategy…
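The label-grouping and label-restoration steps described in the abstract can be pictured with a minimal Python sketch. Everything specific here is an assumption for illustration and does not come from the paper: the label names (`SENDER_NAME`, `RECEIVER_NAME`, and so on), the cue-word table, the `window` parameter, and the English example sentence. The fine-tuned model in the middle of the pipeline is not shown; its output is represented by hand-written coarse spans.

```python
# Hypothetical fine-grained -> coarse label mapping (illustrative, not from the paper).
COARSE_OF = {
    "SENDER_NAME": "NAME",
    "RECEIVER_NAME": "NAME",
    "SENDER_PHONE": "PHONE",
    "RECEIVER_PHONE": "PHONE",
}

# Hypothetical cue words that signal which relational role an entity plays.
ROLE_CUES = {
    "from": "SENDER",
    "sender": "SENDER",
    "to": "RECEIVER",
    "recipient": "RECEIVER",
}


def coarsen(tags):
    """Rule-based stage: collapse fine-grained BIO tags into coarse ones."""
    out = []
    for tag in tags:
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Labels without a mapping are kept unchanged.
        out.append(f"{prefix}-{COARSE_OF.get(label, label)}")
    return out


def restore(tokens, spans, window=3):
    """Post-processing stage: re-attach a role to each coarse span.

    `spans` holds (start, end, coarse_label) tuples with `end` exclusive.
    The `window` tokens before each span are scanned backwards so that
    the cue word closest to the entity decides the role.
    """
    restored = []
    for start, end, label in spans:
        role = None
        for tok in reversed(tokens[max(0, start - window):start]):
            role = ROLE_CUES.get(tok.lower())
            if role:
                break
        restored.append((start, end, f"{role}_{label}" if role else label))
    return restored


if __name__ == "__main__":
    tokens = "send it to Lan from Minh".split()
    # Stand-in for the model's coarse predictions on `tokens`.
    coarse_spans = [(3, 4, "NAME"), (5, 6, "NAME")]
    print(coarsen(["O", "B-SENDER_NAME", "I-SENDER_NAME", "B-DATE"]))
    print(restore(tokens, coarse_spans))
```

Training on the coarse tags gives the model fewer, better-populated classes to learn, while the restoration rules recover the distinctions an application needs. A real system would likely need richer rules than a fixed cue-word lookup.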
