Pre-training with Meta Learning for Chinese Word Segmentation
Zhen Ke, Liang Shi, Songtao Sun, Erli Meng, Bin Wang, Xipeng Qiu

TL;DR
This paper introduces METASEG, a pre-trained model for Chinese Word Segmentation that incorporates meta learning and multi-criteria pre-training to improve performance, especially in low-resource scenarios.
Contribution
It proposes a novel CWS-specific pre-trained model that uses meta learning to integrate prior segmentation knowledge and to reduce the discrepancy between pre-training and downstream CWS tasks.
Findings
Achieves state-of-the-art results on twelve CWS datasets.
Significantly improves performance in low-resource settings.
Utilizes multi-criteria pre-training to incorporate prior knowledge.
Abstract
Recent research shows that pre-trained models (PTMs) are beneficial to Chinese Word Segmentation (CWS). However, the PTMs used in previous work usually adopt language modeling as the pre-training task, so they lack task-specific prior segmentation knowledge and ignore the discrepancy between pre-training tasks and downstream CWS tasks. In this paper, we propose METASEG, a CWS-specific pre-trained model that employs a unified architecture and incorporates a meta learning algorithm into a multi-criteria pre-training task. Empirical results show that METASEG can utilize common prior segmentation knowledge from different existing criteria and alleviate the discrepancy between pre-trained models and downstream CWS tasks. In addition, METASEG achieves new state-of-the-art performance on twelve widely used CWS datasets and significantly improves model performance in low-resource settings.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout
