Pre-training with Meta Learning for Chinese Word Segmentation

Zhen Ke; Liang Shi; Songtao Sun; Erli Meng; Bin Wang; Xipeng Qiu

arXiv:2010.12272 · cs.CL · March 16, 2021 · 1 citation

TL;DR

This paper introduces METASEG, a pre-trained model for Chinese Word Segmentation that incorporates meta learning and multi-criteria pre-training to improve performance, especially in low-resource scenarios.

Contribution

It proposes a novel CWS-specific pre-trained model that uses meta learning to integrate prior segmentation knowledge and to reduce the discrepancy between the pre-training task and downstream CWS tasks.

Findings

1. Achieves state-of-the-art results on twelve CWS datasets.
2. Significantly improves performance in low-resource settings.
3. Utilizes multi-criteria pre-training to incorporate prior segmentation knowledge.
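Multi-criteria CWS treats each dataset's segmentation convention as a separate criterion. A common way to let one shared model serve all criteria is to cast segmentation as character-level B/M/E/S tagging and prepend a criterion token to the input. The sketch below illustrates that idea under our own assumptions: the token format (`[crit_a]`), the placeholder tag `"O"`, and the helper names `to_bmes`, `make_example`, and `from_bmes` are hypothetical and are not taken from METASEG's actual preprocessing.

```python
from typing import List, Tuple


def to_bmes(words: List[str]) -> List[str]:
    """Map a segmented sentence to per-character B/M/E/S tags."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.append("B")  # word-initial character
            tags.extend(["M"] * (len(word) - 2))  # word-internal characters
            tags.append("E")  # word-final character
    return tags


def make_example(criterion: str, words: List[str]) -> Tuple[List[str], List[str]]:
    """Build one training example for a given criterion.

    A criterion token such as "[crit_a]" is prepended so that one shared
    model can learn criterion-specific segmentations (hypothetical format).
    The token gets a placeholder tag "O", which a real model would exclude
    from the loss (assumption).
    """
    chars = [f"[{criterion}]"] + [ch for word in words for ch in word]
    tags = ["O"] + to_bmes(words)
    return chars, tags


def from_bmes(chars: List[str], tags: List[str]) -> List[str]:
    """Decode B/M/E/S tags back into words (inverse of to_bmes)."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a malformed trailing B/M run
        words.append(current)
    return words


if __name__ == "__main__":
    # Two hypothetical segmentations of the same sentence under different criteria.
    coarse = ["林丹", "赢得", "总冠军"]
    fine = ["林", "丹", "赢得", "总", "冠军"]
    for name, words in (("crit_a", coarse), ("crit_b", fine)):
        chars, tags = make_example(name, words)
        print(chars[0], tags[1:])
        assert from_bmes(chars[1:], tags[1:]) == words
```

Because the characters are identical across criteria and only the tags differ, the prepended token is what tells a shared model which convention to follow.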

Abstract

Recent research shows that pre-trained models (PTMs) are beneficial to Chinese Word Segmentation (CWS). However, the PTMs used in previous work usually adopt language modeling as the pre-training task, which lacks task-specific prior segmentation knowledge and ignores the discrepancy between the pre-training task and downstream CWS tasks. In this paper, we propose METASEG, a CWS-specific pre-trained model that employs a unified architecture and incorporates a meta learning algorithm into a multi-criteria pre-training task. Empirical results show that METASEG can utilize common prior segmentation knowledge from different existing criteria and alleviate the discrepancy between pre-trained models and downstream CWS tasks. In addition, METASEG achieves new state-of-the-art performance on twelve widely used CWS datasets and significantly improves model performance in low-resource settings.
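The abstract does not name the specific meta learning algorithm, so the following is an illustration only. It uses a first-order, Reptile-style update in which each segmentation criterion is one task: adapt a copy of the shared weights to a sampled criterion with a few SGD steps, then move the shared weights part of the way toward the adapted copy. The toy linear model, the squared loss, the helper names `sgd_steps` and `reptile`, and all hyperparameters are assumptions standing in for a real segmentation network.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[List[float], float]  # (features, target) for the toy model


def sgd_steps(weights: List[float], data: List[Example], lr: float, steps: int) -> List[float]:
    """Inner loop: plain SGD on one criterion's data with squared loss."""
    w = list(weights)  # work on a copy; the shared weights stay untouched
    for _ in range(steps):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Gradient of 0.5 * err**2 with respect to w_i is err * x_i.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def reptile(tasks: Dict[str, List[Example]], dim: int, meta_lr: float = 0.5,
            inner_lr: float = 0.05, inner_steps: int = 5,
            meta_iters: int = 200, seed: int = 0) -> List[float]:
    """Outer loop (hypothetical Reptile-style meta-update over criteria)."""
    rng = random.Random(seed)
    theta = [0.0] * dim  # shared initialization being meta-learned
    names = sorted(tasks)
    for _ in range(meta_iters):
        name = rng.choice(names)  # sample one criterion as the current task
        adapted = sgd_steps(theta, tasks[name], inner_lr, inner_steps)
        # Move the shared weights toward the criterion-adapted weights.
        theta = [t + meta_lr * (a - t) for t, a in zip(theta, adapted)]
    return theta


if __name__ == "__main__":
    # Two toy "criteria" that agree on weight 0 (target 1.0)
    # and disagree on weight 1 (targets 2.0 vs -2.0).
    tasks = {
        "crit_a": [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)],
        "crit_b": [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0)],
    }
    print(reptile(tasks, dim=2))
```

In this toy setup the weight both criteria agree on converges to the shared value 1.0, while the weight they disagree on stays strictly between the criterion-specific targets, which loosely mirrors the idea of learning knowledge common to several criteria.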

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout