"Is Whole Word Masking Always Better for Chinese BERT?": Probing on   Chinese Grammatical Error Correction

Yong Dai; Linyang Li; Cong Zhou; Zhangyin Feng; Enbo Zhao; Xipeng Qiu; Piji Li; Duyu Tang

arXiv:2203.00286 · cs.CL · March 3, 2022 · 1 citation

TL;DR

This study investigates whether whole word masking (WWM) improves Chinese BERT's context understanding, probed through grammatical error correction. It finds that character-level masking (CLM) is better for single-character edits, while WWM helps with multi-character edits.

Contribution

The paper introduces probing tasks and a dataset for Chinese grammatical error correction to compare masking strategies in BERT models.

Findings

01. CLM performs better for single-character edits.

02. WWM improves performance on multi-character edits.

03. Different masking strategies yield similar results on sentence-level tasks.
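
The distinction between single-character and multi-character edits can be pictured with a small sketch. This page does not describe how the paper actually builds its probing inputs, so everything below is an assumption for illustration: the function names `replacement_probe` and `insertion_probe`, the `[MASK]` string, and the example sentence are all hypothetical. The idea is only that a "revise" probe overwrites existing characters with mask tokens, an "insert" probe adds mask tokens between existing characters, and the number of masks decides whether the edit is single-character or multi-character.

```python
MASK = "[MASK]"  # assumed mask token string, as used by BERT-style vocabularies


def replacement_probe(chars, start, length):
    """Overwrite `length` characters beginning at index `start` with mask tokens."""
    return chars[:start] + [MASK] * length + chars[start + length:]


def insertion_probe(chars, position, length):
    """Insert `length` mask tokens before index `position`, keeping all characters."""
    return chars[:position] + [MASK] * length + chars[position:]


if __name__ == "__main__":
    # Hypothetical example sentence, split into characters (one token each).
    sentence = list("我喜欢自然语言")
    # Multi-character revision: the model would have to predict two characters.
    print(replacement_probe(sentence, 1, 2))
    # Single-character insertion: the model would have to predict one character.
    print(insertion_probe(sentence, 1, 1))
```

A masked language model would then be asked to fill the mask positions, and its predictions compared against the labeled tokens.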

Abstract

Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model. For the Chinese language, however, there is no subword because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. Such difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT. To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one…
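
To make the contrast between CLM and WWM concrete, here is a minimal sketch of the two selection strategies. It is not the authors' training code. It assumes the sentence has already been segmented into words by some external tool, and it simplifies BERT's procedure by always substituting `[MASK]` (real BERT pretraining also keeps or randomizes some selected tokens). The names `char_level_mask`, `whole_word_mask`, and `mask_prob` are hypothetical.

```python
import random

MASK = "[MASK]"  # assumed mask token string


def char_level_mask(words, mask_prob, rng):
    """CLM: every character is an independent masking candidate."""
    chars = [ch for word in words for ch in word]
    return [MASK if rng.random() < mask_prob else ch for ch in chars]


def whole_word_mask(words, mask_prob, rng):
    """WWM: one decision per word; all characters of a chosen word are masked."""
    out = []
    for word in words:
        if rng.random() < mask_prob:
            out.extend([MASK] * len(word))
        else:
            out.extend(word)  # extending with a string appends its characters
    return out


if __name__ == "__main__":
    # Hypothetical pre-segmented sentence; segmentation is assumed, not computed.
    words = ["我", "喜欢", "自然", "语言", "处理"]
    rng = random.Random(0)
    print(char_level_mask(words, 0.15, rng))
    print(whole_word_mask(words, 0.15, rng))
```

Under CLM a multi-character word can be partially masked, so the model may recover a character from the rest of its own word. Under WWM the whole word is hidden at once, so the prediction has to rely on the surrounding context.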

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · WordPiece · Residual Connection · Layer Normalization · Dropout · Adam