# Pre-Training with Whole Word Masking for Chinese BERT

**Authors:** Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang

arXiv: 1906.08101 · 2021-11-29

## TL;DR

This paper introduces whole word masking (wwm) for Chinese BERT and proposes MacBERT, a model that improves upon RoBERTa with a new masking strategy, MLM as correction (Mac). MacBERT achieves state-of-the-art results on many of the ten Chinese NLP tasks evaluated, and the pre-trained models are open-sourced.

## Contribution

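The paper introduces the whole word masking (wwm) strategy for Chinese BERT, creates a series of Chinese pre-trained language models as baselines (including BERT, RoBERTa, ELECTRA, and RBT), and proposes MacBERT, which improves upon RoBERTa in several ways, most notably through the MLM as correction (Mac) masking strategy.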

## Key findings

- MacBERT achieves state-of-the-art performance on many of the ten Chinese NLP tasks evaluated.
- Whole word masking improves Chinese BERT performance.
- The pre-trained models are open-sourced at https://github.com/ymcui/Chinese-BERT-wwm to support further research.
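
To make the whole word masking idea concrete: Chinese BERT tokenizes text into individual characters, so a standard masked language model may mask only part of a multi-character word. Whole word masking instead masks every character of a selected word together. The sketch below is illustrative only and is not the authors' implementation. It assumes the sentence has already been split into words by some Chinese word segmenter, and it selects words with a simple independent probability rather than reproducing BERT's exact masking budget or replacement rules. The function name `whole_word_mask` and its parameters are hypothetical.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask whole words in a pre-segmented Chinese sentence.

    words: list of strings, each one segmented word (assumed to come
           from an external word segmenter).
    Returns (tokens, labels): character-level tokens, and for each
    position the original character if it was masked, else None.
    """
    rng = rng or random.Random()
    tokens, labels = [], []
    for word in words:
        chars = list(word)
        if rng.random() < mask_prob:
            # Mask every character of the word together.
            tokens.extend([MASK] * len(chars))
            labels.extend(chars)
        else:
            # Leave the whole word untouched.
            tokens.extend(chars)
            labels.extend([None] * len(chars))
    return tokens, labels


if __name__ == "__main__":
    # Hypothetical segmentation of an example sentence.
    sentence = ["使用", "语言", "模型", "来", "预测", "下一个", "词"]
    toks, labs = whole_word_mask(sentence, mask_prob=0.3, rng=random.Random(0))
    print(toks)
```

Because the masking decision is made once per word, a multi-character word such as 模型 is never left half masked, which is the property that distinguishes this scheme from character-level masking.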

## Abstract

Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways. In particular, we propose a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, RBT, etc. We carried out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also ablate details with several findings that may help future research. We open-source our pre-trained language models to facilitate further research in the community. Resources are available at https://github.com/ymcui/Chinese-BERT-wwm

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08101/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08101/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1906.08101/full.md

---
Source: https://tomesphere.com/paper/1906.08101