NEZHA: Neural Contextualized Representation for Chinese Language Understanding
Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu

TL;DR
NEZHA is a BERT-based pre-trained model for Chinese language understanding. It adds Functional Relative Positional Encoding, Whole Word Masking, Mixed Precision Training, and the LAMB optimizer, and achieves state-of-the-art results on several Chinese NLP tasks.
Contribution
The paper introduces NEZHA, a pre-trained Chinese language model that builds on BERT with a collection of proven improvements and reports state-of-the-art results when fine-tuned on representative Chinese tasks.
Findings
Achieves state-of-the-art results on Chinese NER, sentence matching, sentiment analysis, and NLI tasks.
Incorporates effective improvements such as relative positional encoding and mixed precision training.
Demonstrates the effectiveness of these enhancements through extensive experiments.
Abstract
Pre-trained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training on large-scale corpora. In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and fine-tuning them for Chinese NLU tasks. The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB optimizer for training the models. The experimental results show that NEZHA achieves state-of-the-art performance when fine-tuned on several representative Chinese tasks, including…
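To make the idea of a functional (non-learned) relative positional encoding more concrete, here is a minimal sketch. It assumes the encoding for a pair of positions (i, j) is a fixed sinusoidal function of the offset j - i, with sine on even dimensions and cosine on odd ones. The function name `relative_position_encoding` and the parameters `seq_len` and `depth` are hypothetical and chosen for illustration; this is not NEZHA's released implementation, and details such as distance clipping are omitted.

```python
import numpy as np


def relative_position_encoding(seq_len: int, depth: int) -> np.ndarray:
    """Return a (seq_len, seq_len, depth) table of fixed sinusoidal encodings.

    Entry [i, j] depends only on the offset j - i (assumed formulation):
      even dimension 2k   -> sin((j - i) / 10000 ** (2k / depth))
      odd dimension 2k+1  -> cos((j - i) / 10000 ** (2k / depth))
    """
    if depth % 2 != 0:
        raise ValueError("depth must be even")

    positions = np.arange(seq_len)
    # distance[i, j] = j - i, the signed relative offset between positions.
    distance = positions[None, :] - positions[:, None]

    # One wavelength per sine/cosine pair of dimensions.
    k = np.arange(depth // 2)
    inv_freq = 1.0 / (10000.0 ** (2.0 * k / depth))

    # Broadcast offsets against frequencies: (seq_len, seq_len, depth // 2).
    angles = distance[:, :, None] * inv_freq[None, None, :]

    encoding = np.empty((seq_len, seq_len, depth), dtype=np.float64)
    encoding[:, :, 0::2] = np.sin(angles)
    encoding[:, :, 1::2] = np.cos(angles)
    return encoding


if __name__ == "__main__":
    table = relative_position_encoding(seq_len=4, depth=8)
    print(table.shape)
```

Because the table is computed rather than learned, it adds no trainable parameters, and any two position pairs with the same offset share the same vector.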
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
