NEZHA: Neural Contextualized Representation for Chinese Language Understanding

Junqiu Wei; Xiaozhe Ren; Xiaoguang Li; Wenyong Huang; Yi Liao; Yasheng Wang; Jiashu Lin; Xin Jiang; Xiao Chen; Qun Liu

arXiv:1909.00204·cs.CL·November 22, 2021·86 cites

TL;DR

NEZHA is a Chinese language understanding model based on BERT, incorporating several improvements like relative positional encoding and mixed precision training, achieving state-of-the-art results on multiple Chinese NLP tasks.

Contribution

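The paper introduces NEZHA, a pre-trained Chinese language model based on BERT that combines a collection of proven improvements: Functional Relative Positional Encoding, Whole Word Masking, Mixed Precision Training, and the LAMB optimizer.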

Findings

01. Achieves state-of-the-art results on Chinese NER, sentence matching, sentiment analysis, and NLI tasks.

02. Incorporates effective improvements such as relative positional encoding and mixed precision training.

03. Demonstrates the effectiveness of these enhancements through extensive experiments.

Abstract

Pre-trained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training on large-scale corpora. In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and finetuning them for Chinese NLU tasks. The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB Optimizer for training the models. The experimental results show that NEZHA achieves state-of-the-art performance when finetuned on several representative Chinese tasks, including…
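To make the idea of a functional (rather than learned) relative positional encoding more concrete, here is a minimal sketch in plain Python. The abstract does not give the formula, so this assumes the standard sinusoidal form from the original Transformer applied to the relative offset j - i between two positions; the function names `relative_position_encoding` and `relative_encoding_table` and the base constant 10000 are illustrative choices, not taken from the paper or its code.

```python
import math


def relative_position_encoding(offset, depth):
    """Encode a relative offset (j - i) as a fixed sinusoidal vector.

    Assumption: even dimensions use sine and odd dimensions use cosine,
    with wavelengths scaled by 10000 as in the original Transformer.
    """
    if depth % 2 != 0:
        raise ValueError("depth must be even")
    vec = []
    for k in range(depth // 2):
        angle = offset / (10000 ** (2 * k / depth))
        vec.append(math.sin(angle))  # dimension 2k
        vec.append(math.cos(angle))  # dimension 2k + 1
    return vec


def relative_encoding_table(seq_len, depth):
    """Build table[i][j], the encoding of the offset j - i.

    No entry is a trainable parameter; every value is computed
    from the offset alone.
    """
    return [
        [relative_position_encoding(j - i, depth) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    table = relative_encoding_table(seq_len=4, depth=8)
    # Pairs with the same offset share the same vector.
    print(table[0][1] == table[2][3])
```

Because the vectors depend only on the offset, a table built this way needs no training and can be computed for any sequence length. How such vectors enter the attention computation is not covered here.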

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax