Revisiting and Advancing Chinese Natural Language Understanding with Accelerated Heterogeneous Knowledge Pre-training
Taolin Zhang, Junwei Dong, Jianing Wang, Chengyu Wang, Ang Wang, Yinghui Liu, Jun Huang, Yong Li, Xiaofeng He

TL;DR
This paper introduces CKBERT, a series of Chinese knowledge-enhanced BERT models that incorporate relational and linguistic knowledge through novel pre-training tasks, achieving superior performance on Chinese NLP benchmarks.
Contribution
The paper presents novel pre-training tasks and an efficient training framework for Chinese KEPLMs, resulting in multiple model sizes that outperform existing baselines.
Findings
CKBERT models outperform baselines on Chinese NLP tasks
Effective injection of relational and linguistic knowledge improves performance
Efficient pre-training with TorchAccelerator enables large-scale model training
Abstract
Recently, knowledge-enhanced pre-trained language models (KEPLMs) have improved context-aware representations by learning from structured relations in knowledge graphs and/or linguistic knowledge from syntactic or dependency analysis. Unlike English, there is a lack of high-performing open-source Chinese KEPLMs in the natural language processing (NLP) community to support various language understanding applications. In this paper, we revisit and advance the development of Chinese natural language understanding with a series of novel Chinese KEPLMs released in various parameter sizes, namely CKBERT (Chinese knowledge-enhanced BERT). Specifically, both relational and linguistic knowledge are effectively injected into CKBERT based on two novel pre-training tasks, i.e., linguistic-aware masked language modeling and contrastive multi-hop relation modeling. Based on the above two pre-training…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Balanced Selection
