fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP
Zhichao Geng, Hang Yan, Xipeng Qiu, Xuanjing Huang

TL;DR
fastHan is a compact multi-task toolkit for Chinese NLP, built on a pruned BERT, that reaches state-of-the-art or near state-of-the-art results on four basic tasks and can be fine-tuned on users' own labeled data.
Contribution
It introduces a multi-task model for Chinese NLP built on the first 8 layers of BERT, provides a smaller 4-layer model compressed from the 8-layer one, and reports strong performance and transferability.
Findings
Achieves near SOTA in dependency parsing and NER.
Attains SOTA in Chinese word segmentation and POS tagging.
Transfers well, performing much better than popular segmentation tools on a non-training corpus, and is designed to be user-friendly.
Abstract
We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), Part-of-Speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT, which uses the first 8 layers of BERT. We also provide a 4-layer base model compressed from the 8-layer model. The joint model is trained and evaluated on 13 corpora covering the four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and NER and SOTA performance in CWS and POS tagging. fastHan also transfers well, performing much better than popular segmentation tools on a non-training corpus. To better meet the needs of practical applications, we allow users to further fine-tune fastHan on their own labeled data. In addition to its small size and excellent…
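The abstract describes the backbone as the first 8 layers of BERT. The sketch below illustrates that kind of layer truncation on a toy encoder, assuming PyTorch. `TinyEncoder` and `prune_to_first_layers` are hypothetical names chosen for illustration, and the toy dimensions are arbitrary; this is not fastHan's actual code, and it does not cover how the 4-layer model is compressed.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a 12-layer BERT-style encoder (not fastHan's code)."""

    def __init__(self, num_layers: int = 12, d_model: int = 32, nhead: int = 4):
        super().__init__()
        # A plain stack of identical Transformer encoder layers.
        self.layers = nn.ModuleList(
            [
                nn.TransformerEncoderLayer(
                    d_model=d_model, nhead=nhead, dim_feedforward=64, batch_first=True
                )
                for _ in range(num_layers)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


def prune_to_first_layers(encoder: TinyEncoder, keep: int) -> TinyEncoder:
    """Keep only the first `keep` layers; the kept weights are left unchanged."""
    encoder.layers = nn.ModuleList(list(encoder.layers)[:keep])
    return encoder


full = TinyEncoder(num_layers=12)
params_before = sum(p.numel() for p in full.parameters())

pruned = prune_to_first_layers(full, keep=8)
params_after = sum(p.numel() for p in pruned.parameters())

# The pruned encoder still maps (batch, sequence, hidden) to the same shape.
pruned.eval()
with torch.no_grad():
    hidden = pruned(torch.randn(2, 5, 32))
```

Because every layer in this toy stack has the same size, keeping 8 of 12 layers leaves two thirds of the encoder parameters. In a multi-task setup, task-specific output layers would sit on top of `hidden`.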
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
Methods: Linear Layer · Softmax · Layer Normalization · Dense Connections · Weight Decay · Dropout · Linear Warmup With Linear Decay · Attention Dropout · WordPiece · Multi-Head Attention
