LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization

Weidong Guo; Mingjun Zhao; Lusheng Zhang; Di Niu; Jinwen Luo; Zhenhua Liu; Zhenyang Li; Jianbo Tang

arXiv:2108.00801·cs.CL·August 4, 2021

TL;DR

LICHEE is a pre-training method that enhances language models by integrating multi-grained tokenization, leading to improved performance on diverse NLU tasks with minimal additional inference cost.

Contribution

The paper introduces LICHEE, a novel pre-training approach that incorporates multi-grained tokenization to improve language model representations across languages.

Findings

01 Achieves state-of-the-art results on the CLUE benchmark.

02 Improves performance on SuperGLUE tasks.

03 Is effective on both Chinese and English NLU tasks.

Abstract

Language model pre-training based on large corpora has achieved tremendous success in terms of constructing enriched contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) tasks. Despite the success, most current pre-trained language models, such as BERT, are trained based on single-grained tokenization, usually with fine-grained characters or sub-words, making it hard for them to learn the precise meaning of coarse-grained words and phrases. In this paper, we propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of input text. Our method can be applied to various pre-trained language models and improve their representation capability. Extensive experiments conducted on CLUE and SuperGLUE demonstrate that our method achieves comprehensive…
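The idea of feeding a model both fine-grained and coarse-grained views of the same text can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the abstract above does not say how the two granularities are merged, so the element-wise max used here is just one simple choice, and the names `make_table`, `align`, and `multi_grained_embed`, the width `DIM`, and the random embedding tables are hypothetical.

```python
import random

DIM = 4  # toy embedding width (assumption, chosen only for readability)


def make_table(vocab, seed):
    """Build a random embedding table; a stand-in for learned embeddings."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
            for tok in sorted(vocab)}


def align(words):
    """Pair each fine-grained character with the coarse-grained word covering it."""
    return [(ch, word) for word in words for ch in word]


def multi_grained_embed(words, fine_table, coarse_table):
    """Return one merged vector per fine-grained token.

    The merge is an element-wise max of the character vector and the
    vector of the word containing it (an assumed merge rule, not taken
    from the text above).
    """
    merged = []
    for ch, word in align(words):
        fine_vec = fine_table[ch]
        coarse_vec = coarse_table[word]
        merged.append([max(f, c) for f, c in zip(fine_vec, coarse_vec)])
    return merged


# Pre-segmented input: two coarse-grained words, four fine-grained characters.
words = ["北京", "大学"]
fine_table = make_table({ch for word in words for ch in word}, seed=0)
coarse_table = make_table(set(words), seed=1)
vectors = multi_grained_embed(words, fine_table, coarse_table)
```

In this sketch the output has one vector per fine-grained token, so the sequence passed to the encoder is no longer than in the single-grained case; only the embedding lookup does extra work.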

Code & Models

Repositories

lbneon/research
tf

Taxonomy

Methods

Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Softmax