BURT: BERT-inspired Universal Representation from Learning Meaningful Segment

Yian Li; Hai Zhao

arXiv:2012.14320·cs.CL·January 1, 2021

Open Access

TL;DR

This paper introduces BURT, a universal language representation model that encodes multiple linguistic levels into a single vector space, improving performance across various NLP tasks and benchmarks.

Contribution

The paper proposes a novel pre-training approach that incorporates multi-level linguistic segments into a unified embedding space, enhancing cross-level language understanding.

Findings

01 Outperforms baselines on GLUE and CLUE benchmarks

02 Effective in text matching and question-answering tasks

03 Universal representations improve retrieval-based NLP applications

Abstract

Although pre-trained contextualized language models such as BERT achieve strong performance on various downstream tasks, current language representations still focus on linguistic objectives at a single granularity, which may not be applicable when multiple levels of linguistic units are involved at the same time. This work therefore introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space. We present a universal representation model, BURT (BERT-inspired Universal Representation from learning meaningful segmenT), to encode different levels of linguistic units into the same vector space. Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate objectives of different granularity into the pre-training stage. We conduct experiments on datasets for…
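To make the PMI-based segment masking mentioned in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes the standard bigram definition PMI(x, y) = log(p(x, y) / (p(x) p(y))), whitespace-tokenized input, and a simple left-to-right greedy masking rule. The function names `bigram_pmi` and `mask_segments`, the `threshold` parameter, and the toy corpus are all hypothetical; how the paper scores longer n-grams and selects segments is not stated in this abstract.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Return {(w1, w2): PMI} for every adjacent word pair.

    `sentences` is a list of token lists. Probabilities are plain
    relative frequencies (an assumption of this sketch).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log(p_xy / (p_x * p_y))
    return scores


def mask_segments(tokens, scores, threshold, mask_token="[MASK]"):
    """Mask adjacent pairs whose PMI is at least `threshold`.

    Pairs are scanned left to right and do not overlap
    (a hypothetical selection rule chosen for illustration).
    """
    out = list(tokens)
    i = 0
    while i < len(tokens) - 1:
        pair = (tokens[i], tokens[i + 1])
        if scores.get(pair, float("-inf")) >= threshold:
            out[i] = out[i + 1] = mask_token
            i += 2  # skip past the masked pair
        else:
            i += 1
    return out


corpus = [
    ["new", "york", "is", "big"],
    ["new", "york", "is", "old"],
    ["the", "city", "is", "big"],
]
scores = bigram_pmi(corpus)
masked = mask_segments(corpus[0], scores, threshold=2.0)
```

On this toy corpus the pair ("new", "york") scores log 8, which is above the threshold of 2.0, so the two tokens are masked together as one segment while "is" and "big" are left intact. Masking a whole high-PMI span rather than isolated tokens is the general idea behind learning from segments of different granularity.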

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Linear Layer · Attention Is All You Need · Dropout · Adam · Multi-Head Attention · WordPiece · Residual Connection · Layer Normalization · Linear Warmup With Linear Decay · Dense Connections