BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

Timo Schick; Hinrich Schütze

arXiv:1910.07181·cs.CL·April 30, 2020

TL;DR

BERTRAM enhances pretrained language models by generating high-quality embeddings for rare words, significantly improving performance on various NLP tasks through a novel architecture that leverages surface form and context interactions.

Contribution

This work introduces BERTRAM, a new architecture that improves rare word representations in pretrained models by integrating surface form and context interactions.

Findings

01 Large performance gains on downstream NLP tasks.

02 Improved representations of rare and medium frequency words.

03 Effective integration of BERTRAM into BERT architecture.

Abstract

Pretraining deep language models has led to large performance gains in NLP. Despite this success, Schick and Schütze (2020) recently showed that these models struggle to understand rare words. For static word embeddings, this problem has been addressed by separately learning representations for rare words. In this work, we transfer this idea to pretrained language models: We introduce BERTRAM, a powerful architecture based on BERT that is capable of inferring high-quality embeddings for rare words that are suitable as input representations for deep language models. This is achieved by enabling the surface form and contexts of a word to interact with each other in a deep architecture. Integrating BERTRAM into BERT leads to large performance increases due to improved representations of rare and medium frequency words on both a rare word probing task and three downstream tasks.

Code & Models

Repositories

timoschick/bertram
PyTorch · Official

Taxonomy

Methods

Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax