Explaining and Improving BERT Performance on Lexical Semantic Change Detection

Severin Laicher; Sinan Kurtyigit; Dominik Schlechtweg; Jonas Kuhn; Sabine Schulte im Walde

arXiv:2103.07259·cs.CL·March 15, 2021

TL;DR

This paper investigates why BERT underperforms in lexical semantic change detection and shows that reducing the influence of orthography considerably improves its performance.

Contribution

It identifies orthographic information as a key factor limiting BERT's performance and proposes a method to mitigate this, improving lexical semantic change detection results.

Findings

01

Orthographic information on the target word, encoded even in BERT's higher layers, degrades clusterings of BERT vectors.

02

Reducing the influence of orthography considerably improves BERT's performance.

03

Type-based models outperformed token-based models in SemEval-2020 Task 1.

Abstract

Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers of BERT representations. By reducing the influence of orthography we considerably improve BERT's performance.
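The abstract does not say how the influence of orthography is reduced. As a rough illustration only, one plausible preprocessing step is to replace each inflected form of the target word with a single canonical form before the sentence is passed to BERT, so that all usages share the same surface string. The function name `replace_with_lemma` and the example word forms below are hypothetical and are not taken from the paper.

```python
import re
from typing import Iterable


def replace_with_lemma(sentence: str, surface_forms: Iterable[str], lemma: str) -> str:
    """Replace whole-word occurrences of any surface form with the lemma.

    Hypothetical helper: an assumed way to normalise the target word's
    spelling before encoding, not the authors' documented procedure.
    """
    forms = sorted(set(surface_forms), key=len, reverse=True)
    if not forms:
        # Nothing to normalise; return the sentence unchanged.
        return sentence
    # \b restricts matches to whole words; matching ignores case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lemma, sentence)


normalised = replace_with_lemma(
    "Two planes landed; the Plane was late.",
    ["plane", "planes"],
    "plane",
)
print(normalised)
```

After this step every usage of the target word has the same spelling, so differences between its contextual vectors would come from context rather than from inflection or capitalisation. Whether this matches the paper's actual procedure has to be checked against the full text.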

Taxonomy

Methods: Linear Layer · Dropout · Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · Multi-Head Attention · Dense Connections · Softmax · Layer Normalization