Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Nils Reimers; Iryna Gurevych

arXiv:1908.10084 · cs.CL · August 28, 2019 · 88 cites

TL;DR

Sentence-BERT (SBERT) modifies BERT with siamese and triplet networks to generate efficient, high-quality sentence embeddings suitable for semantic similarity tasks, drastically reducing computation time while maintaining accuracy.

Contribution

The paper introduces SBERT, a novel BERT-based architecture that produces semantically meaningful sentence embeddings suitable for fast similarity search.
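To make "fast similarity search" concrete, the sketch below compares fixed-size vectors with cosine similarity and returns the closest pair. It is only an illustration: the three vectors are made-up toy values standing in for sentence embeddings, and `cosine_similarity` and `most_similar_pair` are hypothetical helper names, not functions from the paper or any library.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings):
    """Return (i, j, score) for the pair of vectors with the highest cosine."""
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# Toy 3-dimensional vectors standing in for sentence embeddings (assumed values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]

i, j, score = most_similar_pair(toy_embeddings)
print(i, j, round(score, 3))
```

Once every sentence has been encoded, the pairwise comparison is plain vector arithmetic and needs no further passes through the network.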

Findings

01. SBERT reduces the time needed to find the most similar pair among 10,000 sentences from about 65 hours with BERT to about 5 seconds.

02. SBERT outperforms existing sentence embedding methods on STS and transfer learning tasks.

03. SBERT maintains BERT-level accuracy while enabling efficient semantic similarity search.

Abstract

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences be fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to…
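The "about 50 million" figure in the abstract matches the number of unordered pairs among 10,000 sentences, n(n - 1)/2. The short sketch below reproduces that arithmetic. `pair_count` is a hypothetical helper name, and the one-pass-per-sentence count for an embedding-based approach is an assumption inferred from the abstract's description, not a number it states.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items, i.e. n * (n - 1) / 2."""
    return comb(n, 2)


n_sentences = 10_000

# Pairwise scoring: each pair of sentences needs its own forward pass.
pairwise_passes = pair_count(n_sentences)

# Embedding-based comparison (assumed): each sentence is encoded once,
# then pairs are compared with cosine similarity outside the network.
embedding_passes = n_sentences

print(pairwise_passes)   # 49995000
print(embedding_passes)  # 10000
```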

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining

Methods: Sentence-BERT · Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · RoBERTa · Dense Connections · Adam