Hubness Reduction Improves Sentence-BERT Semantic Spaces

Beatrix M. G. Nielsen; Lars Kai Hansen

arXiv:2311.18364 · cs.CL · December 1, 2023 · 1 citation


TL;DR

This paper shows that reducing hubness in Sentence-BERT semantic spaces improves the quality of the text embeddings, as measured by hubness scores and by the error rate of a neighbourhood-based classifier.

Contribution

The study identifies hubness as a key issue in Sentence-BERT embeddings and shows that applying combined hubness reduction methods enhances semantic space quality.
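To make "combined hubness reduction methods" concrete, here is a minimal pure-Python sketch that chains two standard techniques from the hubness literature: centering the embeddings (subtracting the mean vector), then replacing cosine distances with empirical mutual proximity, which calls two texts close only if each is close from the other's point of view. This excerpt does not say which two methods the paper combines, so the pairing and the helper names (`center`, `cosine_distances`, `mutual_proximity`, `reduce_hubness`) are assumptions for illustration. The loops are O(n³), so treat this as a toy rather than something to run on a full corpus.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def center(X: List[Vector]) -> List[Vector]:
    """Subtract the dataset mean vector from every embedding (centering)."""
    n, d = len(X), len(X[0])
    mean = [sum(x[t] for x in X) / n for t in range(d)]
    return [[x[t] - mean[t] for t in range(d)] for x in X]


def cosine_distances(X: List[Vector]) -> Matrix:
    """1 - cosine similarity; assumes no vector is all zeros after centering."""
    norms = [math.sqrt(sum(a * a for a in x)) for x in X]
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(X[i], X[j]))
                D[i][j] = 1.0 - dot / (norms[i] * norms[j])
    return D


def mutual_proximity(D: Matrix) -> Matrix:
    """Empirical mutual proximity, returned as a secondary distance 1 - MP.

    MP(x, y) is the fraction of the other points j that are farther from x
    than y is AND farther from y than x is. Requires at least 3 points.
    """
    n = len(D)
    M = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            shared = sum(
                1
                for j in range(n)
                if j != x and j != y and D[x][j] > D[x][y] and D[y][j] > D[y][x]
            )
            M[x][y] = 1.0 - shared / (n - 2)
    return M


def reduce_hubness(X: List[Vector]) -> Matrix:
    """Hypothetical combination: centering followed by mutual proximity."""
    return mutual_proximity(cosine_distances(center(X)))
```

The returned matrix is a symmetric distance matrix with values in [0, 1], so it can be dropped into any neighbourhood-based method in place of the original cosine distances.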

Findings

1. Hubness causes asymmetric neighborhood relations in embeddings.
2. Applying hubness reduction decreases error rates in neighborhood-based classifiers.
3. Combined hubness reduction methods can reduce hubness by about 75%.
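Hubness is usually quantified through the k-occurrence N_k(x), the number of times a point x appears among the k nearest neighbours of the other points. When hubness is high, a few points (hubs) have very large N_k while most have N_k near zero, so the N_k distribution is strongly right-skewed. The sketch below computes N_k under cosine similarity and uses its skewness as a hubness score. This excerpt does not specify which hubness score the paper reports, so choosing skewness and cosine similarity is an assumption, and the toy vectors in the demo are hypothetical stand-ins for Sentence-BERT embeddings.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity_matrix(X: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities (assumes no all-zero vectors)."""
    norms = [math.sqrt(sum(a * a for a in x)) for x in X]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(X, norms)]
        for u, nu in zip(X, norms)
    ]


def k_occurrence(S: List[List[float]], k: int) -> List[int]:
    """N_k(x): how often each point is among the k most similar points of another point."""
    n = len(S)
    counts = [0] * n
    for i in range(n):
        # rank every other point by similarity to i, excluding i itself
        others = sorted((j for j in range(n) if j != i), key=lambda j: S[i][j], reverse=True)
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(xs: List[float]) -> float:
    """Population skewness; large positive values mean a few points are hubs."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    return sum((x - mean) ** 3 for x in xs) / n / var ** 1.5


if __name__ == "__main__":
    # hypothetical toy embeddings standing in for Sentence-BERT vectors
    X = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0], [0.7, 0.7]]
    N_k = k_occurrence(cosine_similarity_matrix(X), k=2)
    print("k-occurrence:", N_k, "hubness (skewness):", skewness(N_k))
```

Because every point contributes exactly k neighbour slots, the N_k values always sum to n·k; hubness is about how unevenly those slots are shared, which is what the skewness captures.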

Abstract

Semantic representations of text, i.e. representations of natural language which capture meaning by geometry, are essential for areas such as information retrieval and document grouping. High-dimensional trained dense vectors have received much attention in recent years as such representations. We investigate the structure of semantic spaces that arise from embeddings made with Sentence-BERT and find that the representations suffer from a well-known problem in high dimensions called hubness. Hubness results in asymmetric neighborhood relations, such that some texts (the hubs) are neighbours of many other texts while most texts (so-called anti-hubs) are neighbours of few or no other texts. We quantify the semantic quality of the embeddings using hubness scores and the error rate of a neighbourhood-based classifier. We find that when hubness is high, we can reduce error rate and hubness…
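The second quality measure in the abstract, the error rate of a neighbourhood-based classifier, can be sketched as leave-one-out k-nearest-neighbour classification over a precomputed distance matrix: each text is labelled by a majority vote of its k nearest other texts, and the error rate is the fraction labelled wrongly. The evaluation protocol (leave-one-out), the tie-breaking rule, and the default `k=10` are assumptions for illustration, not details taken from the paper. Because the function only needs distances, the same call can score both the original space and a hubness-reduced one.

```python
from collections import Counter
from typing import List, Sequence


def knn_error_rate(D: List[List[float]], labels: Sequence[str], k: int = 10) -> float:
    """Leave-one-out k-NN error rate computed from a precomputed distance matrix."""
    n = len(D)
    errors = 0
    for i in range(n):
        # the k closest other points; sorted() is stable, so distance ties keep index order
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
        votes = Counter(labels[j] for j in neighbours)
        top = max(votes.values())
        tied = {label for label, count in votes.items() if count == top}
        # majority vote; a tie goes to the tied label held by the nearest neighbour
        prediction = next(labels[j] for j in neighbours if labels[j] in tied)
        if prediction != labels[i]:
            errors += 1
    return errors / n


# usage on a hypothetical 1-D toy set: two well-separated groups of three points
pos = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [[abs(p - q) for q in pos] for p in pos]
print(knn_error_rate(D, ["a", "a", "a", "b", "b", "b"], k=3))
```

Comparing `knn_error_rate` on the raw distance matrix with the same call on a hubness-reduced distance matrix mirrors, in miniature, the kind of comparison the abstract describes.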

Code & Models

Repositories

bemigini/hubness-reduction-improves-sbert-semantic-spaces
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies