MetricBERT: Text Representation Learning via Self-Supervised Triplet Training
Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Yoni Weill, Noam Koenigstein

TL;DR
MetricBERT is a novel BERT-based model that learns text embeddings aligned with a specific similarity metric, improving recommendation tasks and outperforming existing methods through self-supervised triplet training.
Contribution
It introduces a self-supervised triplet training approach for BERT that explicitly optimizes for a similarity metric, along with a new dataset of video game descriptions with similarity annotations.
Findings
MetricBERT outperforms state-of-the-art alternatives on similarity tasks for recommendations, sometimes by a substantial margin.
Self-supervised triplet training improves embedding quality.
The training objective outperforms a traditional contrastive loss, a standard cosine similarity objective, and six other baselines.
Abstract
We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the "traditional" masked-language task. We focus on downstream tasks of learning similarities for recommendations where we show that MetricBERT outperforms state-of-the-art alternatives, sometimes by a substantial margin. We conduct extensive evaluations of our method and its different variants, showing that our training objective is highly beneficial over a traditional contrastive loss, a standard cosine similarity objective, and six other baselines. As an additional contribution, we publish a dataset of video game descriptions along with a test set of similarity annotations crafted by a domain expert.
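To make the idea of a joint objective concrete, here is a minimal sketch of a generic triplet margin loss added to a masked-language-modeling (MLM) loss. It is an illustration under stated assumptions, not the paper's exact formulation: the Euclidean distance, the `margin` and `weight` parameters, the function names, and the toy embeddings are all hypothetical, and the way triplets are mined is not covered here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet hinge loss: max(0, d(a, p) - d(a, n) + margin).

    Assumption: Euclidean distance and a fixed margin; the paper's
    actual metric and loss may differ.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def combined_loss(mlm_loss, anchor, positive, negative, margin=1.0, weight=1.0):
    """Hypothetical joint objective: MLM loss plus a weighted triplet term."""
    return mlm_loss + weight * triplet_margin_loss(anchor, positive, negative, margin)


# Toy 2-D embeddings, made up for illustration only.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]  # distance 5 from the anchor
negative = [6.0, 8.0]  # distance 10 from the anchor

# Positive is already closer than the negative by more than the margin.
easy = triplet_margin_loss(anchor, positive, negative)
# Swapping the roles violates the margin, so the loss is positive.
hard = triplet_margin_loss(anchor, negative, positive)
# Add a placeholder MLM loss value of 2.0 to the violated triplet.
total = combined_loss(2.0, anchor, negative, positive)
```

In this toy setup the loss is zero once the positive sits closer to the anchor than the negative by at least the margin, so only triplets that violate the margin contribute to the metric term. In a real model the three vectors would be encoder outputs and `mlm_loss` would come from the masked-token prediction head.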
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test
