Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity

Bowen Zhang; Chunping Li

arXiv:2406.09790 · cs.CL · October 8, 2024

TL;DR

This paper introduces Pcc-tuning, a novel method using Pearson's correlation as a loss function, to surpass the existing performance ceiling of contrastive learning in semantic textual similarity tasks.

Contribution

Pcc-tuning is the first approach to effectively break the contrastive learning performance ceiling by optimizing Pearson's correlation in sentence embeddings.

Findings

01. Pcc-tuning exceeds previous state-of-the-art scores in STS benchmarks.
02. It requires only a small amount of annotated data.
03. The method achieves significant improvements over contrastive learning approaches.

Abstract

Semantic Textual Similarity (STS) constitutes a critical research direction in computational linguistics and serves as a key indicator of the encoding capabilities of embedding models. Driven by advances in pre-trained language models and contrastive learning, leading sentence representation methods have reached an average Spearman's correlation score of approximately 86 across seven STS benchmarks in SentEval. However, further progress has become increasingly marginal, with no existing method attaining an average score higher than 86.5 on these tasks. This paper conducts an in-depth analysis of this phenomenon and concludes that the upper limit for Spearman's correlation scores under contrastive learning is 87.5. To transcend this ceiling, we propose an innovative approach termed Pcc-tuning, which employs Pearson's correlation coefficient as a loss function to refine model performance…
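To make the idea of using Pearson's correlation coefficient as a loss concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the loss form `1 - r`, the function names `cosine`, `pearson`, and `pearson_loss`, and the toy inputs are all assumptions made for illustration. It scores each sentence pair by cosine similarity, then measures how well a batch of predicted scores correlates linearly with the human-annotated scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pearson_loss(predicted, gold):
    """Assumed loss form: 1 - r, which is 0 for perfect positive correlation
    and approaches 2 for perfect negative correlation."""
    return 1.0 - pearson(predicted, gold)


# Toy batch: predicted similarities vs. hypothetical annotated scores.
predicted = [0.1, 0.5, 0.9]
gold = [1.0, 3.0, 5.0]
loss = pearson_loss(predicted, gold)
```

In an actual fine-tuning setup the same quantity would be computed with a differentiable tensor library so that gradients flow back into the encoder; the pure-Python version here only illustrates the arithmetic. Unlike a contrastive objective, which treats pairs as positives or negatives, this loss uses the fine-grained annotated scores directly, and because it is invariant to positive scaling and shifting of the predictions, cosine similarities do not need to be rescaled to the annotation range.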

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Contrastive Learning