C-STS: Conditional Semantic Textual Similarity
Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan

TL;DR
This paper introduces Conditional STS (C-STS), a new task that measures sentence similarity based on specified natural language conditions, reducing ambiguity and enabling detailed model evaluation.
Contribution
It proposes the C-STS task, creates a large dataset, and evaluates state-of-the-art models, highlighting challenges and advancing semantic similarity assessment.
Findings
State-of-the-art models perform poorly on C-STS, with Spearman correlation scores below 50
C-STS reduces subjectivity in semantic similarity measurement
Provides a large dataset and code for future research
Abstract
Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding. However, sentence similarity can be inherently ambiguous, depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called Conditional STS (C-STS) which measures sentences' similarity conditioned on a feature described in natural language (hereon, condition). As an example, the similarity between the sentences "The NBA player shoots a three-pointer." and "A man throws a tennis ball into the air to serve." is higher for the condition "The motion of the ball" (both upward) and lower for "The size of the ball" (one large and one small). C-STS's advantages are two-fold: (1) it reduces the subjectivity and ambiguity of STS and…
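As a rough illustration of the task setup described in the abstract, the sketch below represents a C-STS example as two sentences, a condition, and a gold score, and computes a Spearman rank correlation between gold and predicted scores. The class name `CSTSInstance`, its field names, the numeric labels, and the predicted scores are hypothetical and chosen only for illustration; only the sentences and conditions come from the abstract. Reading "below 50" in the Findings as a correlation coefficient multiplied by 100 is likewise an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CSTSInstance:
    """One hypothetical C-STS example (field names are illustrative)."""
    sentence1: str
    sentence2: str
    condition: str
    label: float  # made-up gold score; the scale is an assumption


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return pearson(average_ranks(gold), average_ranks(predicted))


# The same sentence pair under two conditions, as in the abstract's example.
S1 = "The NBA player shoots a three-pointer."
S2 = "A man throws a tennis ball into the air to serve."
instances = [
    CSTSInstance(S1, S2, "The motion of the ball", label=4.0),  # higher
    CSTSInstance(S1, S2, "The size of the ball", label=1.0),    # lower
]

gold = [inst.label for inst in instances]
predicted = [0.8, 0.3]  # hypothetical model outputs, one per instance
score = spearman(gold, predicted)
print(f"Spearman correlation: {score:.2f}")
```

Because the same sentence pair appears with different conditions and different gold scores, a model that ignores the condition would assign both instances the same score and could not rank them correctly.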
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Flan-T5 · SimCSE
