Linguistically Conditioned Semantic Textual Similarity

Jingxuan Tu; Keer Xu; Liulu Yue; Bingyang Ye; Kyeongmin Rim; James Pustejovsky

arXiv:2406.03673·cs.CL·June 7, 2024

TL;DR

This paper addresses issues in the Conditional Semantic Textual Similarity dataset by reannotating it, analyzing annotation discrepancies, and proposing improved methods leveraging question-answering models and linguistic features to enhance C-STS evaluation.

Contribution

The paper identifies annotation issues in C-STS datasets, introduces a reannotation process, and proposes a new model training approach using QA-generated answers and typed-feature structures.

Findings

01 Reannotating the C-STS validation set revealed annotator discrepancy on 55% of instances, stemming from errors in the original labels, ill-defined conditions, and an unclear task definition.

02 QA-based answer generation improves model performance on C-STS.

03 An error identification pipeline achieves an F1 score above 80% in detecting annotation errors.
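
For readers less familiar with the metric in the third finding, the following is a minimal sketch of how an F1 score for binary error detection is computed. It is not the authors' pipeline: the function name `f1_score` and the toy labels are hypothetical and serve only to illustrate the standard definition (the harmonic mean of precision and recall).

```python
def f1_score(predicted, gold):
    """F1 for binary labels, where True means 'flagged as an annotation error'."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)        # correctly flagged errors
    fp = sum(1 for p, g in pairs if p and not g)    # flagged, but label was fine
    fn = sum(1 for p, g in pairs if not p and g)    # missed errors
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
gold = [True, True, False, True, False]
predicted = [True, False, False, True, True]
score = f1_score(predicted, gold)
print(f"F1 = {score:.3f}")
```

In this toy case the detector finds two of the three true errors and raises one false alarm, so precision and recall are equal and F1 takes the same value.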

Abstract

Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. To reduce the ambiguity inherent in the sentences, a recent task called Conditional STS (C-STS) has been proposed to measure the sentences' similarity conditioned on a certain aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation of this task. In this paper, we reannotate the C-STS validation set and observe an annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging the models' capability to understand the conditions under a QA task setting. With the generated…

Code & Models

Repositories

brandeis-llc/L-CSTS (Official)

Videos

Linguistically Conditioned Semantic Textual Similarity · Underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training