Explainable Semantic Textual Similarity via Dissimilar Span Detection

Diego Miguel Lozano; Daryna Dementieva; Alexander Fraser

arXiv:2603.21174 · cs.CL · May 15, 2026

TL;DR

This paper introduces Dissimilar Span Detection (DSD), a task that makes Semantic Textual Similarity (STS) more interpretable by identifying the spans in which two texts differ semantically. It contributes a new dataset and baseline methods, and indicates that DSD can benefit downstream NLP tasks.

Contribution

The paper defines Dissimilar Span Detection, creates a new dataset, and evaluates baseline methods, advancing interpretability in semantic similarity analysis.

Findings

01 LLMs and supervised models perform best but still have low overall accuracy.

02 Dissimilar Span Detection can improve paraphrase detection performance.

03 A new dataset, the Span Similarity Dataset (SSD), was developed for the task.

Abstract

Semantic Textual Similarity (STS) is a crucial component of many Natural Language Processing (NLP) applications. However, existing approaches typically reduce semantic nuances to a single score, limiting interpretability. To address this, we introduce the task of Dissimilar Span Detection (DSD), which aims to identify semantically differing spans between pairs of texts. This can help users understand which particular words or tokens negatively affect the similarity score, or be used to improve performance in STS-dependent downstream tasks. Furthermore, we release a new dataset suitable for the task, the Span Similarity Dataset (SSD), developed through a semi-automated pipeline combining large language models (LLMs) with human verification. We propose and evaluate different baseline methods for DSD, both unsupervised, based on LIME, SHAP, LLMs, and our own method, as well as an…
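
To make the task concrete, here is a minimal sketch of how dissimilar spans could be found by perturbation, loosely in the spirit of attribution methods such as LIME and SHAP. It is not the paper's method or any of its baselines. The similarity function is a toy bag-of-words cosine standing in for a real STS model, and every name in it (`similarity`, `occlusion_scores`, `dissimilar_spans`, `threshold`) is hypothetical. The idea: remove one token at a time, and flag the tokens whose removal raises the similarity to the other text.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(tokens_a, tokens_b) -> float:
    """Toy stand-in for an STS model (assumption): cosine over token counts."""
    return cosine(Counter(t.lower() for t in tokens_a),
                  Counter(t.lower() for t in tokens_b))


def occlusion_scores(tokens, other):
    """Score each token by how much similarity to `other` rises when it is removed."""
    base = similarity(tokens, other)
    return [similarity(tokens[:i] + tokens[i + 1:], other) - base
            for i in range(len(tokens))]


def dissimilar_spans(tokens, other, threshold=0.0):
    """Merge adjacent tokens scoring above `threshold` into (start, end) spans.

    `end` is exclusive, so tokens[start:end] is the span text.
    """
    spans, start = [], None
    for i, score in enumerate(occlusion_scores(tokens, other)):
        if score > threshold:
            if start is None:
                start = i  # open a new span
        elif start is not None:
            spans.append((start, i))  # close the current span
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


a = "the cat sat on the red mat".split()
b = "the cat sat on the blue mat".split()
print([" ".join(a[s:e]) for s, e in dissimilar_spans(a, b)])  # ['red']
print([" ".join(b[s:e]) for s, e in dissimilar_spans(b, a)])  # ['blue']
```

With a real sentence-embedding model in place of `similarity`, the same loop would capture paraphrase-level differences that word overlap misses, at the cost of one model call per token.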
