A Distribution-Based Threshold for Determining Sentence Similarity

Gioele Cadamuro; Marco Gruppo

arXiv:2311.16675 · cs.CL · November 29, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a neural network-based method to determine a distribution-based threshold for sentence similarity, especially for sentences with highly specific information, and demonstrates its transferability across domains.

Contribution

The authors propose a novel thresholding approach using distribution analysis of sentence pair distances with a siamese neural network, improving similarity detection accuracy.

Findings

01. Effective threshold derived from distance distributions

02. Method generalizes well to different datasets

03. Improves accuracy in identifying similar sentences with specific info

Abstract

We present a solution to a semantic textual similarity (STS) problem in which two sentences must be matched when their only distinguishing factor is highly specific information (such as names, addresses, or identification codes), and from which we need to derive a definition of when they are similar and when they are not. The solution uses a neural network based on the siamese architecture to create the distributions of the distances between similar and dissimilar pairs of sentences. The goal of these distributions is to find a discriminating factor, which we call the "threshold": a well-defined quantity that can be used to distinguish vector distances of similar pairs from vector distances of dissimilar pairs in new predictions and later analyses. In addition, we developed a way to score the predictions by combining attributes…
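To make the idea of a distance threshold concrete, here is a minimal sketch in Python. It is not the authors' procedure: the abstract excerpt does not say how the threshold is computed from the two distributions, so the rule used here (choose the cut-off that misclassifies the fewest labelled distances) is an assumption made for illustration. The function names `pick_threshold` and `is_similar` are hypothetical, and the distances are made-up numbers standing in for the output of a siamese encoder.

```python
from typing import Sequence


def pick_threshold(similar: Sequence[float], dissimilar: Sequence[float]) -> float:
    """Return the cut-off that misclassifies the fewest labelled distances.

    A pair is predicted "similar" when its distance is <= the threshold.
    This selection rule is an illustrative assumption, not the paper's.
    """
    if not similar or not dissimilar:
        raise ValueError("need at least one distance from each class")

    values = sorted(set(similar) | set(dissimilar))
    # Candidate cut-offs: midpoints between consecutive observed distances.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:  # every observed distance is identical
        return values[0]

    def errors(t: float) -> int:
        missed = sum(d > t for d in similar)          # similar pairs rejected
        false_hits = sum(d <= t for d in dissimilar)  # dissimilar pairs accepted
        return missed + false_hits

    # On ties, min() keeps the first (smallest) candidate.
    return min(candidates, key=errors)


def is_similar(distance: float, threshold: float) -> bool:
    """Classify a new pair from its embedding distance."""
    return distance <= threshold


# Made-up distances, for illustration only (not data from the paper).
similar_d = [0.10, 0.15, 0.22, 0.30]
dissimilar_d = [0.55, 0.70, 0.82, 0.95]
threshold = pick_threshold(similar_d, dissimilar_d)
```

With these toy values the two groups do not overlap, so the chosen cut-off falls in the gap between 0.30 and 0.55. When real distributions overlap, the same rule trades rejected similar pairs against accepted dissimilar ones, and a different criterion (for example, one weighting the two error types unequally) may be more appropriate.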

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques