Contextualized Semantic Distance between Highly Overlapped Texts

Letian Peng; Zuchao Li; Hai Zhao

arXiv:2110.01176 · cs.CL · June 14, 2023 · 1 cites

TL;DR

This paper introduces NDD, a novel semantic distance metric using masked language modeling to better evaluate highly overlapped texts, improving sensitivity and domain adaptation in NLP tasks.

Contribution

It proposes a mask-and-predict strategy with NDD, addressing limitations of traditional metrics in overlapped texts and enabling unsupervised text compression and domain adaptation.

Findings

01  NDD outperforms traditional metrics in semantic similarity tasks.

02  The method improves text compression without training.

03  NDD surpasses supervised methods in domain adaptation.

Abstract

Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation. Better evaluation of the semantic distance between the overlapped sentences benefits the language system's understanding and guides the generation. Since conventional semantic metrics are based on word representations, they are vulnerable to the disturbance of overlapped components with similar representations. This paper aims to address the issue with a mask-and-predict strategy. We take the words in the longest common sequence (LCS) as neighboring words and use masked language modeling (MLM) from pre-trained language models (PLMs) to predict the distributions on their positions. Our metric, Neighboring Distribution Divergence (NDD), represents the semantic distance by calculating the divergence between distributions in the overlapped parts.…
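
The mask-and-predict idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of KL divergence, the unweighted sum over positions, and the toy predictor are all assumptions made for illustration. In practice the predictor would be the MLM head of a pre-trained language model; here a bag-of-context stand-in keeps the example self-contained.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Distribution = Dict[str, float]
# Hypothetical interface: mask `tokens[pos]` and return a distribution over words.
Predictor = Callable[[Sequence[str], int], Distribution]


def lcs_alignment(a: Sequence[str], b: Sequence[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q), flooring missing or zero q-probabilities at eps (an assumption)."""
    return sum(
        pv * math.log(pv / max(q.get(tok, 0.0), eps))
        for tok, pv in p.items()
        if pv > 0
    )


def ndd_sketch(a: Sequence[str], b: Sequence[str], predict: Predictor) -> float:
    """Sum the divergence between predicted distributions at each shared position."""
    return sum(
        kl_divergence(predict(a, i), predict(b, j))
        for i, j in lcs_alignment(a, b)
    )


def toy_predict(tokens: Sequence[str], pos: int) -> Distribution:
    """Stand-in for an MLM: relative frequency of the unmasked context words."""
    context = [t for k, t in enumerate(tokens) if k != pos]
    if not context:
        return {}
    dist: Distribution = {}
    for t in context:
        dist[t] = dist.get(t, 0.0) + 1.0 / len(context)
    return dist


if __name__ == "__main__":
    s1 = "the cat sat on the mat".split()
    s2 = "the cat sat on the sofa".split()
    print(ndd_sketch(s1, s1, toy_predict))  # identical texts give zero distance
    print(ndd_sketch(s1, s2, toy_predict))  # one changed word gives a positive distance
```

The point of the sketch is that the distance is computed only at the shared positions: the changed word never contributes directly, but it shifts what the predictor expects at its unchanged neighbors.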

Code & Models

Repositories

Stareru/NeighboringDistributionDivergence
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification