Comparative analysis of word embeddings in assessing semantic similarity of complex sentences

Dhivya Chandrasekaran; Vijay Mago

arXiv:2010.12637·cs.CL·July 13, 2021

TL;DR

This paper investigates how the complexity of sentences affects the performance of different word embeddings and language models in assessing semantic similarity, revealing a notable decline in accuracy with increased sentence complexity.

Contribution

It introduces a new complex sentence dataset and analyzes the sensitivity of various embeddings to sentence complexity, highlighting limitations of current models.

Findings

01

Performance drops 10-20% with increased sentence complexity

02

Existing benchmarks may overestimate model capabilities on complex sentences

03

Complexity impacts the reliability of semantic similarity assessments

Abstract

Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field, and recent transformer-based models achieve near-perfect results on existing benchmark datasets like the STS dataset and the SICK dataset. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings with respect to the complexity of the sentences. We build a complex sentences dataset comprising 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. Readability analysis is performed to highlight the increase in complexity of the sentences in the existing benchmark datasets and those in the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings and language models on the…
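To make the evaluation setup concrete, the following is a minimal sketch of how an embedding-based similarity score could be compared against human ratings. It is not the authors' pipeline. It assumes sentence vectors are formed by averaging word vectors, that similarity is cosine similarity, and that agreement with annotators is measured by Pearson correlation (a common choice in STS evaluation). The vectors in `TOY_VECTORS`, the sentence pairs, and the human scores are invented for illustration only.

```python
import math

# Hypothetical 3-dimensional "embeddings"; real models use hundreds of dimensions.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sleeps": [0.1, 0.9, 0.2],
    "rests": [0.2, 0.8, 0.3],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.2, 0.8],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of all in-vocabulary words (an assumed pooling choice)."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no in-vocabulary words in sentence")
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# (sentence 1, sentence 2, invented human similarity score on a 0-5 scale)
pairs = [
    ("cat sleeps", "dog rests", 4.0),
    ("cat sleeps", "stock falls", 0.5),
    ("dog rests", "dog sleeps", 4.5),
]

predicted = [
    cosine(sentence_vector(a, TOY_VECTORS), sentence_vector(b, TOY_VECTORS))
    for a, b, _ in pairs
]
human = [score for _, _, score in pairs]
correlation = pearson(predicted, human)
print(f"Pearson correlation with human scores: {correlation:.3f}")
```

Under this setup, a model that is sensitive to sentence complexity would show a lower correlation on a dataset of complex sentence pairs than on a simpler benchmark, which is the kind of comparison the abstract describes.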

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification