VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations

Sri Harsha Dumpala; Aman Jaiswal; Chandramouli Sastry; Evangelos Milios; Sageev Oore; Hassan Sajjad

arXiv:2404.16365 · cs.CL · April 26, 2024

TL;DR

The VISLA benchmark evaluates, without any fine-tuning, how well vision-language models and unimodal language models capture semantic and lexical nuances, exposing their sensitivities and limitations.

Contribution

This paper introduces the VISLA benchmark, unifying image-to-text and text-to-text retrieval tasks for off-the-shelf evaluation of semantic and lexical understanding in models.

Findings

01. Text encoders of VLMs show greater sensitivity to semantic and lexical variations than unimodal text encoders (ULMs).

02. Models struggle to distinguish between lexical and semantic differences.

03. Spatial semantics encoded by language models appear highly sensitive to lexical information.

Abstract

Despite their remarkable successes, state-of-the-art language models face challenges in grasping certain important semantic details. This paper introduces the VISLA (Variance and Invariance to Semantic and Lexical Alterations) benchmark, designed to evaluate the semantic and lexical understanding of language models. VISLA presents a 3-way semantic (in)equivalence task with a triplet of sentences associated with an image, to evaluate both vision-language models (VLMs) and unimodal language models (ULMs). An evaluation involving 34 VLMs and 20 ULMs reveals surprising difficulties in distinguishing between lexical and semantic variations. Spatial semantics encoded by language models also appear to be highly sensitive to lexical information. Notably, text encoders of VLMs demonstrate greater sensitivity to semantic and lexical variations than unimodal text encoders. Our contributions…
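The triplet setup described in the abstract can be illustrated with a small text-to-text sketch. This is not the paper's evaluation code. It assumes each triplet holds two captions intended to be semantically equivalent (`p1`, `p2`) and one that is not (`neg`), and it assumes a triplet counts as correct when the two equivalent captions are closer to each other in cosine similarity than either is to the third. The function names and the toy vectors are hypothetical; in practice the vectors would come from a model's text encoder.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_is_correct(p1: Vector, p2: Vector, neg: Vector) -> bool:
    """Assumed criterion: the two equivalent captions must be more similar
    to each other than either one is to the non-equivalent caption."""
    s_pos = cosine(p1, p2)
    return s_pos > cosine(p1, neg) and s_pos > cosine(p2, neg)


def triplet_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of triplets that satisfy the assumed criterion."""
    hits = sum(triplet_is_correct(p1, p2, neg) for p1, p2, neg in triplets)
    return hits / len(triplets)


# Hand-made toy embeddings (hypothetical, not model outputs).
toy_triplets = [
    # Equivalent captions are close; the third caption is far from both.
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3], [0.0, 1.0, 0.1]),
    # Failure case: the non-equivalent caption sits closer to p1 than p2 does,
    # as could happen when it shares most of its words with p1.
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]),
]

accuracy = triplet_accuracy(toy_triplets)
print(f"triplet accuracy: {accuracy:.2f}")
```

The second toy triplet mimics the failure mode the abstract highlights, where lexical overlap outweighs semantic equivalence in the embedding space. An image-to-text variant would compare an image embedding against each caption embedding instead, which this sketch does not cover.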

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification