SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models

Bardiya Akhbari; Manish Gawali; Nicholas A. Dronen

arXiv:2411.07336·cs.CL·November 13, 2024

TL;DR

The paper introduces the SetLexSem Challenge, a synthetic benchmark to evaluate the robustness of large language models in performing set operations under lexical and semantic variations, revealing significant robustness issues.

Contribution

It presents a new benchmark, SetLexSem, for systematically testing LLMs' invariance in set operations across lexical and semantic variations, highlighting their vulnerabilities.

Findings

1. LLMs show poor robustness to variations in operations and operands.

2. LLMs exhibit specific failure modes with semantic groupings of sets.

3. Measuring robustness to frequency and length variations is challenging.

Abstract

Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically-oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are comprised of arbitrary symbols (e.g. numbers, words), they provide an opportunity to test, systematically, the invariance of LLMs' algorithmic abilities under simple lexical or semantic variations. To this end, we present the SetLexSem Challenge, a synthetic benchmark that evaluates the performance of LLMs on set operations. SetLexSem assesses the robustness of LLMs' instruction-following abilities under various conditions, focusing on the set operations and the nature and construction of the set members. Evaluating seven LLMs…
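To make the setup in the abstract concrete, here is a minimal sketch of how a set-operation test item with a known answer could be built for two kinds of operands (numbers and words). It is an illustration only, not the benchmark's code: the prompt wording, the helper names `make_example` and `is_correct`, and the small word list are all assumptions introduced here.

```python
import random

# Ground-truth implementations of the set operations being tested.
OPERATIONS = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "difference": lambda a, b: a - b,
    "symmetric difference": lambda a, b: a ^ b,
}


def make_example(operation, universe, set_size, rng):
    """Sample two sets from `universe` and build a prompt plus the expected answer.

    The prompt template is hypothetical; the benchmark's own wording may differ.
    """
    a = set(rng.sample(universe, set_size))
    b = set(rng.sample(universe, set_size))
    prompt = (
        f"Set A: {sorted(a)}\n"
        f"Set B: {sorted(b)}\n"
        f"What is the {operation} of A and B?"
    )
    return {
        "a": a,
        "b": b,
        "prompt": prompt,
        "expected": OPERATIONS[operation](a, b),
    }


def is_correct(predicted, expected):
    """Order-insensitive exact match between a predicted and an expected set."""
    return set(predicted) == set(expected)


rng = random.Random(0)  # fixed seed so the sampled sets are reproducible
numbers = list(range(100))
words = ["apple", "river", "stone", "cloud", "tiger", "piano", "lemon", "chair"]

# Same operation, different operand type: a robust system should score the
# same on both, since the operation does not depend on what the members are.
number_example = make_example("intersection", numbers, 4, rng)
word_example = make_example("intersection", words, 4, rng)
```

Comparing a model's accuracy on `number_example`-style items against `word_example`-style items is one simple way to probe the kind of invariance the abstract describes.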

Code & Models

Repositories

amazon-science/setlexsem-challenge (Official)

Videos

SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training