Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses

Xenia Ohmer; Elia Bruni; Dieuwke Hupkes

arXiv:2305.11662 · cs.CL · December 21, 2023 · 6 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel evaluation method for large language models that assesses their understanding through multilingual self-consistency across different senses, revealing current limitations in language independence.

Contribution

The paper proposes a new paradigm for evaluating LLM understanding based on consistency across senses, demonstrated through multilingual testing without requiring multilingual corpora.

Findings

01

Multilingual consistency in ChatGPT is currently limited.

02

LLMs' task and world understanding are not fully language-independent.

03

The approach is easily extendable to other languages and tasks.

Abstract

At the staggering pace with which the capabilities of large language models (LLMs) are increasing, creating future-proof evaluation sets to assess their understanding becomes more and more challenging. In this paper, we propose a novel paradigm for evaluating LLMs which leverages the idea that correct world understanding should be consistent across different (Fregean) senses of the same meaning. Accordingly, we measure understanding not in terms of correctness but by evaluating consistency across multiple senses that are generated by the model itself. We showcase our approach by instantiating a test where the different senses are different languages, hence using multilingual self-consistency as a litmus test for the model's understanding and simultaneously addressing the important topic of multilinguality. Taking one of the latest versions of ChatGPT as our object of study, we evaluate…
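The distinction the abstract draws between correctness and consistency can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label format, and the toy answers are hypothetical, and the sketch assumes the answers obtained in each language have already been mapped onto a shared label set.

```python
from typing import Sequence


def consistency(answers_a: Sequence[str], answers_b: Sequence[str]) -> float:
    """Fraction of items on which two senses (e.g. two languages) agree.

    Hypothetical helper: assumes both sequences cover the same items in the
    same order and use a shared label set.
    """
    if len(answers_a) != len(answers_b):
        raise ValueError("both senses must cover the same items")
    if not answers_a:
        raise ValueError("need at least one item")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


def accuracy(answers: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items answered correctly; shown only for contrast."""
    return consistency(answers, gold)


# Toy data (invented for illustration, not taken from the paper).
gold = ["yes", "no", "yes", "no"]
sense_1 = ["yes", "no", "no", "no"]   # answers given in one language
sense_2 = ["yes", "no", "no", "yes"]  # answers to the model's own translation

print(accuracy(sense_1, gold))        # 0.75
print(accuracy(sense_2, gold))        # 0.5
print(consistency(sense_1, sense_2))  # 0.75
```

On the third item both senses are wrong yet agree, so the item counts toward consistency but not toward accuracy. This is why a consistency score needs no gold labels.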

Code & Models

Repositories

xeniaohmer/multisense_consistency
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Test