Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses
Xenia Ohmer, Elia Bruni, Dieuwke Hupkes

TL;DR
This paper introduces an evaluation method for large language models that measures understanding by the model's self-consistency across different senses of the same meaning, instantiated here as different languages. The results show that the model's understanding is not yet fully language-independent.
Contribution
The paper proposes a new paradigm for evaluating LLM understanding based on consistency across senses, demonstrated through multilingual testing without requiring multilingual corpora.
Findings
Multilingual consistency in ChatGPT is currently limited.
LLMs' task and world understanding are not fully language-independent.
The approach is easily extendable to other languages and tasks.
Abstract
At the staggering pace with which the capabilities of large language models (LLMs) are increasing, creating future-proof evaluation sets to assess their understanding becomes more and more challenging. In this paper, we propose a novel paradigm for evaluating LLMs which leverages the idea that correct world understanding should be consistent across different (Fregean) senses of the same meaning. Accordingly, we measure understanding not in terms of correctness but by evaluating consistency across multiple senses that are generated by the model itself. We showcase our approach by instantiating a test where the different senses are different languages, hence using multilingual self-consistency as a litmus test for the model's understanding and simultaneously addressing the important topic of multilinguality. Taking one of the latest versions of ChatGPT as our object of study, we evaluate…
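The core idea of scoring consistency rather than correctness can be illustrated with a minimal sketch. This is not the authors' metric or code: the function name `consistency`, the normalization step, and the toy answers are assumptions made for illustration, and the sketch presumes that answers given in different languages have already been mapped to a shared label space.

```python
from typing import Sequence


def consistency(answers_a: Sequence[str], answers_b: Sequence[str]) -> float:
    """Fraction of items on which two answer lists agree.

    Hypothetical helper: each list holds the model's answers to the same
    items, posed in two different senses (for example, two languages),
    already mapped to shared labels. No gold labels are needed.
    """
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must have the same length")
    if not answers_a:
        raise ValueError("answer lists must not be empty")
    # Compare after trimming whitespace and lowercasing (an assumed normalization).
    matches = sum(
        a.strip().lower() == b.strip().lower()
        for a, b in zip(answers_a, answers_b)
    )
    return matches / len(answers_a)


# Toy data, invented for illustration only.
answers_sense_1 = ["yes", "no", "yes", "yes"]
answers_sense_2 = ["yes", "yes", "yes", "no"]
score = consistency(answers_sense_1, answers_sense_2)
```

With the toy data above, the two answer lists agree on two of four items, so `score` is 0.5. Note that the score says nothing about which answers are correct: a model can be fully consistent while being wrong in both senses.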
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Test
