LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements
Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

TL;DR
This paper evaluates large language models' reading comprehension abilities using fictitious data to eliminate biases from their internal knowledge, revealing challenges in understanding hypothetical and modal contexts.
Contribution
It introduces a novel approach using imaginary data for unbiased assessment of LLMs' linguistic understanding, highlighting their struggles with hypothetical and modal reasoning.
Findings
LLMs perform well on simple affirmative and negative contexts.
Models struggle with modal and conditional contexts.
Knowledge conflicts affect model responses in complex scenarios.
Abstract
The task of reading comprehension (RC), often implemented as context-based question answering (QA), provides a primary means to assess language models' natural language understanding (NLU) capabilities. Yet, when applied to large language models (LLMs) with extensive built-in world knowledge, this method can be deceptive. If the context aligns with the LLMs' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from their internal information. Conversely, using data that conflicts with the models' knowledge creates erroneous trends that distort the results. To address this issue, we propose using RC on imaginary data, based on fictitious facts and entities. This task is entirely independent of the models' world knowledge, enabling us to evaluate LLMs' linguistic abilities without the interference of parametric knowledge. Testing…
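The evaluation idea described in the abstract can be illustrated with a minimal sketch. It assumes three answer labels ("yes", "no", "unknown"). The invented entity ("Vantrel Bridge"), attribute ("glimmerstone"), sentence templates, and the helpers `build_items` and `accuracy_by_type` are all hypothetical illustrations, not the authors' dataset or code. Because the entity does not exist, a model can only answer from the context. Scoring each context type separately shows whether the model handles modal and conditional statements, where the fact is never actually asserted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RCItem:
    context_type: str
    context: str
    question: str
    gold: str  # assumed label set: "yes", "no", or "unknown"


# Hypothetical templates (not from the paper). Each pairs a context sentence
# with the answer it licenses for the question "Is the {e} built from {a}?".
TEMPLATES = {
    "affirmative": ("The {e} is built from {a}.", "yes"),
    "negative": ("The {e} is not built from {a}.", "no"),
    "modal": ("The {e} might be built from {a}.", "unknown"),
    "conditional": ("If the council approves the plan, the {e} will be built from {a}.", "unknown"),
}


def build_items(entity: str, attribute: str) -> list[RCItem]:
    """Instantiate every template with a fictitious entity and attribute."""
    question = f"Is the {entity} built from {attribute}?"
    return [
        RCItem(ctype, template.format(e=entity, a=attribute), question, gold)
        for ctype, (template, gold) in TEMPLATES.items()
    ]


def accuracy_by_type(items: list[RCItem], answer_fn: Callable[[str, str], str]) -> dict[str, float]:
    """Exact-match accuracy of answer_fn(context, question), grouped by context type."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in items:
        pred = answer_fn(item.context, item.question).strip().lower()
        total[item.context_type] += 1
        correct[item.context_type] += int(pred == item.gold)
    return {ctype: correct[ctype] / total[ctype] for ctype in total}


if __name__ == "__main__":
    items = build_items("Vantrel Bridge", "glimmerstone")

    # Toy baseline standing in for a model: it always answers "yes".
    def always_yes(context: str, question: str) -> str:
        return "yes"

    print(accuracy_by_type(items, always_yes))
    # {'affirmative': 1.0, 'negative': 0.0, 'modal': 0.0, 'conditional': 0.0}
```

The trivial "always yes" baseline scores perfectly on affirmative contexts but fails every other type. This is why, in this sketch, accuracy is reported per context type rather than as a single pooled number.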
Taxonomy
Topics: Mathematics, Computing, and Information Processing
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Label Smoothing · Residual Connection · Multi-Head Attention · Adam · Dropout · Softmax
