Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering
Nick Ferguson, Liane Guillou, Alan Bundy, Kwabena Nuamah

TL;DR
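This paper evaluates how well large language models handle multi-step question answering that requires meta-level reasoning (high-level strategy or planning) and object-level reasoning (lower-level tasks such as mathematical reasoning). The models frequently demonstrate meta-level reasoning but struggle with object-level reasoning on some of the datasets.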
Contribution
Introduces the Franklin dataset to assess meta- and object-level reasoning in LLMs and provides a comprehensive evaluation of their reasoning capabilities.
Findings
LLMs frequently demonstrate meta-level reasoning
LLMs struggle with object-level reasoning in some datasets
LLMs perform well on meta-level reasoning in the Franklin dataset
Abstract
Large Language Models (LLMs) excel in natural language tasks but still face challenges in Question Answering (QA) tasks requiring complex, multi-step reasoning. We outline the types of reasoning required in some of these tasks, and reframe them in terms of meta-level reasoning (akin to high-level strategic reasoning or planning) and object-level reasoning (embodied in lower-level tasks such as mathematical reasoning). Franklin, a novel dataset with requirements of meta- and object-level reasoning, is introduced and used along with three other datasets to evaluate four LLMs at question answering tasks requiring multiple steps of reasoning. Results from human annotation studies suggest LLMs demonstrate meta-level reasoning with high frequency, but struggle with object-level reasoning tasks in some of the datasets used. Additionally, evidence suggests that LLMs find the object-level…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
