MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education
Naiming Liu, Shashank Sonkar, Myco Le, Richard Baraniuk

TL;DR
MalAlgoQA introduces a dataset for evaluating how well large language models understand flawed reasoning in educational contexts, revealing weaknesses in counterfactual reasoning and inconsistent effects of prompting techniques.
Contribution
The paper presents MalAlgoQA, a new dataset and evaluation framework for assessing counterfactual reasoning in LLMs, highlighting the models' limitations in this area and the effects of prompting methods.
Findings
State-of-the-art LLMs perform worse on malgorithm identification than on correct rationale identification.
Chain-of-thought prompting does not consistently improve counterfactual reasoning performance.
The results have implications for AI tutoring systems that must recognize and address student misconceptions.
Abstract
This paper introduces MalAlgoQA, a novel dataset designed to evaluate the counterfactual reasoning capabilities of Large Language Models (LLMs) through a pedagogical approach. The dataset comprises mathematics and reading comprehension questions, each accompanied by four answer choices and their corresponding rationales. At the heart of MalAlgoQA are "malgorithms": rationales behind incorrect answer choices that represent flawed yet logically coherent reasoning paths. These malgorithms serve as counterfactual scenarios, allowing us to assess an LLM's ability to identify and analyze flawed reasoning patterns. We propose the Malgorithm Identification task, in which LLMs are assessed on their ability to identify the corresponding malgorithm given an incorrect answer choice. To evaluate model performance, we introduce two metrics: Algorithm Identification Accuracy (AIA) for correct…
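To make the task setup concrete, here is a minimal sketch of how an identification item and its scoring could look, assuming each question stores four choices with one rationale per choice. Everything below is hypothetical and for illustration only: the field names (`MalAlgoItem`, `rationales`, `correct`), the prompt wording, the `cot` flag, and the sample fraction question are not taken from the paper. The rationales are shown in a shuffled order (`rationale_order`) so the answer is not trivially the same position as the choice. The scoring function reports accuracy separately for correct choices (the AIA setting described above) and for incorrect choices (the malgorithm side).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LETTERS = "ABCD"


@dataclass
class MalAlgoItem:
    """Hypothetical item layout: rationales[i] explains how one arrives at choices[i].

    The rationale of the correct choice is the algorithm; the rationales of the
    incorrect choices are malgorithms.
    """
    question: str
    choices: List[str]
    rationales: List[str]
    correct: int  # index of the correct answer choice


def build_prompt(item: MalAlgoItem, choice_idx: int,
                 rationale_order: Sequence[int], cot: bool = False) -> Tuple[str, str]:
    """Return (prompt, gold_letter); option slot k shows rationale rationale_order[k].

    The prompt text is an illustrative assumption, not the paper's template.
    """
    if sorted(rationale_order) != list(range(len(item.rationales))):
        raise ValueError("rationale_order must be a permutation of rationale indices")
    lines = [
        f"Question: {item.question}",
        f"Answer choice: {item.choices[choice_idx]}",
        "Which reasoning leads to this answer choice?",
    ]
    for slot, r in enumerate(rationale_order):
        lines.append(f"{LETTERS[slot]}. {item.rationales[r]}")
    # Hypothetical switch between a chain-of-thought cue and a direct-answer cue.
    if cot:
        lines.append("Think step by step, then give the letter of your answer.")
    else:
        lines.append("Answer with the letter only.")
    gold = LETTERS[list(rationale_order).index(choice_idx)]
    return "\n".join(lines), gold


def identification_accuracies(records: Sequence[Tuple[bool, str, str]]) -> Tuple[float, float]:
    """Return (accuracy on correct choices, accuracy on incorrect choices).

    Each record is (choice_is_correct, predicted_letter, gold_letter).
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for is_correct, pred, gold in records:
        totals[is_correct] += 1
        hits[is_correct] += int(pred == gold)

    def ratio(key: bool) -> float:
        return hits[key] / totals[key] if totals[key] else float("nan")

    return ratio(True), ratio(False)


if __name__ == "__main__":
    # Invented example: each wrong choice comes from a plausible fraction error.
    item = MalAlgoItem(
        question="What is 3/4 + 1/4?",
        choices=["1", "4/8", "3/16", "4"],
        rationales=[
            "Add the numerators over the common denominator: 4/4 = 1.",
            "Add the numerators and add the denominators: 4/8.",
            "Multiply the numerators and multiply the denominators: 3/16.",
            "Add the numerators and ignore the denominators: 4.",
        ],
        correct=0,
    )
    prompt, gold = build_prompt(item, choice_idx=1, rationale_order=[2, 0, 3, 1])
    print(prompt)
    print("gold:", gold)
```

Keeping the two accuracies separate mirrors the comparison reported in the findings, where models do worse at pinning down the flawed rationale behind an incorrect choice than the correct rationale behind a correct one.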
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Focus
