Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

Naiming Liu; Shashank Sonkar; Zichao Wang; Simon Woodhead; Richard G. Baraniuk

arXiv:2310.02439·cs.CL·October 5, 2023·2 cites

Open Access

TL;DR

This paper introduces a novel evaluation method for large language models' math reasoning by simulating novice learners and expert tutors to identify misconceptions behind incorrect answers, revealing limitations in current models.

Contribution

It proposes a new education-inspired evaluation framework that assesses whether LLMs can mimic the misconceptions of a novice learner and identify the misconceptions behind errors in math reasoning.

Findings

01. LLMs can answer simple math questions correctly.

02. LLMs struggle to identify misconceptions behind incorrect answers.

03. The approach opens new avenues for improving LLMs in educational contexts.

Abstract

We propose novel evaluations for the mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to have LLMs simulate a novice learner and an expert tutor, aiming to identify the incorrect answer to a math question that results from a specific misconception and to recognize the misconception(s) behind an incorrect answer, respectively. In contrast to traditional LLM-based mathematical evaluations that focus on answering math questions correctly, our approach takes inspiration from principles in the educational learning sciences. We explicitly ask LLMs to mimic a novice learner by answering questions in a specific incorrect manner based on incomplete knowledge, and to mimic an expert tutor by identifying the misconception(s) corresponding to an incorrect answer to a question. Using simple grade-school math problems, our experiments reveal…
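The two roles described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Item` schema, the multiple-choice format, the prompt wording, the sample fraction question, and the `accuracy` scoring are hypothetical and are not taken from the paper. No model is called; the code only builds the prompts and scores label predictions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    """One multiple-choice question (hypothetical schema, not the paper's)."""
    question: str
    options: Dict[str, str]         # option label -> answer text
    correct: str                    # label of the correct option
    misconceptions: Dict[str, str]  # wrong option label -> misconception text


def _format_options(item: Item) -> str:
    """Render the answer options one per line, e.g. 'A. 5/6'."""
    return "\n".join(f"{label}. {text}" for label, text in item.options.items())


def novice_prompt(item: Item, wrong_label: str) -> str:
    """Novice-learner task: given a misconception, pick the answer it produces."""
    return (
        "You are a student who holds this misconception: "
        f"{item.misconceptions[wrong_label]}\n"
        f"Question: {item.question}\n"
        f"{_format_options(item)}\n"
        "Which option would you choose? Reply with the option label only."
    )


def tutor_prompt(item: Item, wrong_label: str, candidates: List[str]) -> str:
    """Expert-tutor task: given a wrong answer, pick the misconception behind it."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        f"Question: {item.question}\n"
        f"A student answered: {item.options[wrong_label]}\n"
        "Which misconception best explains this answer?\n"
        f"{numbered}\n"
        "Reply with the number only."
    )


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Hypothetical sample item, invented for this sketch.
example = Item(
    question="What is 1/2 + 1/3?",
    options={"A": "5/6", "B": "2/5", "C": "1/6"},
    correct="A",
    misconceptions={
        "B": "Adds the numerators and the denominators separately",
        "C": "Multiplies the fractions instead of adding them",
    },
)

print(novice_prompt(example, "B"))
print(tutor_prompt(example, "B", list(example.misconceptions.values())))
```

In this sketch, the novice task counts as correct when the model returns the wrong option tied to the stated misconception (here `B`), while the tutor task counts as correct when the model selects the misconception tied to the student's wrong answer. Either set of replies could then be scored with `accuracy`.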

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques

Methods: Focus