A Multi-Agent Approach to Validate and Refine LLM-Generated Personalized Math Problems

Fareya Ikram; Nischal Ashok Kumar; Junyang Lu; Hunter McNichols; Candace Walkington; Neil Heffernan; Andrew S. Lan

arXiv:2604.05160·cs.CY·April 8, 2026

TL;DR

This paper introduces a multi-agent framework that uses validation feedback to iteratively refine LLM-generated personalized math problems, improving their realism, authenticity, readability, and solvability.

Contribution

The paper formalizes personalization as an iterative generate-validate-revise process with four specialized validator agents, improving the quality of problems for personalized education.

Findings

01. Refinement significantly reduces realism and authenticity failures.

02. Different refinement strategies excel on different validation criteria.

03. Validator reliability varies: it is highest on realism and lowest on authenticity.

Abstract

Students benefit from math problems contextualized to their interests. Large language models (LLMs) offer promise for efficient personalization at scale. However, LLM-generated personalized problems often suffer from issues such as unrealistic quantities and contexts, poor readability, limited authenticity with respect to students' experiences, and occasional mathematical inconsistencies. To alleviate these issues, we propose a multi-agent framework that formalizes personalization as an iterative generate-validate-revise process; we use four specialized validator agents targeting the criteria of solvability, realism, readability, and authenticity, respectively. We evaluate our framework on 600 problems drawn from a popular online mathematics homework platform, ASSISTments, personalizing each problem to a fixed set of 20 student interest topics. We compare three refinement strategies…
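To make the generate-validate-revise idea concrete, here is a minimal sketch of one possible loop, assuming each validator is a function that returns a pass/fail flag plus textual feedback. It is not the authors' implementation and does not correspond to any of the three refinement strategies the abstract mentions, which are not described in this excerpt. All names (`refine`, `RefinementResult`, `toy_realism`, `toy_revise`) are hypothetical, and the toy validators stand in for what would be LLM-backed agents in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumption: a validator maps problem text to (passed, feedback).
Validator = Callable[[str], Tuple[bool, str]]
# Assumption: a reviser maps (problem, {criterion: feedback}) to a new problem.
Reviser = Callable[[str, Dict[str, str]], str]


@dataclass
class RefinementResult:
    problem: str               # final problem text
    rounds: int                # number of revisions applied
    failures: Dict[str, str]   # criteria still failing, with feedback


def refine(draft: str, validators: Dict[str, Validator],
           revise: Reviser, max_rounds: int = 3) -> RefinementResult:
    """Validate a draft, revise on failure, and stop when clean or out of budget."""
    problem, rounds = draft, 0
    while True:
        # Run every validator and collect feedback from the ones that fail.
        failures = {}
        for name, check in validators.items():
            passed, feedback = check(problem)
            if not passed:
                failures[name] = feedback
        if not failures or rounds >= max_rounds:
            return RefinementResult(problem, rounds, failures)
        problem = revise(problem, failures)
        rounds += 1


# Toy stand-ins (hypothetical) so the sketch runs without any LLM calls.
def always_pass(problem: str) -> Tuple[bool, str]:
    return True, ""


def toy_realism(problem: str) -> Tuple[bool, str]:
    if "800 pizzas" in problem:
        return False, "800 pizzas is an implausible order size."
    return True, ""


def toy_revise(problem: str, failures: Dict[str, str]) -> str:
    # A real reviser would prompt an LLM with the collected feedback.
    return problem.replace("800", "8") if "realism" in failures else problem


validators = {
    "solvability": always_pass,
    "realism": toy_realism,
    "readability": always_pass,
    "authenticity": always_pass,
}
draft = "Maya orders 800 pizzas for her 4 friends. How many pizzas does each friend get?"
result = refine(draft, validators, toy_revise)
print(result.rounds, result.problem)
```

In this sketch all four validators run on every round and their feedback is passed to the reviser together; running them one at a time, or in a fixed order, would be an equally plausible design, and the excerpt does not say which the paper uses.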
