Learning to Optimize Feedback for One Million Students: Insights from Multi-Armed and Contextual Bandits in Large-Scale Online Tutoring

Robin Schmucker; Nimish Pachapurkar; Shanmuga Bala; Miral Shah; Tom Mitchell

arXiv:2508.00270·cs.LG·August 4, 2025

Open Access

TL;DR

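This paper presents a large-scale online tutoring system that uses multi-armed and contextual bandit algorithms to choose the feedback students receive after an incorrect answer. Policies are learned from the data of one million students, and the resulting MAB policies are evaluated in 166,000 practice sessions, where they significantly improve student outcomes.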

Contribution

The paper introduces a scalable approach that applies MAB and CB algorithms to select feedback in online tutoring, learned from the data of one million students and evaluated in 166,000 practice sessions.

Findings

01 MAB policies improved overall student outcomes.

02 CB policies showed limited additional benefit over MAB.

03 Data-driven policies support thousands of students daily.

Abstract

We present an online tutoring system that learns to provide effective feedback to students after they answer questions incorrectly. Using data from one million students, the system learns which assistance action (e.g., one of multiple hints) to provide for each question to optimize student learning. Employing the multi-armed bandit (MAB) framework and offline policy evaluation, we assess 43,000 assistance actions, and identify trade-offs between assistance policies optimized for different student outcomes (e.g., response correctness, session completion). We design an algorithm that for each question decides on a suitable policy training objective to enhance students' immediate second attempt success and overall practice session performance. We evaluate the resulting MAB policies in 166,000 practice sessions, verifying significant improvements in student outcomes. While MAB policies…
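
To make the per-question bandit framing in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Beta-Bernoulli Thompson sampling learner (the abstract does not name the specific MAB algorithm), a binary reward equal to 1 when the student's second attempt is correct, and hypothetical names (`QuestionBandit`, `simulate`, `hint_a`, `hint_b`) chosen purely for illustration.

```python
import random


class QuestionBandit:
    """Hypothetical Beta-Bernoulli Thompson sampler for one question's assistance actions."""

    def __init__(self, actions, seed=None):
        self.actions = list(actions)
        # Beta(1, 1) prior: one pseudo-success and one pseudo-failure per action.
        self.successes = {a: 1 for a in self.actions}
        self.failures = {a: 1 for a in self.actions}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible success rate per action and serve the best draw.
        samples = {
            a: self.rng.betavariate(self.successes[a], self.failures[a])
            for a in self.actions
        }
        return max(samples, key=samples.get)

    def update(self, action, reward):
        # Assumed reward: 1 if the second attempt was correct, otherwise 0.
        if reward:
            self.successes[action] += 1
        else:
            self.failures[action] += 1

    def posterior_mean(self, action):
        s, f = self.successes[action], self.failures[action]
        return s / (s + f)


def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against made-up success rates (illustrative only, not paper data)."""
    bandit = QuestionBandit(true_rates, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        action = bandit.select()
        reward = 1 if env.random() < true_rates[action] else 0
        bandit.update(action, reward)
    return bandit


if __name__ == "__main__":
    result = simulate({"hint_a": 0.3, "hint_b": 0.7})
    for name in result.actions:
        print(name, round(result.posterior_mean(name), 3))
```

In a deployment of this shape, one such learner would be kept per question, and swapping the reward for a different student outcome (for example, session completion) would yield a policy optimized for that outcome, which is the kind of trade-off the abstract describes.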

Taxonomy

Topics: Advanced Bandit Algorithms Research · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics