Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial Framework for Reliable AI Tutoring

Saisab Sadhu; Ashim Dhor

arXiv:2512.22496 · cs.MA · December 30, 2025

TL;DR

This paper presents Hierarchical Pedagogical Oversight (HPO), a multi-agent adversarial framework that improves the reliability of AI tutoring systems by enforcing dialectical debate among specialized agents, outperforming larger models on educational dialogue assessment.

Contribution

The paper introduces HPO, a novel multi-agent adversarial framework with a hierarchical structure that enhances pedagogical reasoning in AI tutors, reducing reliance on large models.

Findings

01  HPO achieves a Macro F1 of 0.845 on MRBench, surpassing GPT-4o (0.812).

02  HPO uses 20 times fewer parameters than GPT-4o.

03  Adversarial reasoning improves reliability in AI tutoring systems.
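
For readers unfamiliar with the headline metric: Macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The sketch below computes it from scratch and also contrasts the absolute and relative gap between the two reported scores (0.845 for HPO, 0.812 for GPT-4o per the abstract). The label names and the toy predictions are hypothetical and are not taken from MRBench.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only.
toy_true = ["yes", "yes", "no", "partial"]
toy_pred = ["yes", "no", "no", "partial"]
toy_score = macro_f1(toy_true, toy_pred)

# Gap between the two reported Macro F1 scores.
absolute_gap = 0.845 - 0.812            # difference in F1 points
relative_gap = absolute_gap / 0.812     # improvement relative to GPT-4o
```

Read this way, the reported improvement is 0.033 in absolute Macro F1, which is about 4.1% relative to the GPT-4o score.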

Abstract

Large Language Models (LLMs) are increasingly deployed as automated tutors to address educator shortages; however, they often fail at pedagogical reasoning, frequently validating incorrect student solutions (sycophancy) or providing overly direct answers that hinder learning. We introduce Hierarchical Pedagogical Oversight (HPO), a framework that adapts structured adversarial synthesis to educational assessment. Unlike cooperative multi-agent systems that often drift toward superficial consensus, HPO enforces a dialectical separation of concerns: specialist agents first distill dialogue context, which then grounds a moderated, five-act debate between opposing pedagogical critics. We evaluate this framework on the MRBench dataset of 1,214 middle-school mathematics dialogues. Our 8B-parameter model achieves a Macro F1 of 0.845, outperforming GPT-4o (0.812) by 3.3% while using 20 times…
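
The two-stage flow the abstract describes (specialists distill context, then opposing critics debate under a moderator) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `run_oversight` function, the `DebateTranscript` type, the strict alternation of critics across acts, and the plain-callable agent interface are all hypothetical. The abstract specifies only that the debate has five acts and that it is grounded in the distilled context.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical agent interface: any callable mapping a prompt to a reply.
Agent = Callable[[str], str]


@dataclass
class DebateTranscript:
    context: str
    turns: List[str] = field(default_factory=list)


def run_oversight(
    dialogue: str,
    specialists: List[Agent],
    critic_a: Agent,
    critic_b: Agent,
    moderator: Agent,
    num_acts: int = 5,
) -> Tuple[str, DebateTranscript]:
    # Stage 1: each specialist distills the raw tutoring dialogue.
    context = "\n".join(agent(dialogue) for agent in specialists)
    transcript = DebateTranscript(context=context)

    # Stage 2: opposing critics speak in turn, always seeing the distilled
    # context plus the debate so far. Alternation is an assumption here.
    for act in range(1, num_acts + 1):
        critic = critic_a if act % 2 == 1 else critic_b
        prompt = "\n".join([context, *transcript.turns])
        transcript.turns.append(f"act {act}: {critic(prompt)}")

    # Stage 3: the moderator reads everything and issues the final verdict.
    verdict = moderator("\n".join([context, *transcript.turns]))
    return verdict, transcript


# Usage with stub agents standing in for model calls.
verdict, transcript = run_oversight(
    dialogue="Student: 3/4 + 1/4 = 4/8. Tutor: Great job!",
    specialists=[lambda d: "summary: tutor accepted an incorrect sum"],
    critic_a=lambda p: "the response is pedagogically sound",
    critic_b=lambda p: "the response validates a wrong answer",
    moderator=lambda p: "verdict: mistake not identified",
)
```

Keeping the critics in separate roles with a fixed number of acts is one way to reflect the abstract's point that cooperative agents tend to drift toward superficial consensus, while a separate moderator keeps the final judgment out of either critic's hands.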

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications