TL;DR
This paper introduces SHAPE, a benchmark for evaluating safety and helpfulness in educational LLMs, together with a graph-augmented tutoring pipeline that builds on a formalization of tutoring behaviors to improve safety while preserving helpfulness.
Contribution
It unifies safety, helpfulness, and pedagogy into a formal framework and proposes a new pipeline that enhances safety without sacrificing helpfulness.
Findings
Significantly improved safety under two pedagogical jailbreak settings.
Maintains near-ceiling helpfulness in evaluations.
Provides a new benchmark with 9,087 student-question pairs.
Abstract
Large Language Models (LLMs) have been widely explored in educational scenarios. We identify a critical vulnerability in current educational LLMs: pedagogical jailbreaks, in which students use answer-inducing prompts to elicit solutions rather than scaffolded instruction. To enable systematic study, we unify and formalize safe, helpful, and pedagogical behaviors with a knowledge-mastery graph and introduce SHAPE, a benchmark of 9,087 student-question pairs for evaluating tutoring behavior under adversarial pressure. We propose a graph-augmented tutoring pipeline that infers prerequisite concepts from queries, identifies mastery gaps, and routes generation between instructing and problem-solving via explicit gating. Experiments across multiple LLMs show that our method yields significantly improved safety under two pedagogical jailbreak settings, while maintaining near-ceiling helpfulness…
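The gating idea in the abstract (infer prerequisites, find mastery gaps, then choose between instructing and problem-solving) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption, not the paper's implementation: the `MasteryGraph` class, the `route` function, the 0.7 mastery threshold, and the rule that any unmastered prerequisite forces instructing mode are all hypothetical. Mapping the student's query to target concepts is assumed to have already happened upstream and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class MasteryGraph:
    # Hypothetical structure: concept -> list of its direct prerequisite concepts.
    prereqs: dict[str, list[str]]
    # Hypothetical per-student mastery estimates in [0, 1]; missing means 0.0.
    mastery: dict[str, float] = field(default_factory=dict)

    def prerequisite_closure(self, concepts: list[str]) -> set[str]:
        """Collect all transitive prerequisites of the given concepts (iterative DFS)."""
        seen: set[str] = set()
        stack = list(concepts)
        while stack:
            concept = stack.pop()
            for prereq in self.prereqs.get(concept, []):
                if prereq not in seen:  # `seen` also guards against cycles
                    seen.add(prereq)
                    stack.append(prereq)
        return seen

    def mastery_gaps(self, concepts: list[str], threshold: float = 0.7) -> set[str]:
        """Prerequisites whose estimated mastery falls below the threshold."""
        return {
            c for c in self.prerequisite_closure(concepts)
            if self.mastery.get(c, 0.0) < threshold
        }


def route(graph: MasteryGraph, query_concepts: list[str],
          threshold: float = 0.7) -> tuple[str, list[str]]:
    """Hypothetical gate: instruct on missing prerequisites, otherwise allow solving."""
    gaps = graph.mastery_gaps(query_concepts, threshold)
    if gaps:
        return "instruct", sorted(gaps)
    return "solve", []


if __name__ == "__main__":
    g = MasteryGraph(
        prereqs={
            "quadratic_formula": ["completing_square"],
            "completing_square": ["factoring", "square_roots"],
        },
        mastery={"completing_square": 0.9, "factoring": 0.4, "square_roots": 0.8},
    )
    print(route(g, ["quadratic_formula"]))  # ('instruct', ['factoring'])
```

In this toy graph the student has not yet mastered `factoring`, so the gate withholds a direct solution and targets that gap; once every prerequisite clears the threshold, it returns `"solve"`. A hard threshold is the simplest possible gate; a real system could instead weigh gaps or let the LLM decide.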
