EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers

Yilin Jiang; Mingzi Zhang; Xuanyu Yin; Sheng Jin; Suyu Lu; Zuocan Ying; Zengyi Yu; Xiangjie Kong

arXiv:2511.06890 · cs.CL · November 11, 2025

TL;DR

EduGuardBench introduces a comprehensive benchmark for evaluating the pedagogical fidelity and safety of large language models acting as simulated teachers, addressing role fidelity, harm detection, and adversarial vulnerabilities in educational contexts.

Contribution

The paper presents a dual-component benchmark with metrics for professional fidelity and safety vulnerabilities, including the Role-playing Fidelity Score (RFS), Attack Success Rate (ASR), and a three-tier Refusal Quality assessment, and applies them to 14 models.

Findings

1. Reasoning models show higher fidelity but still have significant failures.

2. Mid-sized models are unexpectedly more vulnerable to adversarial attacks.

3. Safe models effectively convert harmful prompts into educational refusals, indicating advanced safety capabilities.

Abstract

Large Language Models for Simulating Professions (SP-LLMs), particularly as teachers, are pivotal for personalized education. However, ensuring their professional competence and ethical safety is a critical challenge, as existing benchmarks fail to measure role-playing fidelity or address the unique teaching harms inherent in educational scenarios. To address this, we propose EduGuardBench, a dual-component benchmark. It assesses professional fidelity using a Role-playing Fidelity Score (RFS) while diagnosing harms specific to the teaching profession. It also probes safety vulnerabilities using persona-based adversarial prompts targeting both general harms and, particularly, academic misconduct, evaluated with metrics including Attack Success Rate (ASR) and a three-tier Refusal Quality assessment. Our extensive experiments on 14 leading models reveal a stark polarization in performance.…
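
As a rough illustration of the two safety metrics named in the abstract, the sketch below computes an attack success rate and tallies refusals by tier. The abstract does not give the exact definitions, so this assumes ASR is the fraction of adversarial prompts on which the attack succeeded; the `Judgement` record, its field names, and the numeric tier labels are hypothetical placeholders, not the paper's scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Judgement:
    """One judged response to an adversarial prompt (hypothetical record format)."""
    attack_succeeded: bool        # True if the model produced the harmful content
    refusal_tier: Optional[int]   # 1-3 for refusals (placeholder labels), None if not a refusal


def attack_success_rate(judgements: List[Judgement]) -> float:
    """Assumed definition: share of adversarial prompts on which the attack succeeded."""
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(j.attack_succeeded for j in judgements) / len(judgements)


def refusal_tier_counts(judgements: List[Judgement]) -> Counter:
    """Count refusals per tier, skipping responses that were not refusals."""
    return Counter(j.refusal_tier for j in judgements if j.refusal_tier is not None)


# Made-up judgements for illustration only; these are not results from the paper.
sample = [
    Judgement(attack_succeeded=True, refusal_tier=None),
    Judgement(attack_succeeded=False, refusal_tier=3),
    Judgement(attack_succeeded=False, refusal_tier=1),
    Judgement(attack_succeeded=False, refusal_tier=3),
]

print(attack_success_rate(sample))
print(refusal_tier_counts(sample))
```

Keeping the success flag separate from the refusal tier lets the same judged records feed both metrics, though the paper may score them differently.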

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling