ISD-Agent-Bench: A Comprehensive Benchmark for Evaluating LLM-based Instructional Design Agents

YoungHoon Jeon; Suwan Kim; Haein Son; Sookbun Lee; Yeil Jeong; Unggi Lee

arXiv:2602.10620 · cs.SE · February 12, 2026

TL;DR

This paper introduces ISD-Agent-Bench, a large-scale, standardized benchmark for evaluating LLM-based instructional design agents, highlighting the importance of classical ISD theories and diverse evaluation protocols.

Contribution

It presents a comprehensive benchmark with 25,795 scenarios, employing multi-judge evaluation and comparing classical ISD frameworks with modern reasoning approaches.

Findings

01. Classical ISD frameworks combined with ReAct reasoning outperform other agents.

02. The multi-judge protocol, which uses diverse LLMs from different providers, achieves high inter-judge reliability.

03. Theoretical quality correlates with benchmark performance.

Abstract

Large Language Model (LLM) agents have shown promising potential in automating Instructional Systems Design (ISD), a systematic approach to developing educational programs. However, evaluating these agents remains challenging due to the lack of standardized benchmarks and the risk of LLM-as-judge bias. We present ISD-Agent-Bench, a comprehensive benchmark comprising 25,795 scenarios generated via a Context Matrix framework that combines 51 contextual variables across 5 categories with 33 ISD sub-steps derived from the ADDIE model. To ensure evaluation reliability, we employ a multi-judge protocol using diverse LLMs from different providers, achieving high inter-judge reliability. We compare existing ISD agents with novel agents grounded in classical ISD theories such as ADDIE, Dick & Carey, and Rapid Prototyping ISD. Experiments on 1,017 test scenarios demonstrate that integrating…
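The multi-judge protocol mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the judge names, the score values, the averaging rule, and the use of mean pairwise Pearson correlation as the agreement measure are all assumptions made for illustration, since the abstract does not say which reliability statistic or aggregation the authors use.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def aggregate_scores(judge_scores):
    """Average each scenario's score across all judges (assumed rule)."""
    per_scenario = zip(*judge_scores.values())
    return [mean(scores) for scores in per_scenario]


def mean_pairwise_agreement(judge_scores):
    """Mean Pearson correlation over every pair of judges (assumed metric)."""
    pairs = combinations(judge_scores.values(), 2)
    return mean(pearson(a, b) for a, b in pairs)


# Hypothetical scores: three judges rating the same four scenarios.
judge_scores = {
    "judge_a": [4.0, 3.0, 5.0, 2.0],
    "judge_b": [4.0, 2.0, 5.0, 3.0],
    "judge_c": [5.0, 3.0, 4.0, 2.0],
}

final_scores = aggregate_scores(judge_scores)
agreement = mean_pairwise_agreement(judge_scores)
print(final_scores, agreement)
```

Averaging across judges from different providers is one simple way to dilute the bias of any single LLM judge, and a pairwise agreement figure gives a rough check that the judges rank outputs similarly before their scores are combined.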

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Text Readability and Simplification