ZPD-SCA: Unveiling the Blind Spots of LLMs in Assessing Students' Cognitive Abilities

Wenhan Dong; Zhen Sun; Yuemeng Zhao; Zifan Peng; Jun Wu; Jingyi Zheng; Yule Liu; Xinlei He; Yu Wang; Ruiming Wang; Xinyi Huang; Lei Mo

arXiv:2508.14377·cs.CL·August 26, 2025

TL;DR

This paper introduces ZPD-SCA, a benchmark for evaluating how well large language models (LLMs) judge whether Chinese reading materials match students' developmental stages. It reveals current limitations and biases in LLM-based educational assessment.

Contribution

The paper presents ZPD-SCA, the first comprehensive benchmark for stage-level Chinese reading comprehension difficulty, and uses it to evaluate LLMs, highlighting both their emerging abilities and their existing biases in educational assessment.

Findings

1. LLMs perform poorly at reading-difficulty assessment in zero-shot settings.
2. In-context examples significantly improve LLM performance.
3. Models exhibit biases, and their performance varies across text genres.
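This page does not describe the evaluation protocol, so the following is only a minimal sketch of how the kind of comparison behind the findings above could be set up. It assumes a hypothetical item schema (a passage, a genre, and an annotated stage label), hypothetical label strings, and hypothetical prompt wording; `build_prompt` and `accuracy_by_genre` are illustrative helpers, not the authors' code. The sketch builds zero-shot or few-shot (in-context) prompts and scores predicted stage labels by exact match, overall and per genre. It does not call any model.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout: the real ZPD-SCA schema is not given on this page.
@dataclass
class Item:
    passage: str
    genre: str  # illustrative genre label, e.g. "narrative"
    stage: str  # illustrative annotated stage label, e.g. "primary"


def build_prompt(item, examples=()):
    """Zero-shot prompt when `examples` is empty, few-shot (in-context) otherwise."""
    parts = ["Assign the reading passage to the school stage it best suits."]
    # Each labeled example is shown with its answer before the query passage.
    for ex in examples:
        parts.append(f"Passage: {ex.passage}\nStage: {ex.stage}")
    # The query passage ends with an empty "Stage:" slot for the model to fill.
    parts.append(f"Passage: {item.passage}\nStage:")
    return "\n\n".join(parts)


def accuracy_by_genre(items, predictions):
    """Exact-match accuracy overall (key "ALL") and for each genre."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        for key in ("ALL", item.genre):
            total[key] += 1
            correct[key] += int(pred.strip() == item.stage)
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    items = [
        Item("passage A", "narrative", "primary"),
        Item("passage B", "expository", "middle"),
        Item("passage C", "narrative", "middle"),
    ]
    preds = ["primary", "primary", "middle"]
    print(accuracy_by_genre(items, preds))
    print(build_prompt(items[0], examples=items[1:]))
```

Reporting accuracy per genre alongside the overall score is what makes genre-dependent variation visible. A single aggregate number can hide a model that does well on one genre and poorly on another.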

Abstract

Large language models (LLMs) have demonstrated potential in educational applications, yet their capacity to accurately assess the cognitive alignment of reading materials with students' developmental stages remains insufficiently explored. This gap is particularly critical given the foundational educational principle of the Zone of Proximal Development (ZPD), which emphasizes the need to match learning resources with Students' Cognitive Abilities (SCA). Despite the importance of this alignment, there is a notable absence of comprehensive studies investigating LLMs' ability to evaluate reading comprehension difficulty across different student age groups, especially in the context of Chinese language education. To fill this gap, we introduce ZPD-SCA, a novel benchmark specifically designed to assess stage-level Chinese reading comprehension difficulty. The benchmark is annotated by 60…