SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth

Wenpeng Xing; Lanyi Wei; Haixiao Hu; Jingyi Yu; Rongchang Li; Mohan Li; Changting Lin; Meng Han

arXiv:2508.11009·cs.CL·December 16, 2025

TL;DR

SproutBench is a benchmark for evaluating the safety and ethical behavior of large language models used by children and adolescents. It addresses gaps in how existing benchmarks cover age-specific risks.

Contribution

The paper introduces SproutBench, a novel evaluation suite of 1,283 prompts targeting developmental and safety risks specific to minors, and provides an empirical analysis of 47 LLMs.

Findings

01 Substantial safety vulnerabilities in current LLMs for youth

02 Strong correlations between safety dimensions and risk factors

03 Inverse relationship between interactivity and age appropriateness

Abstract

The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks, which are largely tailored to adult users and neglect the distinct developmental vulnerabilities of minors. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks spanning early childhood (ages 0–6), middle childhood (7–12), and adolescence (13–18). To bridge these gaps, we introduce SproutBench, an innovative evaluation suite comprising 1,283 developmentally grounded adversarial prompts designed to probe risks such as emotional dependency, privacy violations, and imitation of hazardous behaviors. Through rigorous empirical evaluation of 47 diverse LLMs, we uncover substantial safety…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Topic Modeling