HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Yilin Jiang; Fei Tan; Xuanyu Yin; Jing Leng; Aimin Zhou

arXiv:2603.04855·cs.CL·April 27, 2026

TL;DR

HACHIMI is a scalable framework that generates theory-aligned, controllable student personas for educational research and benchmarking using multi-agent orchestration and neuro-symbolic validation.

Contribution

HACHIMI introduces a multi-agent Propose-Validate-Revise framework for generating diverse, theory-consistent student personas with controllable population distributions.

Findings

01. Generated 1 million personas for Grades 1-12.

02. Intrinsic evaluation shows near-perfect schema validity and accurate quotas.

03. Strong alignment between human and agent responses in math and curiosity.

Abstract

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation…
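The abstract's pipeline (quota-controlled sampling, a Propose-Validate-Revise loop, and deduplication) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the strata and quota fractions, the persona fields, the age-for-grade rule, and the function names are all hypothetical. The proposer and reviser are random stand-ins for LLM agents, and exact-match deduplication stands in for semantic deduplication.

```python
import random

# Hypothetical strata, quota fractions, and age rule; none come from the paper.
QUOTAS = {"primary": 0.5, "middle": 0.25, "high": 0.25}
GRADES = {"primary": range(1, 7), "middle": range(7, 10), "high": range(10, 13)}
AGE_OFFSET = 5  # assumed: typical age is grade + 5, give or take one year
INTERESTS = ["math", "reading", "music", "sports", "art", "science"]


def allocate(total, quotas):
    """Turn quota fractions into integer counts per stratum (largest remainder)."""
    raw = {k: total * v for k, v in quotas.items()}
    counts = {k: int(x) for k, x in raw.items()}
    leftover = total - sum(counts.values())
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts


def propose(stratum, rng):
    """Stand-in for a proposer agent: draws a possibly inconsistent persona."""
    grade = rng.choice(list(GRADES[stratum]))
    return {
        "stratum": stratum,
        "grade": grade,
        "age": grade + AGE_OFFSET + rng.randint(-3, 3),
        "interest": rng.choice(INTERESTS),
    }


def validate(persona):
    """Symbolic check: return the violated constraints (empty list means valid)."""
    errors = []
    if persona["grade"] not in GRADES[persona["stratum"]]:
        errors.append("grade outside stratum")
    if abs(persona["age"] - (persona["grade"] + AGE_OFFSET)) > 1:
        errors.append("age implausible for grade")
    return errors


def revise(persona, errors):
    """Stand-in for a reviser agent: repairs only the fields the validator flagged."""
    fixed = dict(persona)
    if "age implausible for grade" in errors:
        fixed["age"] = fixed["grade"] + AGE_OFFSET
    return fixed


def generate(total, seed=0, max_tries=1000):
    """Fill each stratum's quota with validated, non-duplicate personas."""
    rng = random.Random(seed)
    accepted, seen = [], set()
    for stratum, needed in allocate(total, QUOTAS).items():
        kept = 0
        for _ in range(max_tries):
            if kept == needed:
                break
            persona = propose(stratum, rng)
            errors = validate(persona)
            if errors:
                persona = revise(persona, errors)
            if validate(persona):
                continue  # still invalid after one revision: discard
            key = (persona["grade"], persona["age"], persona["interest"])
            if key in seen:
                continue  # exact-match proxy for semantic deduplication
            seen.add(key)
            accepted.append(persona)
            kept += 1
    return accepted
```

Calling `generate(20)` under these assumptions yields 10 primary, 5 middle, and 5 high personas, each passing `validate`. In a real system the validator's error list would be fed back to the reviser as structured feedback rather than triggering a hard-coded repair.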

Code & Models

Repositories

ZeroLoss-Lab/HACHIMI
github

Datasets

sii-research/HACHIMI-1M
dataset · 148 dl
