Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research

Zia Qi; Brian E. Perron; Bryan G. Victor; Dragan Stoll; Joseph P. Ryan

arXiv:2512.04261 · cs.CY · January 14, 2026

TL;DR

This study demonstrates that small, reasoning-enabled language models can outperform larger models in classifying child welfare risk factors, offering a resource-efficient alternative for social work research.

Contribution

The paper introduces a benchmarking framework and uses it to show that small models with extended reasoning outperform larger models on child welfare classification tasks.

Findings

1. Small models with extended reasoning outperform larger models.
2. Models with 4B parameters achieve near-perfect agreement with expert classifications.
3. Resource-efficient models match the performance of large models.

Abstract

Objective: This study develops a systematic benchmarking framework for testing whether language models can accurately identify constructs of interest in child welfare records. The objective is to assess how different model sizes and architectures perform on four validated benchmarks for classifying critical risk factors among child welfare-involved families: domestic violence, firearms, substance-related problems generally, and opioids specifically.

Method: We constructed four benchmarks for identifying risk factors in child welfare investigation summaries: domestic violence, substance-related problems, firearms, and opioids (n = 500 each). We evaluated seven model sizes (0.6B to 32B parameters) in standard and extended reasoning modes, plus a mixture-of-experts variant. Cohen's kappa measured agreement with gold standard classifications established by human experts.

Results: The…
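
The abstract reports agreement with the expert gold standard as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed proportion of matching labels, and p_e is the agreement expected by chance given each rater's label frequencies. The Python sketch below computes κ for a single benchmark. It is a generic illustration, not the authors' evaluation code. It assumes binary labels (1 = risk factor present, 0 = absent) and uses a hypothetical `cohens_kappa` helper. The ten toy labels are invented and are not study data.

```python
from collections import Counter


def cohens_kappa(gold, predicted):
    """Return Cohen's kappa between gold-standard labels and model labels."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(gold)
    # Observed agreement: fraction of records where model and experts agree.
    p_o = sum(g == p for g, p in zip(gold, predicted)) / n
    # Chance agreement: sum over labels of the product of the two marginal rates.
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    labels = set(gold_counts) | set(pred_counts)
    p_e = sum(gold_counts[lab] * pred_counts[lab] for lab in labels) / (n * n)
    if p_e == 1:
        # Both raters used one identical label everywhere, so kappa is 0/0.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for ten summaries (1 = risk factor present).
gold = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
model = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(gold, model), 3))  # → 0.583
```

In this toy case the model agrees with the experts on 8 of 10 records (p_o = 0.80). Both label sets are 40% positive, so chance agreement is already p_e = 0.52, and κ ≈ 0.58 rather than 0.80. Because of this chance correction, kappa can be more informative than raw accuracy when one label dominates a benchmark.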

Taxonomy

Topics: Child Abuse and Trauma · Prenatal Substance Exposure Effects · Mental Health via Writing