Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning

Nissan Yaron; Dan Bystritsky; Ben-Etzion Yaron

arXiv:2510.25933·cs.AI·October 31, 2025

TL;DR

Humains-Junior, a 3.8B language model, matches GPT-4o's factual accuracy on the FACTS Grounding public subset within ±5 percentage points by combining directed reasoning scaffolds with behavioral fine-tuning. It offers a lower-cost alternative with comparable performance and potential for edge deployment.

Contribution

This paper introduces Humains-Junior, a small language model that matches GPT-4o's factual grounding accuracy using a novel combination of directed reasoning scaffolds and behavioral fine-tuning.

Findings

01

Humains-Junior matches GPT-4o's accuracy on the FACTS Grounding public subset within a ±5 percentage point equivalence margin.

02

Its base model (Phi-3.5-mini-instruct) is approximately 19 times less expensive than GPT-4o when both are purchased as managed APIs on Microsoft AI Foundry pricing.

03

Directed reasoning improves performance on frontier models in prompt-only settings.

Abstract

We introduce Humains-Junior, a 3.8B model that matches GPT-4o on the FACTS Grounding public subset within a $\pm 5$ pp equivalence margin.

Results. On Q1--Q500 under identical judges, GPT-4o scores 73.5% (95% CI 69.5--77.2) and Humains-Junior 72.7% (95% CI 68.7--76.5); the paired difference is 0.8 pp (bootstrap 95% CI $-3.1$ to $+4.7$; permutation $p = 0.72$; Cohen's $d = 0.023$). TOST establishes equivalence at $\pm 5$ pp (not at $\pm 3$ pp). When purchased as managed APIs, Humains-Junior's base model (Phi-3.5-mini-instruct) is $\approx 19\times$ less expensive than GPT-4o on Microsoft AI Foundry pricing; self-hosted or edge deployments can drive incremental inference cost toward zero. Measured vs. estimated pricing sources are tabulated in Appendix E.

Method. Our approach combines minimal directed "Exoskeleton Reasoning" scaffolds with behavioral fine-tuning that teaches protocol…
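The equivalence claim in the abstract can be read as an interval check: the reported paired-difference interval sits inside ±5 pp but not inside ±3 pp. The sketch below illustrates that reading and shows one common way to obtain such an interval with a paired percentile bootstrap. It is not the authors' evaluation code; the function names `paired_bootstrap_ci` and `within_margin` are hypothetical, and the only figures taken from the abstract are the interval endpoints. A conventional TOST at α = 0.05 corresponds to a 90% interval, so checking the reported 95% interval is a conservative simplification.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (a - b).

    Items are resampled as pairs, so each question keeps both models' scores.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def within_margin(ci_low, ci_high, margin):
    """True if the whole interval lies strictly inside (-margin, +margin)."""
    return -margin < ci_low and ci_high < margin


# Interval reported in the abstract, in percentage points.
reported_ci = (-3.1, 4.7)
print(within_margin(*reported_ci, margin=5.0))  # True: consistent with equivalence at ±5 pp
print(within_margin(*reported_ci, margin=3.0))  # False: 4.7 exceeds the ±3 pp margin
```

To apply `paired_bootstrap_ci` to real data, pass two equal-length lists of per-question scores (for example 1 for a response judged accurate and 0 otherwise), one list per model, in the same question order.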

Code & Models

Models

🤗 Inpris/humains-junior (model)
