LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss

Szilvia Ujváry; Louis Béthune; Pierre Ablin; João Monteiro; Marco Cuturi; Michael Kirchhof

arXiv:2602.12005·cs.CL·May 18, 2026

TL;DR

LaCy is a pretraining method for Small Language Models (SLMs) that combines loss with factuality signals to decide which tokens the model should learn and which it should delegate, with the goal of improving factual accuracy and efficiency.

Contribution

The paper proposes LaCy, a novel pretraining approach that integrates loss with factuality signals, enabling SLMs to better decide which tokens to learn or delegate.

Findings

01

LaCy models effectively learn when to predict a token themselves and when to delegate it.

02

LaCy achieves higher FactScores than SLMs trained with Rho or with an LLM judge.

03

Augmenting loss with grammatical signals improves delegation accuracy.

Abstract

Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. Especially the capacity of Small Language Models (SLMs) is limited, leading to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. Under this setting, we study the fundamental question of which tokens an SLM can and should learn during pretraining, versus which ones it should delegate via a <CALL> token. We find that this is not simply a question of loss: although the loss is predictive of whether a predicted token mismatches the ground-truth, it is insufficient for identifying which predictions would actually lead to factual or semantically invalid…
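
The learn-versus-delegate decision described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function `build_targets`, the `loss_threshold` value, and the per-token `is_factual` flags are all hypothetical, chosen only to show why a loss threshold alone may mark the wrong tokens for delegation and how a second signal can narrow the choice.

```python
# Toy illustration only; names, threshold, and data are assumptions,
# not taken from the paper.
CALL = "<CALL>"


def build_targets(tokens, losses, is_factual, loss_threshold=2.0):
    """Replace a token with CALL when it is both high-loss and fact-bearing.

    tokens      : list of str, the ground-truth tokens
    losses      : list of float, a per-token loss from some reference model
    is_factual  : list of bool, a hypothetical flag for fact-bearing tokens
    """
    if not (len(tokens) == len(losses) == len(is_factual)):
        raise ValueError("tokens, losses, and is_factual must have equal length")
    targets = []
    for tok, loss, fact in zip(tokens, losses, is_factual):
        # Loss alone would also delegate hard-but-harmless tokens,
        # so the factuality flag acts as a second gate.
        delegate = fact and loss > loss_threshold
        targets.append(CALL if delegate else tok)
    return targets


tokens = ["Marie", "Curie", "was", "born", "in", "1867"]
losses = [0.4, 0.3, 0.2, 2.5, 0.1, 5.1]
is_factual = [True, True, False, False, False, True]

targets = build_targets(tokens, losses, is_factual)
loss_only = [CALL if l > 2.0 else t for t, l in zip(tokens, losses)]
print(targets)
print(loss_only)
```

In this made-up example the loss-only rule would delegate both "born" and "1867", while the combined rule delegates only the fact-bearing "1867". How LaCy actually derives and combines its signals is described in the paper itself.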

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques