LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
Szilvia Ujváry, Louis Béthune, Pierre Ablin, João Monteiro, Marco Cuturi, Michael Kirchhof

TL;DR
LaCy introduces a pretraining method for Small Language Models that combines loss and factuality signals to improve token learning and delegation decisions, enhancing factual accuracy and efficiency.
Contribution
The paper proposes LaCy, a novel pretraining approach that integrates loss with factuality signals, enabling SLMs to better decide which tokens to learn or delegate.
Findings
LaCy models learn when to predict a token themselves and when to delegate it.
SLMs trained with LaCy achieve higher FactScores than SLMs trained with Rho or with an LLM judge.
Augmenting loss with grammatical signals improves delegation accuracy.
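The difference between a loss-only delegation rule and one that combines loss with a second signal can be illustrated with a toy sketch. This is not the paper's implementation: the `Token` fields, the loss values, the threshold, and the `fact_critical` flag are all hypothetical, and relabeling the training target as `<CALL>` is an assumption about how delegation might be expressed.

```python
from dataclasses import dataclass

CALL = "<CALL>"  # delegation token named in the abstract


@dataclass
class Token:
    text: str
    loss: float          # per-token loss of the SLM (made-up values below)
    fact_critical: bool  # hypothetical flag: would a wrong guess be a factual error?


def loss_only_targets(tokens, threshold):
    """Relabel every token whose loss exceeds the threshold as <CALL>."""
    return [CALL if t.loss > threshold else t.text for t in tokens]


def combined_targets(tokens, threshold):
    """Relabel only tokens that are both high-loss and flagged as fact-critical."""
    return [
        CALL if (t.loss > threshold and t.fact_critical) else t.text
        for t in tokens
    ]


# Toy sentence: "in" has high loss, but a mismatch there would not be a factual error.
tokens = [
    Token("Marie", 0.4, False),
    Token("Curie", 0.3, False),
    Token("was", 0.2, False),
    Token("born", 0.5, False),
    Token("in", 2.6, False),
    Token("1867", 3.1, True),
]

loss_only = loss_only_targets(tokens, threshold=2.0)
combined = combined_targets(tokens, threshold=2.0)
print(loss_only)
print(combined)
```

In this toy case the loss-only rule delegates both "in" and "1867", while the combined rule delegates only the year. It shows why high loss alone may over-delegate tokens whose mismatch would be harmless.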
Abstract
Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. Especially the capacity of Small Language Models (SLMs) is limited, leading to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. Under this setting, we study the fundamental question of which tokens an SLM can and should learn during pretraining, versus which ones it should delegate via a <CALL> token. We find that this is not simply a question of loss: although the loss is predictive of whether a predicted token mismatches the ground-truth, it is insufficient for identifying which predictions would actually lead to factual or semantically invalid…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
