On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
Yuhao Li, Shengchao Liu

TL;DR
This paper introduces a framework for distinguishing capability elicitation from capability creation in language-model post-training: elicitation reweights behaviors the model can already practically produce, while creation changes what the model can practically reach.
Contribution
It proposes the notion of accessible support (the set of behaviors a model can practically produce under finite budgets) and a free-energy view of post-training to make the distinction between elicitation and creation operational.
Findings
Reweighting behaviors within the accessible support is capability elicitation.
Changing the accessible support itself corresponds to capability creation.
Both SFT and RL are seen as reweighting a reference distribution.
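The reweighting idea in the findings above can be illustrated with a toy example. The sketch below is not the paper's formulation, which is not shown on this page; it assumes the standard KL-regularized objective, whose optimum is the reference distribution tilted by exp(reward / beta), and whose optimal value is beta times the log-partition function (the negative free energy under the usual sign convention). The names `tilt`, `ref`, `reward`, and `beta`, and all the numbers, are hypothetical.

```python
import math

def tilt(ref, reward, beta):
    """Reweight a reference distribution by exp(reward / beta) and renormalize.

    Returns the tilted distribution and beta * log Z, the optimal value of
    E[reward] - beta * KL(pi || ref) under the assumed KL-regularized objective.
    """
    weights = {y: p * math.exp(reward[y] / beta) for y, p in ref.items()}
    z = sum(weights.values())
    tilted = {y: w / z for y, w in weights.items()}
    return tilted, beta * math.log(z)

# Hypothetical toy behaviors: "c" has the highest reward but zero reference mass.
ref = {"a": 0.7, "b": 0.3, "c": 0.0}
reward = {"a": 0.0, "b": 1.0, "c": 5.0}

tilted, soft_value = tilt(ref, reward, beta=0.5)

# Mass moves from "a" to "b", but "c" stays at exactly zero:
# pure reweighting cannot put mass outside the reference support.
print(tilted)
print(soft_value)
```

In this toy setting, reweighting raises the probability of a behavior the reference already produces, while a behavior with zero reference mass stays unreachable however large its reward.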
Abstract
Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery, but this distinction is too coarse. What matters is whether a training procedure increases the probability of behaviors the pretrained model could already produce, or whether it changes what the model can practically reach. We argue that post-training research should distinguish between capability elicitation and capability creation. We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation, whereas post-training that changes the support itself is capability creation. We develop this argument through a free-energy view of post-training. SFT and RL can both…
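One way to make "practically produce under finite budgets" concrete is sketched below. This is an illustrative assumption, not the paper's definition: a behavior counts as accessible if it would appear at least once in `budget` independent samples with probability at least `confidence`. The function name `accessible_support`, the threshold, and the toy distribution are all hypothetical.

```python
def accessible_support(dist, budget, confidence=0.5):
    """Return behaviors likely to appear at least once in `budget` samples.

    Assumed criterion (not from the paper): 1 - (1 - p)^budget >= confidence.
    """
    return {
        y for y, p in dist.items()
        if 1.0 - (1.0 - p) ** budget >= confidence
    }

# Hypothetical toy distribution over three behaviors.
dist = {"common": 0.9, "rare": 0.0999, "very_rare": 0.0001}

small_budget = accessible_support(dist, budget=16)
large_budget = accessible_support(dist, budget=100_000)

print(sorted(small_budget))
print(sorted(large_budget))
```

Under this assumed criterion, a behavior with nonzero probability can still fall outside the accessible support at a small sampling budget and inside it at a large one, which is why the budget has to be stated alongside any claim about what a model can reach.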
