Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Samy Jelassi; Mujin Kwun; Rosie Zhao; Yuanzhi Li; Nicolo Fusi; Yilun Du; Sham M. Kakade; Carles Domingo-Enrich

arXiv:2603.12248 · cs.LG · March 17, 2026

Open Access

TL;DR

This paper introduces energy-based fine-tuning (EBFT) for language models, focusing on sequence-level feature matching to improve downstream task performance without task-specific verifiers.

Contribution

It proposes a novel feature-matching objective for language-model fine-tuning and an efficient energy-based method for optimizing it, and connects the approach to KL-regularized feature matching and energy-based modeling.
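One standard way to see how feature matching connects to energy-based models is the classical maximum-entropy (I-projection) argument, sketched below. The notation is assumed for illustration and is not taken from the paper: p_ref is the model before fine-tuning, phi(x, y) is a sequence-level feature map of prompt x and completion y, p_data is the reference completion distribution, and lambda is a Lagrange multiplier. This is the textbook result, not necessarily the paper's exact formulation.

```latex
\min_{p}\; \mathrm{KL}\!\left(p(\cdot \mid x)\,\middle\|\,p_{\mathrm{ref}}(\cdot \mid x)\right)
\quad \text{s.t.} \quad
\mathbb{E}_{y \sim p(\cdot \mid x)}\big[\phi(x, y)\big]
  = \mathbb{E}_{y \sim p_{\mathrm{data}}(\cdot \mid x)}\big[\phi(x, y)\big]

% Stationarity of the Lagrangian gives an exponential tilt of the reference model:
p^{*}(y \mid x) = \frac{p_{\mathrm{ref}}(y \mid x)\,\exp\!\big(\lambda^{\top}\phi(x, y)\big)}{Z_{\lambda}(x)},
\qquad
Z_{\lambda}(x) = \sum_{y} p_{\mathrm{ref}}(y \mid x)\,\exp\!\big(\lambda^{\top}\phi(x, y)\big)

% That is, an energy-based model relative to p_ref with energy E_\lambda(x, y) = -\lambda^{\top}\phi(x, y).
```

Here lambda is chosen so that the tilted model's expected features equal the data's. If the hard constraint is replaced by a squared penalty with weight beta, the same tilted form appears at the optimum, with lambda = -beta (E_p[phi] - E_data[phi]).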

Findings

01

EBFT matches RLVR (reinforcement learning with verifiable rewards) in performance.

02

EBFT outperforms SFT (supervised fine-tuning) on downstream accuracy.

03

EBFT achieves lower validation cross-entropy than RLVR and SFT.

Abstract

Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic feedback without requiring a task-specific verifier or preference model. To optimize this objective efficiently, we propose energy-based fine-tuning (EBFT), which uses strided block-parallel sampling to generate multiple rollouts from nested prefixes concurrently, batches feature extraction over these rollouts, and uses the resulting embeddings to perform an on-policy policy-gradient update. We present a theoretical perspective connecting EBFT to KL-regularized feature-matching and energy-based modeling. Empirically,…
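The update described above (sample rollouts on-policy, embed them, and turn the embeddings into a policy-gradient signal) can be illustrated on a toy problem. The sketch below is a minimal, hypothetical instance, not the authors' EBFT implementation. It assumes the feature-matching loss is half the squared distance between the model's and the data's mean feature vectors. It also replaces the language model with a softmax over three fixed candidate completions whose feature vectors PHI stand in for embeddings, and it omits strided block-parallel sampling and any KL term. All names (PHI, policy_gradient_step, train) are made up for illustration.

```python
import math
import random

# Hypothetical toy setup (not from the paper): the "policy" picks one of K
# candidate completions, each summarized by a fixed feature vector PHI[k]
# that stands in for the output of a frozen embedding model.
PHI = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_features(indices):
    """Average feature vector over a batch of sampled completion indices."""
    dim = len(PHI[0])
    return [sum(PHI[i][d] for i in indices) / len(indices) for d in range(dim)]


def feature_matching_loss(probs, target):
    """Assumed loss: 0.5 * ||E_{y~p}[phi(y)] - target||^2, computed exactly over all K."""
    dim = len(target)
    mu = [sum(p * PHI[k][d] for k, p in enumerate(probs)) for d in range(dim)]
    return 0.5 * sum((mu[d] - target[d]) ** 2 for d in range(dim))


def policy_gradient_step(logits, target, n_rollouts, lr, rng):
    probs = softmax(logits)
    # 1) On-policy rollouts from the current policy.
    samples = rng.choices(range(len(logits)), weights=probs, k=n_rollouts)
    # 2) Batched "feature extraction": plug-in estimate of the model's mean feature.
    mu_hat = mean_features(samples)
    residual = [m - t for m, t in zip(mu_hat, target)]
    # 3) Per-rollout reward r(y) = -(mu_hat - target) . phi(y); subtracting the
    #    batch-mean reward as a baseline reduces variance without changing the mean.
    rewards = [-sum(r * f for r, f in zip(residual, PHI[y])) for y in samples]
    baseline = sum(rewards) / n_rollouts
    # 4) REINFORCE with grad log softmax(logits)[y] = onehot(y) - probs.
    grad = [0.0] * len(logits)
    for y, reward in zip(samples, rewards):
        adv = reward - baseline
        for k in range(len(logits)):
            grad[k] += adv * ((1.0 if k == y else 0.0) - probs[k]) / n_rollouts
    # Ascent on expected reward = descent on the feature-matching loss.
    return [z + lr * g for z, g in zip(logits, grad)]


def train(target, steps=300, n_rollouts=64, lr=1.0, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(PHI)
    for _ in range(steps):
        logits = policy_gradient_step(logits, target, n_rollouts, lr, rng)
    return logits


if __name__ == "__main__":
    target = [0.9, 0.1]  # hypothetical mean feature of reference completions
    start = feature_matching_loss(softmax([0.0] * len(PHI)), target)
    end = feature_matching_loss(softmax(train(target)), target)
    print(f"feature-matching loss: {start:.4f} -> {end:.4f}")
```

Each rollout's reward is the negative inner product of its features with the current mean-feature error, so completions that pull the batch mean toward the target are reinforced. This is the score-function (REINFORCE) gradient of the squared feature distance. Using the batch mean as a plug-in estimate makes the gradient slightly biased for small batches.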

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis