Representation Without Reward: A JEPA Audit for LLM Fine-Tuning
Biswa Sengupta

TL;DR
This paper tests whether auxiliary objectives trained on the hidden representations of language models improve task performance, and finds that most produce no significant gains under rigorous statistical testing.
Contribution
It provides an empirical evaluation of twenty-two training-time auxiliaries for LLM fine-tuning, showing that they generally do not improve task metrics even though they alter hidden-state geometry.
Findings
Most auxiliaries do not significantly improve task metrics once the statistical correction is applied.
Decoder-visible JEPA is the first auxiliary to achieve a positive gradient cosine between the auxiliary loss and cross-entropy.
Null results are consistent across different fine-tuning regimes and seeds.
Abstract
Joint-embedding predictive architectures (JEPAs) propose that a model should learn more useful abstractions when trained to predict latent representations rather than observed outputs. For autoregressive language-model fine-tuning the principle entails a stricter requirement: the induced hidden-state geometry must reach the language-model head and improve the decoded task metric. We test that requirement under a fixed Llama-3.2-1B-Instruct LoRA harness on natural-language-to-regex generation, comparing twenty-two training-time auxiliaries across trajectory-shape regularisation, distributional constraints, predictor/target asymmetry, Fisher-metric Jacobi residuals, and a decoder-visible JEPA objective constructed to lie in cross-entropy's positive cone. The empirical answer is a structured null: several auxiliaries clear single-cell paired without correction…
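As a rough illustration of what a gradient cosine between an auxiliary loss and cross-entropy measures, the toy sketch below compares three latent-prediction auxiliaries on a two-parameter model. It is not the paper's harness: the model (hidden state equal to the parameters, identity head), the squared-distance auxiliary, and the names `numerical_grad`, `gradient_cosine`, `cross_entropy`, and `latent_loss` are all assumptions made for illustration.

```python
import math


def numerical_grad(loss_fn, params, eps=1e-5):
    """Central-difference gradient of loss_fn at params (a list of floats)."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad


def gradient_cosine(g_a, g_b):
    """Cosine similarity of two flat gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_entropy(params, target=0):
    """Toy assumption: hidden state == params and the head is the identity,
    so the logits are the parameters themselves."""
    m = max(params)
    log_z = m + math.log(sum(math.exp(p - m) for p in params))
    return log_z - params[target]


def latent_loss(target_latent):
    """Hypothetical latent-prediction auxiliary: half the squared distance
    between the hidden state and a fixed target latent."""
    def loss(params):
        return 0.5 * sum((h - t) ** 2 for h, t in zip(params, target_latent))
    return loss


params = [0.0, 0.0]
g_ce = numerical_grad(cross_entropy, params)

# Target latent that favours the correct class: gradient aligned with CE.
cos_aligned = gradient_cosine(numerical_grad(latent_loss([2.0, -2.0]), params), g_ce)
# Target latent that shifts both logits equally: gradient orthogonal to CE.
cos_orthogonal = gradient_cosine(numerical_grad(latent_loss([1.0, 1.0]), params), g_ce)
# Target latent that favours the wrong class: gradient opposed to CE.
cos_opposed = gradient_cosine(numerical_grad(latent_loss([-2.0, 2.0]), params), g_ce)

print(cos_aligned, cos_orthogonal, cos_opposed)
```

In this toy setting the second auxiliary still moves the hidden state, yet to first order it leaves the cross-entropy loss unchanged. That is one simple way an auxiliary could reshape hidden-state geometry without helping the decoded metric.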
