TL;DR
This paper introduces the Next Token Perception Score (NTPS), a metric to evaluate how well autoregressive language model representations align with perception tasks, correlating strongly with downstream performance and aiding in fine-tuning assessments.
Contribution
The paper proposes NTPS, a novel analytical metric for measuring the alignment between autoregressive representations and perception tasks, validated across multiple models and datasets.
Findings
NTPS correlates strongly with linear probe accuracy.
LoRA fine-tuning increases NTPS, improving perception alignment.
NTPS predicts gains from LoRA fine-tuning.
Abstract
Autoregressive pretraining has become the de facto paradigm for learning general-purpose representations in large language models (LLMs). However, linear probe performance across downstream perception tasks shows substantial variability, suggesting that features optimized for next-token prediction do not consistently transfer well to downstream perception tasks. We demonstrate that representations learned via autoregression capture features that may lie outside the subspaces most informative for perception. To quantify the (mis)alignment between autoregressive pretraining and downstream perception, we introduce the Next Token Perception Score (NTPS), a score derived under a linear setting that measures the overlap between autoregressive and perception feature subspaces. This metric can be easily computed in closed form from pretrained representations and labeled data, and is proven to…
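The abstract describes NTPS as an overlap between two feature subspaces that can be computed in closed form. The sketch below shows a generic subspace-overlap score of that kind; it is not the paper's exact definition. The function name `subspace_overlap`, the matrices `U` (columns spanning a perception feature subspace) and `V` (columns spanning a next-token feature subspace), and the normalization by the squared Frobenius norm of `U` are all assumptions made for illustration.

```python
import numpy as np

def subspace_overlap(U, V):
    """Fraction of the squared Frobenius norm of U that lies in span(V).

    Illustrative only: U and V are hypothetical (d, k) matrices, and V is
    assumed to have full column rank.
    """
    # Orthonormal basis for span(V), shape (d, k)
    Q, _ = np.linalg.qr(V)
    # Orthogonal projection of the columns of U onto span(V)
    projected = Q @ (Q.T @ U)
    return float(np.linalg.norm(projected, "fro") ** 2
                 / np.linalg.norm(U, "fro") ** 2)

# Toy example in R^4 using standard basis vectors
I = np.eye(4)
V = I[:, [0, 1]]          # span{e1, e2}
U_same = I[:, [0, 1]]     # identical subspace
U_half = I[:, [0, 2]]     # shares only e1 with V
U_orth = I[:, [2, 3]]     # orthogonal to V
```

Under these assumptions the score is 1 when the two subspaces coincide, 0 when they are orthogonal, and strictly in between when they only partly overlap.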
Peer Reviews
Decision·Submitted to ICLR 2026
- The proposed metric and its derivation seem pretty clear. It is essentially a subspace alignment score based on the Frobenius norm of the part of the perception encoder U that lies inside the next-token subspace spanned by V.
- The metric seems to be well correlated with downstream performance across different models.
> Takeaway: Linear probing on pretrained LLM representations can outperform, match, or underperform full-training from scratch.
- Agreed that the linear probing technique is indeed noisy, but Table 1 and this claim seem to be slightly misleading. These linear probes are used as a way to approximate how good the model is at downstream tasks such as Emotion, so a better study seems to be how well the linear probes correlate with the full finetuning performance (when taking different checkpoin
1. The work proposes NTPS as a novel metric for measuring the misalignment between perception and next-token prediction objectives, addressing an important gap in the understanding of pretrained LLMs’ limited transferability to downstream tasks.
2. The paper includes comprehensive and extensive experimental results: (1) Table 1 demonstrates that linear probing can outperform, match, or underperform full training from scratch, establishing the motivation for the work; (2) Figure 2 shows consistent c
1. The paper claims that misalignment between perception and autoregressive spaces arises primarily from the next-token prediction loss during pretraining (lines 54-59, Section 3.1). However, other confounding factors could contribute to this phenomenon, including (1) pretraining data size and distribution mismatches with downstream tasks and (2) optimization dynamics and implicit biases.
2. Related to the first point, the paper does not adequately control for or discuss these alternative explanati
The paper is nicely organized and clearly motivated. The proposed NTPS metric provides an intuitive geometric perspective on alignment and the theoretical section is clear and builds good intuition. The experiments cover a wide range of models and datasets and NTPS shows convincing correlation with MSE loss and additional accuracy gains from LoRA.
All reported results are based on rank correlations, which makes it hard to interpret what the metric actually means in practice. If I have a model and a downstream task and compute an NTPS score, how should I interpret its magnitude? The paper doesn’t provide guidance on what constitutes a "high" or "low" score, which limits its usefulness. The claim on syntactic vs semantic groupings (as in Fig 1) is nice for intuition but does not seem rigorous based on comparing just the top 2 eigenvalues.
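The point about rank correlations can be illustrated with a small sketch: a rank correlation is unchanged by any monotone rescaling of the scores, so it says nothing about what a given score magnitude means. The helper `spearman` and the numbers below are made-up placeholders for illustration, not values from the paper.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation for inputs without ties."""
    # argsort of argsort gives the 0-based rank of each element
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    # Pearson correlation computed on the ranks
    return float(np.corrcoef(rx, ry)[0, 1])

# Made-up placeholder values, one entry per hypothetical model/dataset pair
scores = np.array([0.12, 0.35, 0.50, 0.81])
accuracy = np.array([0.61, 0.58, 0.74, 0.90])

rho = spearman(scores, accuracy)
# A monotone transform changes every magnitude but leaves the ranks intact
rho_rescaled = spearman(np.log(scores), accuracy)
```

Here `rho` and `rho_rescaled` are identical even though the transformed scores have completely different magnitudes, which is why a rank correlation alone cannot say what counts as a "high" or "low" score.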