Quiet Feature Learning in Algorithmic Tasks

Prudhviraj Naidu; Zixian Wang; Leon Bergen; Ramamohan Paturi

arXiv:2505.03997 · cs.LG · January 15, 2026

Open Access

TL;DR

This paper shows that Transformer models trained on algorithmic tasks learn hidden intermediate features that are crucial for performance but do not immediately lower the loss, which challenges training diagnostics that rely on the loss curve alone.

Contribution

The paper identifies quiet features, intermediate algorithmic computations that are learned before the loss improves, and uses ablation experiments to show that they are causally necessary for task success.

Findings

01. Quiet features are learned before loss decreases.
02. Quiet features are causally necessary for task performance.
03. Loss curves can be flat while significant representational progress occurs.

Abstract

We train Transformer-based language models on ten foundational algorithmic tasks and observe pronounced phase transitions in their loss curves that deviate from established power-law scaling trends. Over large ranges of compute, the validation loss barely improves, then abruptly decreases. Probing the models' internal representations reveals that quiet features are learned prior to any decrease in task loss. These quiet features represent intermediate algorithmic computations that do not by themselves improve the output loss. Ablation experiments demonstrate that individual quiet features are causally necessary for task performance. Our results demonstrate that substantial representational progress can remain hidden beneath an apparently flat loss curve, challenging the prevailing use of cross-entropy as a proxy for learning and motivating richer diagnostics for monitoring model…
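The probing and ablation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the activations are synthetic, the "quiet feature" is a single hypothetical direction (`feature_dir`) that encodes a binary intermediate value, and the helper names (`fit_linear_probe`, `probe_accuracy`, `ablate_direction`) are invented for illustration. It shows only that a linear probe can read the feature out and that projecting the direction away removes that information; it does not model the downstream effect on task loss that the paper measures.

```python
import numpy as np

def fit_linear_probe(acts, labels):
    """Fit a least-squares linear probe; the last weight is the bias."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    y = 2.0 * labels - 1.0  # map {0, 1} labels to {-1, +1} targets
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def probe_accuracy(w, acts, labels):
    """Fraction of examples where the probe's sign matches the label."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    preds = (X @ w > 0).astype(int)
    return float((preds == labels).mean())

def ablate_direction(acts, direction):
    """Remove the component of every activation along `direction`."""
    u = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ u, u)

# Synthetic stand-in for hidden activations (assumption, not real model data).
rng = np.random.default_rng(0)
n, d = 2000, 32
labels = rng.integers(0, 2, size=n)          # hypothetical intermediate bit
feature_dir = rng.normal(size=d)
feature_dir /= np.linalg.norm(feature_dir)   # unit-norm feature direction
acts = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * labels - 1, feature_dir)

train, test = slice(0, 1500), slice(1500, None)

# Probe the intact activations.
w = fit_linear_probe(acts[train], labels[train])
acc_before = probe_accuracy(w, acts[test], labels[test])

# Ablate the feature direction, then refit the probe on the ablated activations.
ablated = ablate_direction(acts, feature_dir)
w_abl = fit_linear_probe(ablated[train], labels[train])
acc_after = probe_accuracy(w_abl, ablated[test], labels[test])

print(f"probe accuracy before ablation: {acc_before:.3f}")
print(f"probe accuracy after ablation:  {acc_after:.3f}")
```

In this toy setting the probe is refit after ablation so that a drop in accuracy reflects missing information rather than a stale probe. In a real experiment the comparison of interest would be the model's task performance with and without the ablated feature.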

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Neural Networks and Applications