Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization
Wojciech Masarczyk, Mateusz Ostaszewski, Tin Sum Cheng, Tomasz Trzciński, Aurelien Lucchi, Razvan Pascanu

TL;DR
This paper investigates how the softmax function's temperature parameter influences neural network representations, revealing a rank deficit bias and demonstrating how temperature tuning can improve compression and out-of-distribution generalization.
Contribution
The paper introduces the concept of rank deficit bias, in which softmax-based networks find solutions of rank much lower than the number of classes, and shows how temperature tuning can be used to obtain more compressed and more robust representations.
Findings
Softmax temperature, which directly modifies the norm of the logits, affects the rank of learned representations.
Adjusting temperature can improve out-of-distribution performance.
Softmax dynamics can be exploited for representation compression.
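As a minimal sketch of what "temperature" means in the findings above, the snippet below assumes the common convention of dividing the logits by a temperature T before applying softmax. The summary does not show the paper's exact parameterization, so the function names `softmax_with_temperature` and `entropy` and the example logits are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over the last axis after dividing logits by `temperature`.

    Assumed convention: probabilities = softmax(logits / T).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtract the max for numerical stability; this does not change the result.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-(probs * np.log(probs)).sum())

# Hypothetical logits, chosen only for illustration.
logits = np.array([2.0, 1.0, 0.1])
for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: probs={probs.round(3)}, entropy={entropy(probs):.3f}")
```

Under this convention, lowering T is equivalent to scaling up the logits norm, which is the quantity the abstract ties to the rank deficit bias: low temperatures give sharp, near one-hot outputs, and high temperatures give flatter ones.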
Abstract
The softmax function is a fundamental building block of deep neural networks, commonly used to define output distributions in classification tasks or attention weights in transformer architectures. Despite its widespread use and proven effectiveness, its influence on learning dynamics and learned representations remains poorly understood, limiting our ability to optimize model behavior. In this paper, we study the pivotal role of the softmax function in shaping the model's representation. We introduce the concept of rank deficit bias - a phenomenon in which softmax-based deep networks find solutions of rank much lower than the number of classes. This bias depends on the softmax function's logits norm, which is implicitly influenced by hyperparameters or directly modified by softmax temperature. Furthermore, we demonstrate how to exploit the softmax dynamics to learn compressed…
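To make "rank much lower than the number of classes" concrete, here is a rough sketch of how the numerical rank of a representation matrix could be estimated from its singular values. The relative threshold `rel_tol`, the helper name `numerical_rank`, and the synthetic feature matrices are all assumptions for illustration; the abstract does not state which rank measure the paper uses.

```python
import numpy as np

def numerical_rank(features, rel_tol=1e-3):
    """Count singular values above rel_tol times the largest one.

    `features` is a (num_samples, dim) matrix of representations.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    singular_values = np.linalg.svd(np.asarray(features, dtype=float), compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0:
        return 0
    return int((singular_values > rel_tol * singular_values[0]).sum())

# Synthetic stand-ins for learned representations (not real model outputs).
rng = np.random.default_rng(0)
num_samples, dim, num_classes, low_rank = 200, 64, 10, 3

# Features spanning as many directions as there are classes.
full = rng.normal(size=(num_samples, num_classes)) @ rng.normal(size=(num_classes, dim))
# Features confined to a subspace smaller than the number of classes.
collapsed = rng.normal(size=(num_samples, low_rank)) @ rng.normal(size=(low_rank, dim))

print("rank of full features:", numerical_rank(full))
print("rank of collapsed features:", numerical_rank(collapsed))
```

In practice one would apply such a measure to penultimate-layer activations collected from a trained network and compare the result against the number of classes.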
Taxonomy
Topics: Complex Systems and Time Series Analysis
