TL;DR
This paper uncovers a low-dimensional, stable, and universal emotional manifold within large language models, enabling effective emotion manipulation across languages and domains.
Contribution
It reveals the existence of a stable, interpretable emotional subspace in LLMs and introduces a method to steer internal emotion perception while preserving semantics.
Findings
Emotional representations form a low-dimensional manifold.
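The emotional space is stable across model depth and generalizes to eight real-world emotion datasets spanning five languages.
A learned intervention module steers internal emotion perception while preserving semantics, with the strongest control for basic emotions.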
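As an illustration of what a low-dimensional emotional subspace means in practice, the following is a minimal sketch, not the authors' pipeline. It assumes hidden states are available as a NumPy array with one integer emotion label per row, and it uses plain SVD on per-emotion centroids. The function name `emotion_subspace`, the synthetic data, and all dimensions are hypothetical choices made for the example.

```python
import numpy as np

def emotion_subspace(hidden, labels, k=3):
    """Return the top-k principal directions of the per-emotion mean
    hidden states, plus the fraction of centroid variance each explains.
    (Hypothetical helper for illustration only.)"""
    classes = np.unique(labels)
    # One centroid per emotion class: shape (n_classes, d).
    centroids = np.stack([hidden[labels == c].mean(axis=0) for c in classes])
    centered = centroids - centroids.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions in hidden-state space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return vt[:k], explained[:k]

# Synthetic stand-in for real activations (assumption: 6 emotions whose
# means lie in a 3-dimensional subspace of a 64-dimensional space).
rng = np.random.default_rng(0)
d, n_emotions, n_per_class = 64, 6, 200
q, _ = np.linalg.qr(rng.standard_normal((d, 3)))
true_basis = q.T                                     # (3, d), orthonormal rows
means = 5.0 * rng.standard_normal((n_emotions, 3)) @ true_basis
labels = np.repeat(np.arange(n_emotions), n_per_class)
hidden = means[labels] + 0.1 * rng.standard_normal((labels.size, d))

basis, explained = emotion_subspace(hidden, labels, k=3)
print("variance explained by top 3 directions:", explained.sum())
```

On real activations the share of variance captured by the first few directions is an empirical question. The synthetic data here is built to be low-rank, so it only demonstrates the mechanics.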
Abstract
This work investigates how large language models (LLMs) internally represent emotion by analyzing the geometry of their hidden-state space. The paper identifies a low-dimensional emotional manifold and shows that emotional representations are directionally encoded, distributed across layers, and aligned with interpretable dimensions. These structures are stable across depth and generalize to eight real-world emotion datasets spanning five languages. Cross-domain alignment yields low error and strong linear probe performance, indicating a universal emotional subspace. Within this space, internal emotion perception can be steered while preserving semantics using a learned intervention module, with especially strong control for basic emotions across languages. These findings reveal a consistent and manipulable affective geometry in LLMs and offer insight into how they internalize and…
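The abstract describes steering emotion inside the subspace while preserving semantics. The paper uses a learned intervention module. The sketch below swaps in a much simpler fixed linear shift to show the underlying geometric idea only: change a hidden state's coordinates inside an orthonormal subspace and leave the orthogonal component untouched. The function `steer`, the parameter `alpha`, and the target coordinates are hypothetical names and values, not taken from the paper.

```python
import numpy as np

def steer(h, basis, target_coords, alpha=1.0):
    """Move the in-subspace coordinates of h toward target_coords.

    h:             (n, d) hidden states
    basis:         (k, d) orthonormal rows spanning the assumed emotion subspace
    target_coords: (k,) or (n, k) desired coordinates inside the subspace
    alpha:         interpolation strength; 0 = no change, 1 = full replacement
    """
    coords = h @ basis.T                                  # (n, k)
    new_coords = (1.0 - alpha) * coords + alpha * target_coords
    # Only the subspace component changes; the remainder of h is kept.
    return h + (new_coords - coords) @ basis

# Toy example with random data (assumption: d=32, k=3).
rng = np.random.default_rng(1)
d, k = 32, 3
q, _ = np.linalg.qr(rng.standard_normal((d, k)))
basis = q.T                                               # orthonormal rows
h = rng.standard_normal((5, d))
target = np.array([2.0, 0.0, -1.0])

steered = steer(h, basis, target, alpha=1.0)

# Component orthogonal to the subspace, before and after steering.
residual_before = h - (h @ basis.T) @ basis
residual_after = steered - (steered @ basis.T) @ basis
print(np.allclose(residual_before, residual_after))
```

Because the shift lies entirely inside the subspace, the orthogonal residual is preserved exactly in this toy setting. Whether that residual corresponds to "semantics" in a real model is the empirical claim the paper investigates, and this sketch does not establish it.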
Peer Reviews
Decision: ICLR 2026 Poster
- Identification of a low-dimensional, directionally encoded emotional manifold: The paper demonstrates that emotions in LLMs occupy a low-dimensional subspace that is interpretable and directionally organized across layers, with principal axes (PC1–PC3) showing high rank correlations in many models/layers.
- Cross-dataset and multilingual generalization of emotional structure: Using eight emotion datasets spanning five languages and diverse textual styles, the authors show that the extracted em
- Geometry vs. local distortion (inconsistent relational preservation): Although global alignment measures (cosine, regression) are often strong, stress and distortion analyses reveal notable local warping of relative geometry in many layers and datasets. Thus the emotional manifold is not uniformly faithful to human emotion-space relations, which complicates interpretation and downstream use.
- Uneven multilingual and dataset robustness: Performance and steerability degrade in lower-resource se
- The paper presents a comprehensive, cross-lingual study covering eight datasets in five languages, offering strong evidence for the universality of LLM emotion representations.
- The use of ML-AURA and SVD-based analyses provides a rigorous and interpretable framework for linking internal neuron activity to affective semantics.
- The learned steering module demonstrates practical control of emotion representations while preserving meaning, which is an innovative advance beyond descriptive anal
- While broad in scope, the work is methodologically complex, and the abundance of metrics (stress, distortion, spectral flatness, etc.) may obscure key takeaways.
- The evaluation relies heavily on synthetic emotion text for subspace construction, which may bias the identified directions.
- Although the paper claims semantic preservation under steering, this is mostly supported by cosine similarity metrics rather than human evaluations.
This paper presents the first investigation of emotional latent-space representations in LLMs, and I believe the techniques used are novel and interesting. The authors analyze many perspectives, styles, and languages, which adds to the robustness of their findings. The steering method offers a new way to think about changing emotions: it modifies the underlying emotional subspace rather than the downstream output.
It is unclear exactly what Table 2 measures, specifically with regard to cosine similarity and MSE. The paper reports high cosine similarity between emotions in real datasets and their synthetic counterparts, but which synthetic counterpart is meant? Is it the Reichman et al. synthetic dataset? Table 2 also does not say which method it used. What does it mean to measure the cosine similarity of an emotion between two datasets? What datum from each dataset is actu
