Negative Before Positive: Asymmetric Valence Processing in Large Language Models

Sohan Venkatesh

arXiv:2605.05653·cs.CL·May 8, 2026

TL;DR

This paper investigates how large language models process emotional valence, finding that negative and positive valence are processed at different network depths and that steering at those layers can shift neutral prompts toward positive valence.

Contribution

It demonstrates that emotional valence in LLMs is localized, causal, and steerable, providing a concrete target for interpretability and oversight.

Findings

01. Negative valence localizes to early layers

02. Positive valence peaks at mid-to-late layers

03. Steering can shift neutral prompts toward positive valence
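The steering finding can be illustrated with a minimal sketch. This page does not say how the paper computes its good-news direction, so the difference-of-means construction below is an assumption, as are the synthetic activations and the hypothetical names `mean_difference_direction`, `steer`, and `alpha`. It only shows the general idea of adding a direction to hidden states and is not the author's implementation.

```python
import numpy as np

def mean_difference_direction(pos_acts, neg_acts):
    """Unit vector pointing from the mean of neg_acts to the mean of pos_acts.

    Assumption: a difference-of-means direction; the paper may use another method.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def steer(hidden, direction, alpha):
    """Add alpha * direction to every row of a (n, d_model) hidden-state array."""
    return hidden + alpha * direction

# Synthetic stand-ins for layer activations (not real model data).
rng = np.random.default_rng(0)
d_model = 16
true_axis = np.zeros(d_model)
true_axis[0] = 1.0  # hypothetical "valence" axis planted in the toy data

good_news = rng.normal(size=(200, d_model)) + 2.0 * true_axis
bad_news = rng.normal(size=(200, d_model)) - 2.0 * true_axis
direction = mean_difference_direction(good_news, bad_news)

neutral = rng.normal(size=(50, d_model))
steered = steer(neutral, direction, alpha=4.0)

# Mean projection onto the direction before and after steering.
before = (neutral @ direction).mean()
after = (steered @ direction).mean()
print(f"projection before: {before:.3f}, after: {after:.3f}")
```

Because the direction has unit norm, the mean projection rises by exactly `alpha`. In a real model the same addition would be applied to the residual stream at a chosen layer, and the effect would be measured on the model's outputs rather than on the projection itself.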

Abstract

Mechanistic interpretability has revealed how concepts are encoded in large language models (LLMs), but emotional content remains poorly understood at the mechanistic level. We study whether LLMs process emotional valence through dedicated internal structure or through surface token matching. Using activation patching and steering on open-source LLMs, we find that negative and positive valence are processed at different network depths. Negative outcomes localize to early layers while positive outcomes peak at mid-to-late layers. Holding topic fixed while flipping valence produces sign-opposite responses, ruling out topic detection. Steering with the good-news direction at the identified layers shifts neutral prompts toward positive valence, showing these layers encode valence as a manipulable direction. Emotional valence in LLMs is localized, causal and steerable, making it a concrete…
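To make the activation-patching idea concrete, here is a toy sketch under stated assumptions. The "model" is a small random residual stack, not an LLM, and the names `run_layers`, `patching_effects`, and `readout` are hypothetical. It patches each layer's update from a clean run into a corrupted run and reports what fraction of the clean-versus-corrupted output gap is recovered, which is one common way to ask where an effect localizes. The paper's actual setup may differ.

```python
import numpy as np

def run_layers(x, weights, patch_layer=None, patch_update=None):
    """Run a toy residual stack: h <- h + tanh(h @ w) for each layer.

    If patch_layer is set, that layer's update is replaced by patch_update.
    Returns the final state and the list of per-layer updates.
    """
    updates = []
    h = x
    for i, w in enumerate(weights):
        update = np.tanh(h @ w)
        if i == patch_layer:
            update = patch_update
        updates.append(update)
        h = h + update
    return h, updates

def patching_effects(clean_x, corrupt_x, weights, readout):
    """Fraction of the clean-vs-corrupt readout gap recovered per patched layer."""
    clean_out, clean_updates = run_layers(clean_x, weights)
    corrupt_out, _ = run_layers(corrupt_x, weights)
    gap = clean_out @ readout - corrupt_out @ readout
    effects = []
    for layer in range(len(weights)):
        patched_out, _ = run_layers(corrupt_x, weights, layer, clean_updates[layer])
        effects.append((patched_out @ readout - corrupt_out @ readout) / gap)
    return np.array(effects)

# Random toy inputs and weights (assumptions, not real model activations).
rng = np.random.default_rng(1)
n_layers, d_model = 6, 8
weights = [0.3 * rng.normal(size=(d_model, d_model)) for _ in range(n_layers)]
readout = rng.normal(size=d_model)   # hypothetical scalar "valence" readout
clean_x = rng.normal(size=d_model)   # stands in for a valenced prompt
corrupt_x = rng.normal(size=d_model) # stands in for a baseline prompt

effects = patching_effects(clean_x, corrupt_x, weights, readout)
for layer, effect in enumerate(effects):
    print(f"layer {layer}: recovered fraction {effect:+.3f}")
```

In this sketch a layer whose patched update recovers a large share of the gap would be read as causally important for the readout. With random weights the per-layer profile carries no meaning; it only demonstrates the procedure.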
