Mechanistic Decoding of Cognitive Constructs in Large Language Models

Yitong Shou; Manhao Guan

arXiv:2604.14593 · cs.CL · April 23, 2026

TL;DR

This paper introduces an interpretability framework that decodes how large language models represent social-comparison jealousy, decomposing it into two psychological antecedents and enabling targeted causal interventions on them.

Contribution

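It develops a Cognitive Reverse-Engineering framework, built on Representation Engineering (RepE), that combines appraisal theory with subspace orthogonalization, regression-based weighting, and bidirectional causal steering to analyze and manipulate emotional representations in LLMs.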

Findings

01. Models encode jealousy as a linear combination of psychological factors.

02. Internal representations align with human psychological constructs.

03. Toxic emotional states can be detected and suppressed through the framework.
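
To make the third finding concrete, here is a minimal sketch of how a state might be detected and suppressed once a direction for it is known. It is an illustration on synthetic vectors, not the paper's procedure: the names `detect`, `suppress`, and `toxic_dir`, the threshold, and the choice of projection removal as the suppression step are all assumptions.

```python
import numpy as np

def detect(h, direction, threshold):
    """Flag a hidden state whose projection onto the direction exceeds a threshold."""
    u = direction / np.linalg.norm(direction)
    return float(h @ u) > threshold

def suppress(h, direction):
    """Remove the component of h along the direction (projection ablation)."""
    u = direction / np.linalg.norm(direction)
    return h - (h @ u) * u

# Synthetic stand-ins (assumptions): a random "toxic" direction and a hidden
# state built to carry a projection of 3.0 along it.
rng = np.random.default_rng(1)
toxic_dir = rng.normal(size=8)
unit_dir = toxic_dir / np.linalg.norm(toxic_dir)
base = suppress(rng.normal(size=8), toxic_dir)  # no component along toxic_dir
h_toxic = base + 3.0 * unit_dir

h_clean = suppress(h_toxic, toxic_dir)
flag_before = detect(h_toxic, toxic_dir, threshold=1.0)
flag_after = detect(h_clean, toxic_dir, threshold=1.0)
```

In this toy setup the state is flagged before suppression and not after, because suppression removes exactly the component that the detector measures.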

Abstract

While Large Language Models (LLMs) demonstrate increasingly sophisticated affective capabilities, the internal mechanisms by which they process complex emotions remain unclear. Existing interpretability approaches often treat models as black boxes or focus on coarse-grained basic emotions, leaving the cognitive structure of more complex affective states underexplored. To bridge this gap, we propose a Cognitive Reverse-Engineering framework based on Representation Engineering (RepE) to analyze social-comparison jealousy. By combining appraisal theory with subspace orthogonalization, regression-based weighting, and bidirectional causal steering, we isolate and quantify two psychological antecedents of jealousy, Superiority of Comparison Person and Domain Self-Definitional Relevance, and examine their causal effects on model judgments. Experiments on eight LLMs from the Llama, Qwen, and…
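
The three ingredients named in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the data is synthetic, and the function names, the Gram-Schmidt step used for orthogonalization, the least-squares fit used for weighting, and the additive form of steering are assumptions chosen to show the general idea.

```python
import numpy as np

def orthogonalize(v_a, v_b):
    """Gram-Schmidt: unit v_a, plus the unit part of v_b orthogonal to v_a."""
    u_a = v_a / np.linalg.norm(v_a)
    residual = v_b - (v_b @ u_a) * u_a
    u_b = residual / np.linalg.norm(residual)
    return u_a, u_b

def fit_weights(hidden, scores, u_a, u_b):
    """Least-squares weights (plus intercept) of scores on the two projections."""
    X = np.column_stack([hidden @ u_a, hidden @ u_b, np.ones(len(hidden))])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef

def steer(h, direction, alpha):
    """Shift a hidden state along a unit direction; the sign of alpha sets the direction."""
    return h + alpha * direction

# Synthetic stand-ins (assumptions): two correlated factor directions, e.g. one
# for superiority of the comparison person and one for domain relevance.
rng = np.random.default_rng(0)
dim = 16
v_sup = rng.normal(size=dim)
v_rel = rng.normal(size=dim) + 0.5 * v_sup
u_sup, u_rel = orthogonalize(v_sup, v_rel)

# Noise-free scores generated as a linear combination of the two projections.
hidden = rng.normal(size=(200, dim))
scores = 2.0 * (hidden @ u_sup) - 1.0 * (hidden @ u_rel) + 0.5
weights = fit_weights(hidden, scores, u_sup, u_rel)

# Bidirectional steering: push one state up and down along the first factor.
h = hidden[0]
h_up = steer(h, u_sup, +1.5)
h_down = steer(h, u_sup, -1.5)
```

Orthogonalizing first means each regression weight can be read as the contribution of one factor with the overlap between directions removed, and steering along one unit direction leaves the projection onto the other unchanged.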
