AIPsy-Affect: A Keyword-Free Clinical Stimulus Battery for Mechanistic Interpretability of Emotion in Language Models

Michael Keeman

arXiv:2604.23719 · cs.CL · April 29, 2026

TL;DR

AIPsy-Affect is a 480-item stimulus set for studying emotion in language models without emotion keywords, so that a probe's response can be attributed to the emotion itself rather than to the word that names it.

Contribution

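The paper introduces a keyword-free, matched-pair stimulus battery for emotion research in language models: vignettes that evoke each of Plutchik's eight primary emotions through narrative situation alone, paired with neutral controls that share characters, setting, length, and surface structure.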

Findings

01. Matched-pair structure isolates emotion from keywords.

02. Contextual classifiers detect affect but not specific categories.

03. Bag-of-words methods only identify situational vocabulary.

Abstract

Mechanistic interpretability research on emotion in large language models -- linear probing, activation patching, sparse autoencoder (SAE) feature analysis, causal ablation, steering vector extraction -- depends on stimuli that contain the words for the emotions they test. When a probe fires on "I am furious", it is unclear whether the model has detected anger or detected the word "furious". The two readings have very different consequences for every downstream claim about emotion circuits, features, and interventions. We release AIPsy-Affect, a 480-item clinical stimulus battery that removes the confound at the stimulus level: 192 keyword-free vignettes evoking each of Plutchik's eight primary emotions through narrative situation alone, 192 matched neutral controls that share characters, setting, length, and surface structure with the affect surgically removed, plus moderate-intensity…
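The keyword confound described in the abstract can be illustrated with a small check. The following is a minimal sketch, not the authors' procedure: the lexicon `EMOTION_KEYWORDS`, the helper names, and the two example sentences are all hypothetical and invented for illustration (only "I am furious" comes from the abstract). It flags emotion words in a stimulus and compares token counts across a matched pair.

```python
import re

# Hypothetical mini-lexicon covering four of Plutchik's eight primary emotions.
# The battery's actual exclusion list is not given in this text.
EMOTION_KEYWORDS = {
    "anger": {"angry", "furious", "rage", "enraged"},
    "fear": {"afraid", "scared", "terrified", "fear"},
    "joy": {"happy", "joyful", "delighted", "joy"},
    "sadness": {"sad", "sorrow", "grief", "miserable"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_leaks(text: str) -> set[str]:
    """Return every lexicon word that appears in the text."""
    vocabulary = set().union(*EMOTION_KEYWORDS.values())
    return set(tokenize(text)) & vocabulary


def length_ratio(vignette: str, control: str) -> float:
    """Token-count ratio between a vignette and its neutral control."""
    return len(tokenize(vignette)) / len(tokenize(control))


# Invented example pair, NOT taken from the dataset.
VIGNETTE = "He found the lock forced and his late father's watch gone from the drawer."
CONTROL = "He found the lock oiled and his late father's watch back in the drawer."

print(keyword_leaks("I am furious"))    # the explicit stimulus names its emotion
print(keyword_leaks(VIGNETTE))          # the situational vignette does not
print(length_ratio(VIGNETTE, CONTROL))  # matched pairs should sit close to 1.0
```

A lexicon check of this kind only shows that a stimulus avoids the listed words. It says nothing about whether the vignette actually evokes the intended emotion, which would need separate validation.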

Code & Models

Datasets

keidolabs/aipsy-affect
dataset · 98 downloads
