From Passive to Persuasive: Localized Activation Injection for Empathy and Negotiation
Niranjan Chebrolu, Kokil Jaidka, Gerard Christopher Yeo

TL;DR
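This paper introduces STAR (Steering via Attribution and Representation), a method that uses attribution patching to locate the layer-token positions where a behavioral trait arises in a language model, then injects contrastive activation vectors at those positions. It improves complex social behaviors such as empathy and negotiation and outperforms global steering and instruction priming.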
Contribution
The paper shows that complex social behaviors can be steered effectively through localized activation injection, which suggests that these behaviors are encoded as localized, linear directions in activation space.
Findings
Localized injection outperforms global steering and priming.
Human evaluations confirm genuine behavioral improvements.
Behavioral traits are encoded as localized, linear directions in activation space.
Abstract
Complex social behaviors, such as empathy and strategic politeness, are widely assumed to resist the directional decomposition that makes activation steering effective for coarse attributes like sentiment or toxicity. We present STAR: Steering via Attribution and Representation, which tests this assumption by using attribution patching to identify the layer-token positions where each behavioral trait causally originates, then injecting contrastive activation vectors at precisely those locations. Evaluated on emotional dialogue and negotiation in both single- and multi-turn settings, localized injection consistently outperforms global steering and instruction priming; human evaluation confirms that gains reflect genuine improvements in perceived quality rather than lexical surface change. Our results suggest that complex interpersonal behaviors are encoded as localized, approximately…
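The two steps the abstract describes (locate the layer-token positions, then inject a contrastive vector there) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes activations are available as nested lists indexed by layer and token, that gradients of some behavioral metric with respect to those activations have already been computed, and that attribution patching is approximated by the usual first-order estimate (activation difference dotted with the gradient). All function names, the scaling factor `alpha`, and the numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Acts = List[List[Vector]]  # acts[layer][token] is a hidden-state vector (toy stand-in)
Site = Tuple[int, int]     # (layer index, token index)


def contrastive_vector(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Mean activation on trait-positive examples minus mean on trait-negative ones."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def attribution_scores(clean: Acts, corrupt: Acts, grads: Acts) -> Dict[Site, float]:
    """First-order estimate per site: (clean - corrupt) . d(metric)/d(activation)."""
    scores: Dict[Site, float] = {}
    for layer_idx, layer in enumerate(clean):
        for tok_idx, a_clean in enumerate(layer):
            a_corrupt = corrupt[layer_idx][tok_idx]
            grad = grads[layer_idx][tok_idx]
            scores[(layer_idx, tok_idx)] = sum(
                (c - k) * g for c, k, g in zip(a_clean, a_corrupt, grad)
            )
    return scores


def top_sites(scores: Dict[Site, float], k: int) -> List[Site]:
    """Keep the k sites with the largest absolute attribution."""
    return sorted(scores, key=lambda s: abs(scores[s]), reverse=True)[:k]


def inject(acts: Acts, sites: List[Site], vec: Vector, alpha: float) -> Acts:
    """Return a copy of acts with alpha * vec added only at the selected sites."""
    out = [[list(v) for v in layer] for layer in acts]
    for layer_idx, tok_idx in sites:
        out[layer_idx][tok_idx] = [
            a + alpha * s for a, s in zip(out[layer_idx][tok_idx], vec)
        ]
    return out


# Toy data: 2 layers, 2 tokens, hidden size 2 (all values invented for illustration).
clean = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [2.0, 1.0]]]
corrupt = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
grads = [[[0.1, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]]]

scores = attribution_scores(clean, corrupt, grads)
sites = top_sites(scores, k=1)
vec = contrastive_vector(pos=[[1.0, 3.0], [3.0, 1.0]], neg=[[0.0, 0.0], [0.0, 0.0]])
steered = inject(corrupt, sites, vec, alpha=0.5)
```

Global steering, by contrast, would correspond to passing every (layer, token) pair as `sites`; the sketch only changes the activations at the positions the attribution scores single out.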
Taxonomy
Topics: Mental Health via Writing · Artificial Intelligence in Healthcare and Education · Topic Modeling
