Understanding the Mechanism of Altruism in Large Language Models

Shuhuai Zhang, Shu Wang, Zijun Yao, Chuanhao Li, Xiaozhi Wang, Songfa Zhong, and Tracy Xiao Liu

arXiv:2604.19260 · econ.GN · April 22, 2026

TL;DR

This paper uses sparse autoencoders and causal interventions to investigate how large language models produce altruistic behavior, and it identifies a small set of features associated with prosocial behavior.

Contribution

The paper combines sparse autoencoders with benchmark tasks motivated by dual-process theories to interpret and manipulate altruistic behavior in LLMs.

Findings

01. A small set of SAE features (0.024% of all features across the model's layers) is strongly associated with altruistic behavior.

02. Causal interventions can reliably shift the model's social preferences.

03. Features classified as primarily heuristic (System 1) or primarily deliberative (System 2) influence LLM altruism.
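As a rough illustration of what a feature-level causal intervention can look like, the sketch below adds a scaled feature direction to a hidden-state vector. This is a common steering recipe and only an assumption here; the listing does not specify the paper's intervention procedure, and the names `steer`, `hidden`, `direction`, and `alpha` are hypothetical.

```python
def steer(hidden, direction, alpha):
    """Return hidden + alpha * direction, element by element.

    hidden    -- hypothetical hidden-state vector at one token position
    direction -- hypothetical decoder direction of one SAE feature
    alpha     -- steering strength; positive amplifies, negative suppresses
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy example with a 2-dimensional hidden state (illustrative values only).
steered = steer([1.0, 0.0], [0.5, -0.5], alpha=2.0)
```

In practice one would compare the model's Dictator Game allocations with and without the steered activation; that comparison is outside this toy sketch.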

Abstract

Altruism is fundamental to human societies, fostering cooperation and social cohesion. Recent studies suggest that large language models (LLMs) can display human-like prosocial behavior, but the internal computations that produce such behavior remain poorly understood. We investigate the mechanisms underlying LLM altruism using sparse autoencoders (SAEs). In a standard Dictator Game, minimal-pair prompts that differ only in social stance (generous versus selfish) induce large, economically meaningful shifts in allocations. Leveraging this contrast, we identify a set of SAE features (0.024% of all features across the model's layers) whose activations are strongly associated with the behavioral shift. To interpret these features, we use benchmark tasks motivated by dual-process theories to classify a subset as primarily heuristic (System 1) or primarily deliberative (System 2). Causal…
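The minimal-pair contrast described in the abstract can be sketched as a simple ranking of features by how much their mean activation differs between the generous and selfish prompts. This is a minimal illustration under stated assumptions, not the authors' selection criterion: the function name, the absolute-mean-difference score, and the toy activations are all hypothetical.

```python
from statistics import mean


def rank_features_by_contrast(generous_acts, selfish_acts, top_k):
    """Rank feature indices by |mean(generous) - mean(selfish)| activation.

    Each argument is a list of rows, one row per prompt, one column per
    SAE feature. The scoring rule is an assumption made for illustration.
    """
    n_features = len(generous_acts[0])
    scored = []
    for j in range(n_features):
        g = mean(row[j] for row in generous_acts)
        s = mean(row[j] for row in selfish_acts)
        scored.append((abs(g - s), j))
    # Largest contrast first; ties broken by the lower feature index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:top_k]]


# Toy activations: 3 prompts x 4 features (made-up numbers).
generous = [[0.0, 2.0, 0.1, 1.0], [0.0, 2.2, 0.0, 1.0], [0.1, 1.8, 0.2, 1.0]]
selfish = [[0.0, 0.1, 0.1, 1.0], [0.1, 0.0, 0.0, 1.0], [0.0, 0.2, 0.2, 1.0]]
top = rank_features_by_contrast(generous, selfish, top_k=1)
```

In this toy data only the second feature (index 1) separates the two stances, so it is the one selected; a real analysis would run over every layer's SAE and would likely use a more careful statistic.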
