Autonomous Alignment with Human Value on Altruism through Considerate Self-imagination and Theory of Mind

Haibo Tong; Enmeng Lu; Yinqian Sun; Zhengqiang Han; Chao Liu; Feifei Zhao; Yi Zeng

arXiv:2501.00320·cs.AI·January 8, 2025

TL;DR

This paper proposes a novel AI framework that incorporates Theory of Mind and self-imagination to enable autonomous, altruistic, and ethically aligned decision-making, inspired by human moral behavior and tested in complex rescue scenarios.

Contribution

It introduces a new approach for AI to autonomously align with human altruistic values using considerate self-imagination and Theory of Mind capabilities.

Findings

01 Agents can proactively anticipate risks and make altruistic decisions.

02 The framework effectively balances self-goals, altruism, and environmental safety.

03 Experimental scenarios demonstrate improved moral decision-making in AI.

Abstract

With the widespread application of Artificial Intelligence (AI) in human society, enabling AI to autonomously align with human values has become a pressing issue for ensuring its sustainable development and benefit to humanity. One of the most important aspects of aligning with human values is that agents must autonomously make altruistic, safe, and ethical decisions, considering and caring for human well-being. Current AI pursues absolute superiority in certain tasks to an extreme, remaining indifferent to the surrounding environment and other agents, which has led to numerous safety risks. Altruistic behavior in human society originates from humans' capacity for empathizing with others, known as Theory of Mind (ToM), combined with predictive imaginative interactions before taking action to produce thoughtful and altruistic behaviors. Inspired by this, we are committed to endowing agents…

Code & Models

Repositories

braincog-x/brain-cog
PyTorch · Official

Taxonomy

Topics: Personality Traits and Psychology · Financial Literacy and Behavior

Methods: ALIGN