We Urgently Need Intrinsically Kind Machines

Joshua T. S. Hewson

arXiv:2411.04126·cs.AI·November 8, 2024

Open Access

TL;DR

The paper emphasizes the importance of embedding intrinsic kindness in AI systems to ensure alignment with human values, proposing a framework and algorithm for this purpose.

Contribution

The paper introduces a framework and an algorithm for embedding kindness as an intrinsic motivation in foundation models, with the aim of reducing the risk of misalignment.

Findings

01. Proposes a kindness-based intrinsic motivation framework

02. Develops an algorithm for embedding kindness in models

03. Discusses limitations and future research directions

Abstract

Artificial Intelligence systems are rapidly evolving, integrating extrinsic and intrinsic motivations. While these motivational frameworks offer benefits, they risk misalignment at the algorithmic level while appearing superficially aligned with human values. In this paper, we argue that an intrinsic motivation for kindness is crucial for ensuring that these models are intrinsically aligned with human values. Kindness, defined as a form of altruism motivated to maximize the reward of others, can counteract any intrinsic motivations that might lead the model to prioritize itself over human well-being. Our approach introduces a framework and algorithm for embedding kindness into foundation models by simulating conversations. Limitations and future research directions for scalable implementation are discussed.
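As a rough illustration of the definition in the abstract (kindness as a motivation to maximize the reward of others), the sketch below shows one way such a signal could be combined with an agent's own reward. It is not the paper's algorithm: the function names, the `kindness_weight` parameter, and the idea of averaging estimated rewards are all assumptions made for this example, and the estimator of another party's reward is left as a caller-supplied function.

```python
from typing import Callable, Sequence


def kindness_reward(
    own_reward: float,
    estimated_other_rewards: Sequence[float],
    kindness_weight: float = 1.0,
) -> float:
    """Add a weighted mean of others' estimated rewards to the agent's own reward.

    All names here are hypothetical; the source does not specify this formula.
    """
    if not estimated_other_rewards:
        # With nobody else to consider, only the agent's own reward remains.
        return own_reward
    mean_other = sum(estimated_other_rewards) / len(estimated_other_rewards)
    return own_reward + kindness_weight * mean_other


def pick_kindest_response(
    candidates: Sequence[str],
    estimate_other_reward: Callable[[str], float],
) -> str:
    """Return the candidate whose estimated reward for the other party is highest."""
    return max(candidates, key=estimate_other_reward)
```

In practice the hard part is the estimator itself: the abstract suggests it would come from simulating conversations, which this sketch deliberately leaves out.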

Taxonomy

Topics: Computability, Logic, AI Algorithms