Machine Theory of Mind and the Structure of Human Values

Paul de Font-Reaulx

arXiv:2505.20342 · cs.AI · May 28, 2025

TL;DR

This paper proposes using Bayesian Theory of Mind models to infer human values both from limited behavioral evidence and from other values, addressing the value generalization problem for safe and ethical AI.

Contribution

The paper argues that human values have a generative rational structure, which makes value-to-value inference possible and which simple utility-function representations have obscured.

Findings

01. Values can be inferred from other values using Bayesian models.

02. Generative value structures improve the prediction of complex human values.

03. This approach advances scalable machine theory of mind for ethical AI.

Abstract

Value learning is a crucial aspect of safe and ethical AI. This is primarily pursued by methods inferring human values from behaviour. However, humans care about much more than we are able to demonstrate through our actions. Consequently, an AI must predict the rest of our seemingly complex values from a limited sample. I call this the value generalization problem. In this paper, I argue that human values have a generative rational structure and that this allows us to solve the value generalization problem. In particular, we can use Bayesian Theory of Mind models to infer human values not only from behaviour, but also from other values. This has been obscured by the widespread use of simple utility functions to represent human values. I conclude that developing generative value-to-value inference is a crucial component of achieving a scalable machine theory of mind.
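
To make the inference pattern in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's model. It assumes a small discrete set of latent "core value" hypotheses, each of which generates utilities for several options, and a softmax (noisily rational) choice rule. All names, utilities, the prior, and the `beta` parameter are hypothetical and chosen only for illustration. The sketch updates a posterior over hypotheses from one observed choice, then uses the shared generative structure to predict a value the agent never acted on.

```python
import math

# Hypothetical latent "core values". Each hypothesis generates utilities for
# several options, including one the agent never gets to act on.
HYPOTHESES = {
    "cares_about_animals": {"veggie_meal": 1.0, "meat_meal": 0.0, "fund_shelter": 1.0},
    "indifferent": {"veggie_meal": 0.0, "meat_meal": 0.5, "fund_shelter": 0.0},
}
# Assumed uniform prior over the two hypotheses.
PRIOR = {"cares_about_animals": 0.5, "indifferent": 0.5}


def choice_likelihood(choice, options, utilities, beta=2.0):
    """Softmax probability of picking `choice` from `options` given utilities."""
    weights = {o: math.exp(beta * utilities[o]) for o in options}
    return weights[choice] / sum(weights.values())


def posterior(choice, options):
    """Bayes' rule over hypotheses after observing a single choice."""
    unnorm = {
        h: PRIOR[h] * choice_likelihood(choice, options, utilities)
        for h, utilities in HYPOTHESES.items()
    }
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}


def expected_value(option, post):
    """Posterior-expected utility of an option, observed or not."""
    return sum(post[h] * HYPOTHESES[h][option] for h in post)


# Observed behaviour: the agent picks the vegetarian meal over the meat meal.
post = posterior("veggie_meal", ["veggie_meal", "meat_meal"])

# Predicted value for an option that never appeared in the observed choice.
predicted = expected_value("fund_shelter", post)
print(post, predicted)
```

In this toy setup the prediction about `fund_shelter` comes entirely from the assumed link between options within each hypothesis, which is the role a generative value structure would play. A flat utility function with independent entries per option would provide no such link.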

Taxonomy

Topics: Cognitive Science and Education Research