Measuring Goal-Directedness

Matt MacDermott; James Fox; Francesco Belardinelli; Tom Everitt

arXiv:2412.04758 · cs.AI · December 9, 2024

TL;DR

This paper introduces maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, together with algorithms for computing it. The measure bears on concerns about harm from AI and on philosophical accounts of agency.

Contribution

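It proposes MEG, a formal measure of goal-directedness adapted from the maximum causal entropy framework used in inverse reinforcement learning, along with algorithms for computing it with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables.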

Findings

01 Algorithms are given for computing MEG in causal models and Markov decision processes.

02 MEG is proven to satisfy several desiderata.

03 Small-scale experiments demonstrate the algorithms.

Abstract

We define maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, and give algorithms for computing it. Measuring goal-directedness is important, as it is a critical element of many concerns about harm from AI. It is also of philosophical interest, as goal-directedness is a key aspect of agency. MEG is based on an adaptation of the maximum causal entropy framework used in inverse reinforcement learning. It can measure goal-directedness with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables. We prove that MEG satisfies several desiderata and demonstrate our algorithms with small-scale experiments.
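The idea of scoring goal-directedness through a maximum-entropy policy model can be illustrated with a minimal sketch for a single decision and a known utility function. Everything in it is an assumption made for illustration and is not taken from the abstract: the softmax policy family indexed by a rationality parameter `beta`, the uniform-policy baseline, the grid search over non-negative `beta`, and all function names. It is not the paper's algorithm. The score is the best expected log-likelihood gain (in nats) that a soft-optimal model achieves over a uniform model when predicting the observed policy.

```python
import math


def log_softmax(utilities, beta):
    """Log-probabilities of a soft-optimal policy, P(a) proportional to exp(beta * U(a))."""
    scaled = [beta * u for u in utilities]
    m = max(scaled)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def predictive_gain(policy, utilities, beta):
    """Expected log-likelihood gain of the softmax model over a uniform baseline.

    The expectation is taken under the observed policy (a list of action
    probabilities). Actions with zero probability contribute nothing.
    """
    log_model = log_softmax(utilities, beta)
    log_uniform = math.log(1.0 / len(utilities))
    return sum(
        p * (lq - log_uniform)
        for p, lq in zip(policy, log_model)
        if p > 0.0
    )


def meg_sketch(policy, utilities, betas):
    """Best predictive gain over a grid of rationality parameters (hypothetical helper)."""
    return max(predictive_gain(policy, utilities, b) for b in betas)


# Illustrative grid: beta = 0.0, 0.1, ..., 50.0. Including beta = 0 (the
# uniform model) guarantees the score is never negative.
BETAS = [0.1 * i for i in range(501)]

utilities = [1.0, 0.0]  # action 0 is the better action under this utility
optimal = meg_sketch([1.0, 0.0], utilities, BETAS)
noisy = meg_sketch([0.9, 0.1], utilities, BETAS)
uniform = meg_sketch([0.5, 0.5], utilities, BETAS)
print(optimal, noisy, uniform)
```

Under these assumptions, a policy that always picks the higher-utility action scores close to log 2, a uniformly random policy scores zero, and a noisy but mostly utility-seeking policy falls in between. A grid search is used only to keep the sketch short; any one-dimensional optimiser over `beta` would serve the same purpose.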

Videos

Measuring Goal-Directedness · SlidesLive

Taxonomy

Topics: Evaluation and Performance Assessment

Methods: Maximum Causal Entropy