Measuring Goal-Directedness
Matt MacDermott, James Fox, Francesco Belardinelli, Tom Everitt

TL;DR
This paper introduces maximum entropy goal-directedness (MEG), a formal measure of goal-directedness (a key aspect of agency) in causal models and MDPs, together with algorithms for computing it. The measure is motivated by AI safety and by philosophical questions about agency.
Contribution
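It proposes MEG, a novel formal measure of goal-directedness based on the maximum causal entropy framework, along with algorithms for computing it in causal models and Markov decision processes, with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables.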
Findings
The proposed algorithms compute MEG on small models.
MEG provably satisfies several theoretical desiderata.
Small-scale experiments demonstrate the algorithms.
Abstract
We define maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, and give algorithms for computing it. Measuring goal-directedness is important, as it is a critical element of many concerns about harm from AI. It is also of philosophical interest, as goal-directedness is a key aspect of agency. MEG is based on an adaptation of the maximum causal entropy framework used in inverse reinforcement learning. It can measure goal-directedness with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables. We prove that MEG satisfies several desiderata and demonstrate our algorithms with small-scale experiments.
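The abstract describes MEG only at a high level. As a rough illustration of the maximum-entropy idea it builds on, the sketch below scores a single decision with a known utility function: it models the agent as Boltzmann-rational, i.e. as following the soft-optimal policy pi_beta(a) ∝ exp(beta * U(a)), and asks how much better, in nats, the best such model predicts the observed policy than a uniform random policy does. This is a simplified toy under stated assumptions (one decision, a finite grid of non-negative beta values, and the hypothetical helper names `log_boltzmann` and `meg_single_decision`); it is not the paper's definition or algorithm, which cover sequential causal models and MDPs.

```python
import math


def log_boltzmann(utilities, beta):
    """Log-probabilities of the soft-optimal policy pi_beta(a) ∝ exp(beta * U(a))."""
    scaled = [beta * u for u in utilities]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def meg_single_decision(policy, utilities, betas):
    """Hypothetical toy score (not the paper's exact MEG): the largest expected
    log-likelihood gain, in nats, of a Boltzmann-rational model over a uniform
    policy, maximised over the assumed grid of rationality parameters beta."""
    log_uniform = -math.log(len(utilities))
    best = float("-inf")
    for beta in betas:
        log_model = log_boltzmann(utilities, beta)
        # Expected gain under the observed policy; zero-probability actions contribute nothing.
        gain = sum(p * (lq - log_uniform) for p, lq in zip(policy, log_model) if p > 0)
        best = max(best, gain)
    return best


# Usage: three actions, the first has the highest utility, and the observed policy always takes it.
betas = [0.5 * k for k in range(41)]  # assumed grid: beta = 0, 0.5, ..., 20
print(meg_single_decision([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], betas))  # approximately log(3) ≈ 1.0986
```

Because beta = 0 reproduces the uniform policy, this toy score is never negative: a uniformly random policy scores 0, and a policy that reliably picks the highest-utility action approaches log(number of actions).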
Taxonomy
Topics: Evaluation and Performance Assessment
Methods: Sparse Evolutionary Training
