Counterfactual harm

Jonathan G. Richens; Rory Beard; Daniel H. Thompson

arXiv:2204.12993 · cs.AI · November 3, 2022

TL;DR

This paper introduces a formal causal definition of harm and benefit, enabling agents to reason about and avoid harm through counterfactual decision-making, demonstrated in drug dose optimization.

Contribution

The paper provides the first formal causal framework for harm, shows that purely factual definitions of harm break down in certain scenarios, and proposes counterfactual objective functions for harm-averse decisions.

Findings

01 Counterfactual approach reduces harmful drug doses

02 Standard methods can lead to harmful policies under distributional shifts

03 Framework improves safety without losing efficacy

Abstract

To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in certain scenarios, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies following distributional shifts. We use our definition of harm to devise a framework for harm-averse decision making using counterfactual objective functions. We demonstrate this framework on the problem of identifying optimal drug doses using a dose-response model learned from randomized control trial data. We find that the standard method…
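To make the contrast between factual and counterfactual reasoning concrete, here is a minimal sketch on a toy problem. It is not the paper's implementation. It assumes that harm can be read as the expected shortfall in utility relative to what a default action would have produced for the same individual, and that a harm-averse objective subtracts a weighted harm term from expected utility. The response types, their probabilities, the treatment names `A` and `B`, and the `harm_aversion` parameter are all hypothetical.

```python
# Toy illustration only: every number and name below is invented, not from the paper.
# Each response type lists a patient's outcome (1 = recovers, 0 = does not)
# under (default action, treatment A, treatment B), with its population share.
RESPONSE_TYPES = {
    (1, 1, 1): 0.3,
    (1, 1, 0): 0.2,  # would recover by default, but not under B
    (0, 1, 1): 0.2,
    (0, 0, 1): 0.2,
    (0, 0, 0): 0.1,
}
ACTIONS = {"default": 0, "A": 1, "B": 2}


def expected_utility(action):
    """Factual quantity: average outcome under the action (utility = outcome)."""
    idx = ACTIONS[action]
    return sum(p * outcomes[idx] for outcomes, p in RESPONSE_TYPES.items())


def expected_harm(action, default="default"):
    """Counterfactual quantity: average shortfall versus the default action,
    computed per individual response type (assumed reading of 'harm')."""
    a, d = ACTIONS[action], ACTIONS[default]
    return sum(
        p * max(0, outcomes[d] - outcomes[a])
        for outcomes, p in RESPONSE_TYPES.items()
    )


def harm_averse_value(action, harm_aversion=1.0):
    """Expected utility penalized by weighted expected harm (assumed objective form)."""
    return expected_utility(action) - harm_aversion * expected_harm(action)


def best_treatment(harm_aversion):
    """Pick the treatment with the highest harm-averse value."""
    return max(["A", "B"], key=lambda a: harm_averse_value(a, harm_aversion))


if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, expected_utility(name), expected_harm(name), harm_averse_value(name))
    print("chosen with harm_aversion=1:", best_treatment(1.0))
```

In this toy population both treatments have the same recovery rate, so an objective based only on factual outcomes cannot separate them. Only the counterfactual term registers that B leaves some patients worse off than the default would have, which is why a positive `harm_aversion` favors A.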

Taxonomy

Topics: Advanced Causal Inference Techniques · Explainable Artificial Intelligence (XAI) · Decision-Making and Behavioral Economics