Invariant Policy Learning: A Causal Perspective

Sorawit Saengkyongam; Nikolaj Thams; Jonas Peters; Niklas Pfister

arXiv:2106.00808·cs.LG·September 23, 2022

TL;DR

This paper takes a causal perspective on policy learning in offline contextual bandits, studying policies that stay reliable when the underlying mechanisms shift across environments.

Contribution

It develops a framework integrating causality and invariance into offline contextual bandits to handle environment changes, introducing the concept of policy invariance.

Findings

01. Optimal invariant policies can generalize across environments under certain assumptions.

02. Establishes connections between causality, invariance, and contextual bandits.

03. Addresses the challenge of environmental shifts in high-stakes applications.

Abstract

Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the…

Code & Models

Repositories

sorawitj/invariant-policy-learning (PyTorch, official)

Taxonomy

Topics: Advanced Bandit Algorithms Research