Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

Jibang Wu; Siyu Chen; Mengdi Wang; Huazheng Wang; Haifeng Xu

arXiv:2407.01458·cs.LG·July 3, 2024

TL;DR

This paper introduces a theoretical framework for aligning stakeholder interests in online reinforcement learning through contract design, providing algorithms for optimal contracts and regret minimization in complex decision-making scenarios.

Contribution

It develops a formal model of contractual reinforcement learning, offering dynamic programming solutions for planning and no-regret algorithms for learning, with tailored algorithms achieving near-optimal regret bounds.

Findings

01

Optimal contracts can be computed efficiently for planning.

02

No-regret algorithms are designed for the learning problem.

03

Achieves $\tilde{O}(\sqrt{T})$ regret in natural problem classes.

Abstract

The agency problem emerges in today's large-scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning economic interests of different stakeholders in online learning problems through contract design. The problem, termed contractual reinforcement learning, naturally arises from the classic model of Markov decision processes, where a learning principal seeks to optimally influence the agent's action policy for their common interests through a set of payment rules contingent on the realization of the next state. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the…
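To make "payment rules contingent on the realization of the next state" concrete, here is a minimal one-step sketch. It is not the paper's algorithm: it assumes a single decision step, a myopic agent, and ties broken in the principal's favor. The names `agent_best_response`, `P`, `cost`, `reward`, and `contract` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def agent_best_response(
    P: List[List[float]],    # P[a][s]: probability of next state s under action a
    cost: List[float],       # cost[a]: agent's private cost of action a
    reward: List[float],     # reward[s]: principal's reward if next state is s
    contract: List[float],   # contract[s]: payment to the agent if next state is s
) -> Tuple[int, float, float]:
    """Return (action, agent utility, principal utility) under the contract.

    Assumption: the agent maximizes expected payment minus cost, and
    exact ties are broken in favor of the principal.
    """
    best_key = None
    best_action = -1
    for a in range(len(P)):
        agent_u = sum(p * w for p, w in zip(P[a], contract)) - cost[a]
        principal_u = sum(p * (r - w) for p, r, w in zip(P[a], reward, contract))
        key = (agent_u, principal_u)  # compare agent utility first, then principal's
        if best_key is None or key > best_key:
            best_key, best_action = key, a
    return best_action, best_key[0], best_key[1]


# Toy instance (all numbers are made up for illustration).
# Action 0 ("shirk") is free and always leads to state 0.
# Action 1 ("work") costs 1.0 and reaches the rewarding state 1 half the time.
P = [[1.0, 0.0], [0.5, 0.5]]
cost = [0.0, 1.0]
reward = [0.0, 10.0]

no_pay = agent_best_response(P, cost, reward, [0.0, 0.0])
pay_on_success = agent_best_response(P, cost, reward, [0.0, 2.0])
```

With no payment the agent prefers the free action, so the principal earns nothing. Paying 2.0 only when state 1 is reached makes the agent indifferent, and under the tie-breaking assumption the agent switches to the costly action. The full problem in the abstract replaces this single step with a Markov decision process and a far-sighted agent.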

Taxonomy

Topics: Law, Economics, and Judicial Systems

Methods: Sparse Evolutionary Training