Contractual Reinforcement Learning: Pulling Arms with Invisible Hands
Jibang Wu, Siyu Chen, Mengdi Wang, Huazheng Wang, Haifeng Xu

TL;DR
This paper introduces a theoretical framework for aligning stakeholder interests in online reinforcement learning through contract design, providing algorithms for optimal contracts and regret minimization in complex decision-making scenarios.
Contribution
It develops a formal model of contractual reinforcement learning, offering dynamic programming solutions for planning and no-regret algorithms for learning, with tailored algorithms achieving near-optimal regret bounds.
Findings
Optimal contracts can be computed efficiently for planning.
No-regret algorithms are designed for the learning problem.
Achieves $\tilde{O}(\sqrt{T})$ regret in natural problem classes.
Abstract
The agency problem emerges in today's large scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design. The problem, termed \emph{contractual reinforcement learning}, naturally arises from the classic model of Markov decision processes, where a learning principal seeks to optimally influence the agent's action policy for their common interests through a set of payment rules contingent on the realization of next state. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the…
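To make the idea of "payment rules contingent on the realization of next state" concrete, here is a minimal sketch of a single decision step. It is not the paper's dynamic programming algorithm: it brute-forces a coarse grid of payments for one state. The agent's action costs, the non-negative payment grid, the tie-breaking rule in the principal's favor, and all function names and numbers are assumptions made for illustration.

```python
from itertools import product


def expected(probs, values):
    """Expected value of `values` under the distribution `probs`."""
    return sum(p * v for p, v in zip(probs, values))


def agent_best_response(P, cost, reward, pay):
    """Action the agent picks under contract `pay`.

    Assumption: ties are broken in the principal's favor.
    """
    actions = range(len(P))
    agent_u = [expected(P[a], pay) - cost[a] for a in actions]
    top = max(agent_u)
    tied = [a for a in actions if top - agent_u[a] < 1e-9]
    net = [rw - pw for rw, pw in zip(reward, pay)]
    return max(tied, key=lambda a: expected(P[a], net))


def best_grid_contract(P, cost, reward, grid):
    """Brute-force search over payment vectors drawn from `grid`."""
    best = None
    for pay in product(grid, repeat=len(reward)):
        a = agent_best_response(P, cost, reward, pay)
        net = [rw - pw for rw, pw in zip(reward, pay)]
        u = expected(P[a], net)
        if best is None or u > best[2] + 1e-12:
            best = (pay, a, u)
    return best


# Hypothetical toy instance: actions 0 = "shirk", 1 = "work";
# next states 0 = "bad", 1 = "good".
P = [[0.8, 0.2], [0.2, 0.8]]   # P[a][s]: probability of next state s under action a
cost = [0.0, 0.3]              # agent's private cost of each action
reward = [0.0, 1.0]            # principal's reward in each next state
grid = [i / 10 for i in range(11)]

pay, action, utility = best_grid_contract(P, cost, reward, grid)
print(pay, action, round(utility, 3))
```

In this toy instance the agent works only if the payment gap between the good and bad states is at least 0.5, so the search settles on paying 0.5 in the good state and nothing in the bad state, leaving the principal an expected utility of 0.4 instead of the 0.2 obtained with no contract. Grid search scales exponentially in the number of next states, so it serves only to illustrate the incentive structure.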
Taxonomy
Topics: Law, Economics, and Judicial Systems
Methods: Sparse Evolutionary Training
