Contextual Bilevel Reinforcement Learning for Incentive Alignment
Vinzenz Thoma, Barna Pasztor, Andreas Krause, Giorgia Ramponi, Yifan Hu

TL;DR
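This paper introduces Contextual Bilevel Reinforcement Learning (CB-RL), a stochastic bilevel framework for strategic decision-making in which the lower level is a contextual Markov Decision Process shaped by both the leader's choice and an exogenous random context. It also proposes a stochastic Hyper Policy Gradient Descent (HPGD) algorithm with a convergence guarantee, applicable to settings such as reward shaping and tax design.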
Contribution
The paper develops CB-RL, a new bilevel RL model for strategic decision-making under uncertainty, and proposes a stochastic Hyper Policy Gradient Descent (HPGD) algorithm with proven convergence.
Findings
The HPGD algorithm converges under the proposed framework.
Empirical results demonstrate effectiveness in reward shaping and tax design.
The framework generalizes traditional bilevel optimization to stochastic, contextual settings.
Abstract
The optimal policy in various real-world strategic decision-making problems depends both on the environmental configuration and exogenous events. For these settings, we introduce Contextual Bilevel Reinforcement Learning (CB-RL), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). CB-RL can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of many MDPs that potentially multiple followers best respond to. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as RLHF, tax design, reward shaping, contract theory and mechanism design. We propose a stochastic Hyper Policy Gradient Descent (HPGD) algorithm to solve CB-RL, and demonstrate its convergence. Notably, HPGD uses stochastic…
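The leader-follower structure described in the abstract can be sketched as a stochastic bilevel program. The notation below is an assumption made for illustration and does not come from the text above: x is the leader's decision, ξ the random context, π a follower policy, f the leader's loss, and J_{x,ξ} the follower's expected return in the MDP configured by (x, ξ).

```latex
\begin{aligned}
\min_{x \in X} \quad & F(x) \;=\; \mathbb{E}_{\xi}\!\left[\, f\big(x,\ \pi^{*}_{x,\xi},\ \xi\big) \right] \\
\text{s.t.} \quad & \pi^{*}_{x,\xi} \;\in\; \arg\max_{\pi}\ J_{x,\xi}(\pi) \qquad \text{for every context } \xi .
\end{aligned}
```

Read this way, the upper level averages the leader's loss over contexts, while each context induces its own lower-level MDP to which a follower best responds. A hypergradient method such as HPGD would then update x using stochastic estimates of the gradient of F.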
Taxonomy
Topics: Risk and Portfolio Optimization · Stochastic processes and financial applications
