Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents

Junyan Liu; Lillian J. Ratliff

arXiv:2412.16318 · cs.LG · June 3, 2025

TL;DR

This paper investigates principal-agent bandit games with self-interested agents who learn and explore, proposing algorithms with regret bounds that adapt to the agent's learning and exploration behaviors in online settings.

Contribution

It introduces a novel elimination framework and algorithms for principal-agent bandit problems with learning agents, achieving improved regret bounds and robustness to exploration.

Findings

01 Achieved () regret bounds for non-exploratory agents.

02 Extended the framework to handle exploratory agents with () regret.

03 Improved on the regret bounds of prior work for agent models similar to those studied there.

Abstract

We study the repeated principal-agent bandit game, where the principal indirectly interacts with the unknown environment by proposing incentives for the agent to play arms. Most existing work assumes the agent has full knowledge of the reward means and always behaves greedily, but in many online marketplaces, the agent needs to learn the unknown environment and sometimes explore. Motivated by such settings, we model a self-interested learning agent with exploration behaviors who iteratively updates reward estimates and either selects an arm that maximizes the estimated reward plus incentive or explores arbitrarily with a certain probability. As a warm-up, we first consider a self-interested learning agent without exploration. We propose algorithms for both i.i.d. and linear reward settings with bandit feedback in a finite horizon $T$, achieving regret bounds of $O (T)$ …
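The agent model described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and several details are assumptions made only for illustration: the agent's reward estimates are taken to be empirical means, "arbitrary" exploration is modeled as picking an arm uniformly at random, and ties are broken by the lowest arm index. The class `ExploratoryLearningAgent`, its parameter `explore_prob`, and its methods `choose` and `update` are hypothetical names.

```python
import random
from typing import List, Optional


class ExploratoryLearningAgent:
    """Hypothetical sketch of a self-interested learning agent with exploration.

    Assumptions (not from the paper): estimates are empirical means, and
    "arbitrary" exploration is modeled as a uniformly random arm.
    """

    def __init__(self, n_arms: int, explore_prob: float, seed: Optional[int] = None):
        self.n_arms = n_arms
        self.explore_prob = explore_prob  # probability of exploring in a round
        self.counts = [0] * n_arms        # number of pulls per arm
        self.means = [0.0] * n_arms       # agent's reward estimates
        self.rng = random.Random(seed)

    def choose(self, incentives: List[float]) -> int:
        # With probability explore_prob, explore (assumed: uniform over arms).
        if self.rng.random() < self.explore_prob:
            return self.rng.randrange(self.n_arms)
        # Otherwise act greedily on estimated reward plus the principal's incentive;
        # max() returns the lowest index among tied arms.
        scores = [m + b for m, b in zip(self.means, incentives)]
        return max(range(self.n_arms), key=lambda i: scores[i])

    def update(self, arm: int, reward: float) -> None:
        # Incremental empirical-mean update of the pulled arm's estimate.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

Setting `explore_prob=0.0` recovers the non-exploratory warm-up agent, which always picks the arm maximizing estimated reward plus incentive. The principal's difficulty is visible here: the same incentive vector can induce different arms over time as the agent's estimates change.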

Videos

Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Data Stream Mining Techniques