Responding to Promises: No-regret learning against followers with memory
Vijeth Hebbar, Cédric Langbort

TL;DR
This paper develops online learning algorithms for a leader in repeated Stackelberg games to effectively respond to followers with memory and bounded rationality, achieving sublinear regret bounds.
Contribution
It introduces novel algorithms that handle followers with memory and bounded rationality, extending prior models that assumed memoryless followers.
Findings
Algorithms achieve $O(\sqrt{T})$ regret for memoryless followers.
Algorithms achieve $O(\sqrt{BT})$ regret for followers with memory of length $B$.
The approach leverages the smoothness of quantal response models to address memory effects.
Abstract
We consider a repeated Stackelberg game setup where the leader faces a sequence of followers of unknown types and must learn what commitments to make. While previous works have considered followers that best respond to the commitment announced by the leader in every round, we relax this setup in two ways. Motivated by natural scenarios where the leader's reputation factors into how the followers choose their response, we consider followers with memory. Specifically, we model followers that base their response on not just the leader's current commitment but on an aggregate of their past commitments. In developing learning strategies that the leader can employ against such followers, we make the second relaxation and assume boundedly rational followers. In particular, we focus on followers employing quantal responses. Interestingly, we observe that the smoothness property offered by the…
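To make the follower model described above concrete, here is a minimal Python sketch. It is not the paper's formulation. It assumes a logit form of quantal response with a rationality parameter `lam`, and it assumes the "aggregate" of past commitments is a uniform average over the last `B` rounds. The function names and the payoff-matrix layout are hypothetical and chosen only for illustration.

```python
import math


def quantal_response(utilities, lam):
    """Logit quantal response (assumed form): P(a) is proportional to exp(lam * u(a))."""
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(lam * (u - m)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def aggregate_commitment(history, B):
    """Uniform average of the last B commitments (assumed aggregation rule)."""
    if B < 1:
        raise ValueError("memory length B must be at least 1")
    window = history[-B:]
    dim = len(window[0])
    return [sum(x[i] for x in window) / len(window) for i in range(dim)]


def follower_response(history, follower_payoff, lam, B):
    """Response distribution of a follower with memory B.

    history: list of leader mixed strategies, most recent last.
    follower_payoff[i][j]: follower utility when the leader plays i
    and the follower plays j (hypothetical layout).
    """
    x = aggregate_commitment(history, B)
    n_actions = len(follower_payoff[0])
    utilities = [
        sum(x[i] * follower_payoff[i][j] for i in range(len(x)))
        for j in range(n_actions)
    ]
    return quantal_response(utilities, lam)


# Example: the leader switched from action 0 to action 1 in the last round.
history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
payoff = [[1.0, 0.0], [0.0, 1.0]]
memoryless = follower_response(history, payoff, lam=2.0, B=1)
with_memory = follower_response(history, payoff, lam=2.0, B=3)
```

In this sketch, the memoryless follower (`B=1`) reacts only to the latest commitment, while the follower with `B=3` still weights the earlier rounds and so favors its first action. Because the logit map is smooth, a small change in the aggregated commitment shifts the response probabilities only slightly, which is the kind of smoothness property the abstract refers to.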
