Learning to Use Learners' Advice
Adish Singla, Hamed Hassani, Andreas Krause

TL;DR
This paper introduces an online learning framework in which the experts are themselves learners and only the expert selected by the forecaster receives feedback. It shows that no-regret guarantees are impossible without coordination between the forecaster and the experts, and proposes an algorithm that achieves sublinear regret by guiding the feedback the experts observe.
Contribution
The paper models experts as learning entities with limited feedback and develops a novel algorithm that guides feedback to achieve no-regret guarantees.
Findings
Proves the impossibility of no-regret algorithms without coordination.
Designs a feedback-guided algorithm achieving regret of O(T^{1/(2 - \regretRate)}), where \regretRate is the parameter governing the experts' no-regret rate.
Demonstrates the effectiveness of guiding expert feedback in online learning.
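As a short worked check of why the bound in the findings above counts as sublinear, the calculation below writes the rate parameter in the exponent as \beta. Both the symbol \beta and the range 0 \le \beta \le 1 are assumptions made here for illustration; the summary does not state them.

```latex
% Assumption: \beta denotes the experts' rate parameter, with 0 \le \beta \le 1.
\[
  R_T = O\!\left(T^{\frac{1}{2-\beta}}\right)
  \quad\Longrightarrow\quad
  \frac{R_T}{T} = O\!\left(T^{\frac{1}{2-\beta}-1}\right)
               = O\!\left(T^{-\frac{1-\beta}{2-\beta}}\right).
\]
% The exponent -(1-\beta)/(2-\beta) is negative whenever \beta < 1,
% so the average regret vanishes as T grows.
\[
  \beta = 0:\; R_T = O\!\left(T^{1/2}\right), \qquad
  \beta = \tfrac{1}{2}:\; R_T = O\!\left(T^{2/3}\right), \qquad
  \beta = 1:\; R_T = O(T).
\]
```

Under these assumptions, the bound degrades smoothly from the square-root rate toward linear regret as \beta approaches 1.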
Abstract
In this paper, we study a variant of the framework of online learning using expert advice with limited/bandit feedback. We consider each expert as a learning entity, seeking to reflect certain real-world applications more accurately. In our setting, the feedback at any time step is limited in the sense that it is only available to the expert that has been selected by the central algorithm (forecaster), i.e., only the selected expert receives feedback from the environment and gets to learn at that time step. We consider a generic black-box approach whereby the forecaster does not control or know the learning dynamics of the experts apart from knowing the following no-regret learning property: the average regret of any expert vanishes at least at a given rate in the number of its learning steps, where the rate is governed by a parameter. In the spirit of competing…
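The interaction protocol described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the feedback restriction only, not the paper's algorithm: the class `GreedyExpert`, the function `run_protocol`, the uniform-random selection rule, and the `loss_fn` signature are all hypothetical names and choices introduced here, and the placeholder expert carries no no-regret guarantee.

```python
import random


class GreedyExpert:
    """Hypothetical expert: a learner choosing among its own actions.

    Placeholder learning rule (greedy on observed cumulative loss);
    it is NOT claimed to satisfy any no-regret property.
    """

    def __init__(self, n_actions):
        self.cum_loss = [0.0] * n_actions
        self.learning_steps = 0  # number of times this expert got feedback

    def act(self):
        # Play the action with the smallest cumulative observed loss.
        return min(range(len(self.cum_loss)), key=self.cum_loss.__getitem__)

    def learn(self, action, loss):
        # Bandit feedback: only the loss of the played action is observed.
        self.cum_loss[action] += loss
        self.learning_steps += 1


def run_protocol(experts, loss_fn, horizon, rng):
    """Run the limited-feedback protocol and return the cumulative loss.

    Assumption: the forecaster picks an expert uniformly at random.
    This is a stand-in selection rule, not the one from the paper.
    """
    total_loss = 0.0
    for t in range(horizon):
        i = rng.randrange(len(experts))   # forecaster selects one expert
        action = experts[i].act()         # selected expert recommends an action
        loss = loss_fn(i, action, t)      # environment reveals that loss only
        total_loss += loss
        experts[i].learn(action, loss)    # only the selected expert learns
    return total_loss


if __name__ == "__main__":
    rng = random.Random(0)
    experts = [GreedyExpert(n_actions=3) for _ in range(4)]
    total = run_protocol(experts, lambda i, a, t: rng.random(), 1000, rng)
    print(total, [e.learning_steps for e in experts])
```

The point of the sketch is the last line of the loop: an expert's learning-step counter advances only in rounds where the forecaster selects it, so the experts' learning progress depends on the forecaster's choices.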
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
