Achieving Better Regret against Strategic Adversaries
Le Cong Dinh, Tri-Dung Nguyen, Alain Zemkoho, Long Tran-Thanh

TL;DR
This paper introduces two new online learning algorithms, AFTRL and Prod-BR, that exploit knowledge of the adversary's behaviour to achieve better regret bounds and convergence properties in game-theoretic settings, while remaining no-regret even when that knowledge is inaccurate.
Contribution
The paper proposes novel algorithms that exploit extra knowledge about adversaries, achieving constant regret and last round convergence, a significant improvement over existing methods.
Findings
AFTRL achieves O(1) external regret or O(1) forward regret against a no-external-regret adversary.
Prod-BR attains O(√T) dynamic regret.
Both algorithms improve on state-of-the-art methods in regret bounds and convergence guarantees.
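To make the regret notions in the findings concrete, the sketch below computes external regret (comparison with the best fixed action in hindsight) and dynamic regret (comparison with the best action in each round) for a learner in a small zero-sum matrix game. It is not AFTRL or Prod-BR: both players run the standard Hedge (multiplicative weights) update, and the payoff matrix, the function names `hedge_weights` and `play`, and the learning rate are assumptions chosen only for illustration.

```python
import math


def hedge_weights(cum_loss, eta):
    """Standard Hedge distribution from cumulative losses (not the paper's AFTRL)."""
    shift = min(cum_loss)  # subtract the minimum for numerical stability
    w = [math.exp(-eta * (c - shift)) for c in cum_loss]
    total = sum(w)
    return [x / total for x in w]


def play(A, T, eta):
    """Row player minimises x^T A y, column player maximises it; both run Hedge.

    Returns the row player's (external regret, dynamic regret) after T rounds.
    """
    n, m = len(A), len(A[0])
    row_cum = [0.0] * n   # cumulative loss of each row action
    col_cum = [0.0] * m   # cumulative loss of each column action
    incurred = 0.0        # row player's total expected loss
    best_per_round = 0.0  # sum over rounds of the best single-round loss
    for _ in range(T):
        x = hedge_weights(row_cum, eta)
        y = hedge_weights(col_cum, eta)
        # Loss of each row action against the column player's mixed strategy.
        row_loss = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        # The column player's loss is the negated payoff.
        col_loss = [-sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        incurred += sum(x[i] * row_loss[i] for i in range(n))
        best_per_round += min(row_loss)
        row_cum = [c + l for c, l in zip(row_cum, row_loss)]
        col_cum = [c + l for c, l in zip(col_cum, col_loss)]
    external = incurred - min(row_cum)    # versus best fixed action in hindsight
    dynamic = incurred - best_per_round   # versus best action in every round
    return external, dynamic


if __name__ == "__main__":
    A = [[1.0, -1.0], [-1.0, 0.5]]  # hypothetical payoff matrix, entries in [-1, 1]
    T = 500
    eta = math.sqrt(math.log(2) / T)
    ext, dyn = play(A, T, eta)
    print(f"external regret: {ext:.3f}, dynamic regret: {dyn:.3f}")
```

Dynamic regret is never smaller than external regret, because the per-round comparator is at least as strong as any fixed action. This is why an O(√T) dynamic-regret guarantee and an O(1) external-regret guarantee are statements about different benchmarks rather than directly comparable numbers.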
Abstract
We study online learning problems in which the learner has extra knowledge about the adversary's behaviour, i.e., game-theoretic settings where opponents typically follow some no-external-regret learning algorithm. Under this assumption, we propose two new online learning algorithms, Accurate Follow the Regularized Leader (AFTRL) and Prod-Best Response (Prod-BR), that intensively exploit this extra knowledge while maintaining the no-regret property in the worst-case scenario of having inaccurate extra information. Specifically, AFTRL achieves O(1) external regret or O(1) forward regret against a no-external-regret adversary, in comparison with the O(√T) dynamic regret of Prod-BR. To the best of our knowledge, our algorithm is the first to consider forward regret that achieves O(1) regret against strategic adversaries. When playing zero-sum games with Accurate…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
