Learning Team Decisions
Olle Kjellqvist, Ather Gattami

TL;DR
This paper introduces a gradient-based algorithm for linear quadratic team decision problems with unknown parameters, achieving sublinear regret bounds under both full information and bandit feedback scenarios.
Contribution
It develops a gradient-descent approach for team decision problems in which the agents know neither the cost function nor the linear measurement maps, and it provides regret guarantees in both feedback settings.
Findings
Expected regret of O(log(T)) with full information feedback.
Expected regret of O(√T) with bandit feedback.
Additional multiplicative factor of O(d) in the bandit setting, where d reflects the number of learned parameters.
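To make the role of the dimension d in the bandit setting more concrete, here is a minimal sketch of the standard one-point gradient estimator from the bandit convex optimization literature. It is a swapped-in textbook technique, not necessarily the estimator used in the paper. The function names, the test cost squared_norm, the query point, and the smoothing radius delta = 0.5 are all assumptions made for this illustration.

```python
import math
import random


def one_point_gradient_estimate(cost, point, delta, rng):
    """Estimate the gradient of `cost` at `point` from a single cost evaluation."""
    d = len(point)
    # Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    u = [v / norm for v in g]
    # Bandit feedback: only the scalar cost at the perturbed point is observed.
    perturbed = [p + delta * ui for p, ui in zip(point, u)]
    value = cost(perturbed)
    # The scaling d / delta is where the dimension enters the estimate.
    return [(d / delta) * value * ui for ui in u]


def squared_norm(z):
    """Hypothetical quadratic test cost with gradient 2 * z."""
    return sum(v * v for v in z)


def averaged_estimate(num_samples=20000, seed=0):
    """Average many one-point estimates at the (assumed) point (1, 0, 0)."""
    rng = random.Random(seed)
    point = [1.0, 0.0, 0.0]
    total = [0.0, 0.0, 0.0]
    for _ in range(num_samples):
        est = one_point_gradient_estimate(squared_norm, point, 0.5, rng)
        total = [a + b for a, b in zip(total, est)]
    return [a / num_samples for a in total]
```

For a quadratic cost the average of these estimates approaches the true gradient, here (2, 0, 0), while each individual estimate is noisy with a magnitude that scales with d / delta. That scaling is one common route by which the dimension enters bandit regret bounds. Whether the O(d) factor reported above arises in exactly this way is not stated in this summary.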
Abstract
In this paper, we treat linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over T time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents do not know the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent based algorithm with an expected regret of O(log(T)) for full information gradient feedback and O(√T) for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term O(d) where d reflects the number of learned parameters.
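As a rough illustration of the full information setting described in the abstract, the following sketch runs projected stochastic gradient descent on the gains of two agents with linear policies u_i = k_i * y_i. Everything specific here is an assumption made for the example and is not taken from the paper: the two-dimensional state, the cost (u_1 + u_2 - x_1 - x_2)^2 + r * (u_1^2 + u_2^2), the penalty r, the step size 1 / (mu * (t + 10)), the projection box [-5, 5], and the function name run_team_gradient_descent.

```python
import random


def run_team_gradient_descent(num_steps=20000, seed=0):
    """Toy two-agent team problem with gradient feedback (illustrative only)."""
    rng = random.Random(seed)
    r = 0.1                  # assumed penalty on each agent's action
    mu = 2.0 * (1.0 + r)     # strong-convexity constant of this toy expected cost
    k = [0.0, 0.0]           # one scalar gain per agent: u_i = k_i * y_i

    for t in range(1, num_steps + 1):
        # Gaussian state of nature with two independent components.
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        # Each agent sees a different linear measurement: agent i observes x_i only.
        y = [x[0], x[1]]
        u = [k[0] * y[0], k[1] * y[1]]

        # Decreasing step size of order 1 / t, the usual choice for
        # strongly convex online gradient descent (offset by 10 for stability).
        step = 1.0 / (mu * (t + 10))
        mismatch = u[0] + u[1] - x[0] - x[1]

        for i in range(2):
            # Full information feedback: agent i receives dJ/du_i at the played
            # actions, without knowing the cost function itself.
            grad_u = 2.0 * mismatch + 2.0 * r * u[i]
            # Chain rule through the linear policy u_i = k_i * y_i.
            grad_k = grad_u * y[i]
            # Gradient step followed by projection onto the box [-5, 5].
            k[i] = max(-5.0, min(5.0, k[i] - step * grad_k))

    return k
```

In this toy problem the expected cost is (k_1 - 1)^2 + (k_2 - 1)^2 + r * (k_1^2 + k_2^2), so the best linear gains are k_i = 1 / (1 + r), roughly 0.909 for r = 0.1, and the learned gains should settle near that value. The agents use only their own measurement and the gradient signal they receive, which mirrors the abstract's assumption that neither the cost nor the measurement maps are known to them.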
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms
