Learning Team Decisions

Olle Kjellqvist; Ather Gattami

arXiv:2212.11567·math.OC·December 23, 2022

TL;DR

This paper introduces a gradient-based algorithm for linear quadratic team decision problems with unknown parameters, achieving sublinear regret bounds under both full information and bandit feedback scenarios.

Contribution

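It develops a gradient-descent approach for linear quadratic team decision problems in which the agents know neither the cost function nor the linear maps from the state of nature to their measurements, and it provides expected-regret guarantees under both full information gradient feedback and bandit feedback.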

Findings

01

Expected regret of O(log(T)) with full information feedback.

02

Expected regret of O(√T) with bandit feedback.

03

Additional multiplicative factor of O(d) in the bandit-feedback regret, where d reflects the number of learned parameters.

Abstract

In this paper, we treat linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over $T$ time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent based algorithm with an expected regret of $O(\log(T))$ for full information gradient feedback and $O(\sqrt{T})$ for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term $O(d)$, where $d$ reflects the number of learned parameters.
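To make the full information setting concrete, the following is a minimal sketch, not the authors' algorithm. It assumes two agents with scalar decisions, linear policies u_i = K_i y_i, a standard-normal state of nature, and a decaying step size of the form 1/(mu (t + t0)). The matrices `C`, `Q_UU`, `Q_XU`, the offset `t0`, and the function names are all hypothetical choices made for illustration. The agents never read `C`, `Q_UU`, or `Q_XU`; they use only their own measurement and the gradient of the cost with respect to their own decision.

```python
import numpy as np

# Hypothetical problem data, hidden from the agents (illustration only).
# Agent i measures y_i = C[i] @ x, where x is the state of nature.
C = [np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
     np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])]
Q_UU = np.array([[2.0, 0.5], [0.5, 1.0]])  # positive definite: cost is strongly convex in u
Q_XU = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # couples state and decisions


def expected_cost(K):
    """Expected cost of linear policies u_i = K[i] @ y_i when x ~ N(0, I).

    The term depending on x alone is dropped because it does not
    affect which decisions are optimal.
    """
    G = np.vstack([K_i @ C_i for K_i, C_i in zip(K, C)])  # u = G @ x
    return np.trace(G.T @ Q_UU @ G) + 2.0 * np.trace(Q_XU @ G)


def run_team_gradient_descent(T=2000, seed=0, t0=10):
    rng = np.random.default_rng(seed)
    K = [np.zeros((1, 2)), np.zeros((1, 2))]  # each agent's policy gain
    mu = 2.0 * np.linalg.eigvalsh(Q_UU).min()  # used only to scale the step size
    costs = np.empty(T)
    for t in range(1, T + 1):
        x = rng.standard_normal(3)                 # Gaussian state of nature
        y = [C_i @ x for C_i in C]                 # distinct local measurements
        u = np.concatenate([K_i @ y_i for K_i, y_i in zip(K, y)])
        costs[t - 1] = u @ Q_UU @ u + 2.0 * x @ Q_XU @ u
        # Full information gradient feedback: gradient of the cost w.r.t. u.
        grad_u = 2.0 * Q_UU @ u + 2.0 * Q_XU.T @ x
        eta = 1.0 / (mu * (t + t0))                # decaying step size (assumed schedule)
        for i in range(2):
            # Chain rule: d cost / d K_i = (d cost / d u_i) * y_i^T.
            K[i] -= eta * grad_u[i] * y[i][None, :]
    return K, costs
```

Calling `run_team_gradient_descent()` returns the learned gains and the per-step costs; `expected_cost` can then be used to compare the learned policies against the all-zero starting policy, whose expected cost is 0 under this parameterization. The bandit-feedback case, where only the scalar cost is observed, would require a gradient estimate in place of `grad_u` and is not shown here.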

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms