Multi-User Reinforcement Learning with Low Rank Rewards

Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj, and Praneeth Netrapalli

arXiv:2210.05355 · cs.LG · May 23, 2023


TL;DR

This paper introduces a collaborative reinforcement learning algorithm for multiple users whose joint reward matrix is low rank, significantly reducing sample complexity by exploiting the structure shared across users in tabular and linear MDPs.

Contribution

It proposes a novel algorithm that exploits low-rank reward structure for efficient multi-user reinforcement learning, reducing the per-user sample complexity exponentially in the size of the state space.

Findings

01

Per-user sample complexity depends logarithmically on the state-space size when N is large and the rank is constant.

02

The algorithm learns rewards efficiently in both tabular and linear MDP settings.

03

Significantly lower sample complexity than non-collaborative methods that learn each user's MDP independently.

Abstract

In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure (a standard and practically successful assumption in the offline collaborative filtering setting), the question is whether we can design algorithms with significantly lower sample complexity than those that learn the MDP individually for each user. Our main contribution is an algorithm that explores rewards collaboratively with $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state space, which represents…
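To make the low-rank assumption concrete, here is a toy Python sketch; it is not the paper's algorithm. It assumes the reward matrix factors as R = U V^T, with one row per user and one column per state-action pair, that rewards are noiseless, and that the shared state-action factor V is already known. In the paper's setting that shared structure has to be discovered through collaborative exploration, which this sketch skips entirely. All function names (`make_low_rank_rewards`, `recover_user_rewards`, `solve`) are hypothetical. Under these assumptions, a single user's full reward vector over 1000 state-action pairs follows from only 3 observations, which hints at why sharing structure across users can cut per-user sample complexity.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("probe rows of V are (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def make_low_rank_rewards(num_users, num_sa, rank, seed=0):
    """Hypothetical toy generator: R[user][sa] = <U[user], V[sa]>, so rank(R) <= rank."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(num_users)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(num_sa)]
    R = [[dot(u, v) for v in V] for u in U]
    return U, V, R


def recover_user_rewards(V, probe_sa, observed):
    """Reconstruct one user's full reward vector from `rank` noiseless observations,
    ASSUMING the shared state-action factors V are already known."""
    A = [V[j] for j in probe_sa]  # rank x rank system built from the probed pairs
    u_hat = solve(A, observed)    # estimated user-specific factor
    return [dot(u_hat, v) for v in V]


# Usage: 50 users, 1000 state-action pairs, rank 3 (all synthetic).
U, V, R = make_low_rank_rewards(num_users=50, num_sa=1000, rank=3)
probe = [0, 1, 2]  # only 3 reward observations for this user
user = 7
observed = [R[user][j] for j in probe]
R_hat = recover_user_rewards(V, probe, observed)
max_err = max(abs(a - b) for a, b in zip(R_hat, R[user]))
print(f"observed {len(probe)} of {len(R_hat)} rewards, max error {max_err:.2e}")
```

Solving an exact r × r system only works in this noiseless toy; with noisy rewards, a least-squares fit over more probe pairs would be the natural replacement.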


Videos

Multi-User Reinforcement Learning with Low Rank Rewards (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Auction Theory and Applications