Multi-User Reinforcement Learning with Low Rank Rewards
Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj, and Praneeth Netrapalli

TL;DR
This paper introduces a collaborative reinforcement learning algorithm for multiple users with low-rank reward matrices, significantly reducing sample complexity by leveraging shared structure across users in tabular and linear MDPs.
Contribution
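The paper proposes an algorithm that exploits low-rank reward structure for efficient multi-user reinforcement learning. When N is large and the rank is constant, the per-user sample complexity depends only logarithmically on the state-space size, an exponential reduction in that dependence.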
Findings
Per-MDP sample complexity depends logarithmically on the state-space size when N is large and the rank is constant.
The algorithm learns rewards efficiently in both tabular and linear MDP settings.
Sample complexity is significantly lower than for methods that learn each user's MDP individually.
Abstract
In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the users has a low-rank structure -- a standard and practically successful assumption in the offline collaborative filtering setting -- the question is whether we can design algorithms with significantly lower sample complexity than those that learn the MDP individually for each user. Our main contribution is an algorithm which explores rewards collaboratively with user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When N is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state-space, which represents…
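To make the low-rank assumption concrete, here is a minimal sketch that is not the paper's algorithm. It builds a users-by-(state, action) reward matrix as a product of two thin factors, checks its rank exactly, and compares how many numbers describe the rewards with and without the shared structure. All function names, sizes, and the integer-valued factors are hypothetical choices for illustration only.

```python
from fractions import Fraction
import random


def low_rank_rewards(num_users, num_state_actions, rank, seed=0):
    """Build R = U V^T, a num_users x num_state_actions matrix of rank <= rank.

    Row i holds user i's reward for every (state, action) pair.
    Small integer factors are an illustrative assumption, not from the paper.
    """
    rng = random.Random(seed)
    U = [[Fraction(rng.randint(-3, 3)) for _ in range(rank)]
         for _ in range(num_users)]
    V = [[Fraction(rng.randint(-3, 3)) for _ in range(rank)]
         for _ in range(num_state_actions)]
    return [[sum(u[k] * v[k] for k in range(rank)) for v in V] for u in U]


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over Fractions."""
    m = [row[:] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def free_parameters(num_users, num_state_actions, rank):
    """Entries to learn independently vs. entries in the two shared factors."""
    independent = num_users * num_state_actions
    shared = rank * (num_users + num_state_actions)
    return independent, shared


if __name__ == "__main__":
    R = low_rank_rewards(num_users=50, num_state_actions=40, rank=3)
    print("rank of R:", matrix_rank(R))
    print("independent vs. factored entries:", free_parameters(50, 40, 3))
```

With 50 users, 40 state-action pairs, and rank 3, learning every user separately means estimating 2000 reward entries, while the two factors contain only 270. This gap is the intuition behind collaborative exploration; the actual sample-complexity guarantees depend on the paper's analysis, which this sketch does not reproduce.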
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Auction Theory and Applications
