Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning
Kausthubh Manda, Raghuram Bharadwaj Diddigi

TL;DR
This paper provides theoretical analysis of offline multitask reinforcement learning, showing how shared low-rank representations improve generalization and data efficiency in value function estimation.
Contribution
It introduces a multitask fitted Q-iteration method with finite-sample guarantees, highlighting the benefits of data pooling and shared representations for offline RL.
Findings
Pooling data across tasks improves estimation accuracy.
Shared representations reduce complexity in downstream tasks.
Finite-sample guarantees depend on total samples, horizon, and coverage.
Abstract
We study offline multitask reinforcement learning in settings where multiple tasks share a low-rank representation of their action-value functions. In this regime, a learner is provided with fixed datasets collected from several related tasks, without access to further online interaction, and seeks to exploit shared structure to improve statistical efficiency and generalization. We analyze a multitask variant of fitted Q-iteration that jointly learns a shared representation and task-specific value functions via Bellman error minimization on offline data. Under standard realizability and coverage assumptions commonly used in offline reinforcement learning, we establish finite-sample generalization guarantees for the learned value functions. Our analysis explicitly characterizes how pooling data across tasks improves estimation accuracy, yielding a dependence on the total…
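To make the setup in the abstract concrete, here is a minimal NumPy sketch of fitted Q-iteration run over several offline datasets with a shared low-rank constraint. It is not the authors' algorithm. It assumes linear Q-functions over given state features, and it swaps the paper's joint Bellman-error minimization for a simpler stand-in: per-task ridge regression on the Bellman targets, followed by a rank-k truncated SVD projection of the stacked task weights. The function name `multitask_fqi`, the dataset layout, and all hyperparameters are hypothetical.

```python
import numpy as np

def multitask_fqi(datasets, n_actions, rank, gamma=0.9, n_iters=20, ridge=1e-3):
    """Illustrative multitask FQI with a low-rank weight matrix (assumed setup).

    datasets: one tuple per task, (phi, actions, rewards, phi_next), where
      phi, phi_next are (n, d) state-feature arrays, actions is (n,) ints,
      rewards is (n,) floats.
    Returns theta of shape (n_tasks, n_actions * d) with rank at most `rank`.
    """
    n_tasks = len(datasets)
    d = datasets[0][0].shape[1]
    dim = n_actions * d
    theta = np.zeros((n_tasks, dim))

    def featurize(phi, actions):
        # Block features: copy phi into the slot of the action that was taken.
        n = phi.shape[0]
        out = np.zeros((n, n_actions, d))
        out[np.arange(n), actions] = phi
        return out.reshape(n, dim)

    for _ in range(n_iters):
        new_theta = np.zeros_like(theta)
        for t, (phi, actions, rewards, phi_next) in enumerate(datasets):
            # Bellman targets from the current estimate for task t.
            weights = theta[t].reshape(n_actions, d)
            q_next = phi_next @ weights.T            # shape (n, n_actions)
            targets = rewards + gamma * q_next.max(axis=1)
            # Per-task ridge regression onto the targets.
            x = featurize(phi, actions)
            gram = x.T @ x + ridge * np.eye(dim)
            new_theta[t] = np.linalg.solve(gram, x.T @ targets)
        # Shared structure (assumption): keep only the top-`rank` directions
        # of the stacked task weights, so all tasks share one subspace.
        u, s, vt = np.linalg.svd(new_theta, full_matrices=False)
        k = min(rank, len(s))
        theta = (u[:, :k] * s[:k]) @ vt[:k]
    return theta
```

The projection step is where tasks exchange information in this sketch: the k-dimensional subspace is estimated from every task's data at once, while each task keeps only its own coordinates within that subspace.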
