Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning
Yannik Schnitzer, Mathias Jackermeier, Alessandro Abate, David Parker

TL;DR
This paper introduces a method to provide high-confidence performance guarantees for multi-task reinforcement learning policies on unseen tasks, addressing a key gap in safety-critical applications.
Contribution
It develops a novel generalisation bound that combines per-task confidence bounds with task-level generalisation, enabling formal performance guarantees for new tasks.
Findings
Guarantees are theoretically sound across state-of-the-art methods.
Guarantees are informative at realistic sample sizes.
Method applies to arbitrary and unknown task distributions.
Abstract
Multi-task reinforcement learning trains generalist policies that can execute multiple tasks. While recent years have seen significant progress, existing approaches rarely provide formal performance guarantees, which are indispensable when deploying policies in safety-critical settings. We present an approach for computing high-confidence guarantees on the performance of a multi-task policy on tasks not seen during training. Concretely, we introduce a new generalisation bound that composes (i) per-task lower confidence bounds from finitely many rollouts with (ii) task-level generalisation from finitely many sampled tasks, yielding a high-confidence guarantee for new tasks drawn from the same arbitrary and unknown distribution. Across state-of-the-art multi-task RL methods, we show that the guarantees are theoretically sound and informative at realistic sample sizes.
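To make the two-level composition concrete, here is a minimal sketch of one way such a guarantee could be assembled. It is not the paper's bound. It assumes returns normalised to [0, 1], uses Hoeffding's inequality for the per-task lower confidence bounds, and uses a simple order-statistic argument over i.i.d. sampled tasks for the task-level step. The function names, the even split of the confidence budget delta, and the choice of the minimum as the threshold are all illustrative assumptions.

```python
import math


def hoeffding_lower_bound(returns, delta):
    """Lower confidence bound on the mean of returns in [0, 1].

    Holds with probability at least 1 - delta (Hoeffding's inequality).
    """
    n = len(returns)
    mean = sum(returns) / n
    return max(0.0, mean - math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


def composed_guarantee(returns_per_task, delta):
    """Illustrative two-level guarantee (an assumption, not the paper's bound).

    returns_per_task: one list of rollout returns in [0, 1] per sampled task.
    Returns (threshold, epsilon): with confidence at least 1 - delta, a new
    task from the same distribution has expected return >= threshold with
    probability at least 1 - epsilon.
    """
    num_tasks = len(returns_per_task)
    # Split the confidence budget evenly between the two levels (a choice).
    delta_rollout = delta / 2.0
    delta_task = delta / 2.0

    # Level (i): per-task bounds, union-bounded over all sampled tasks.
    per_task = [
        hoeffding_lower_bound(r, delta_rollout / num_tasks)
        for r in returns_per_task
    ]
    threshold = min(per_task)

    # Level (ii): a new task falls below the minimum of num_tasks i.i.d.
    # tasks with probability at most epsilon, where
    # (1 - epsilon) ** num_tasks = delta_task.
    epsilon = 1.0 - delta_task ** (1.0 / num_tasks)
    return threshold, epsilon


if __name__ == "__main__":
    # Hypothetical data: 50 sampled tasks, 200 successful rollouts each.
    data = [[1.0] * 200 for _ in range(50)]
    print(composed_guarantee(data, delta=0.05))
```

Using the minimum as the threshold keeps the argument short but is conservative. A tighter variant could discard the few worst tasks at the cost of a larger epsilon. No assumption is made about the shape of the task distribution, only that tasks are sampled independently from it.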
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
