Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

Michal Nauman; Marek Cygan; Carmelo Sferrazza; Aviral Kumar; Pieter Abbeel

arXiv:2505.23150 · cs.LG · May 30, 2025

Open Access

TL;DR

This paper demonstrates that high-capacity, regularized value functions conditioned on learnable task embeddings enable efficient, scalable multi-task reinforcement learning, achieving state-of-the-art results across diverse benchmarks.

Contribution

The paper introduces an approach that uses large, regularized value models conditioned on learnable task embeddings to improve the performance and scalability of multi-task RL.

Findings

01. Achieves state-of-the-art multi-task performance on 7 benchmarks

02. Enables sample-efficient transfer to new tasks

03. Addresses task interference in online RL with high-capacity models

Abstract

Recent advances in language modeling and vision stem from training large models on diverse, multi-task data. This paradigm has had limited impact in value-based reinforcement learning (RL), where improvements are often driven by small models trained in a single-task context. This is because, in multi-task RL, sparse rewards and gradient conflicts make temporal-difference optimization brittle. Practical workflows for generalist policies therefore avoid online training, instead cloning expert trajectories or distilling collections of single-task policies into one agent. In this work, we show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online RL, allowing for robust and scalable multi-task training. We test our approach on 7 multi-task benchmarks with over 280 unique tasks,…
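The two ingredients named in the abstract, a value model trained via cross-entropy and conditioning on learnable task embeddings, can be illustrated with a minimal sketch. The abstract does not say how scalar value targets are turned into distributions, so the two-hot encoding over fixed bins used here is an assumption, as are all function names, the bin range, and the choice to concatenate the embedding to the state. This is not the authors' implementation.

```python
import math


def two_hot(value, v_min, v_max, num_bins):
    """Encode a scalar as a distribution over evenly spaced bins (assumed scheme)."""
    value = min(max(value, v_min), v_max)          # clip to the supported range
    step = (v_max - v_min) / (num_bins - 1)
    pos = (value - v_min) / step                   # fractional bin index
    lower = min(int(pos), num_bins - 1)
    upper = min(lower + 1, num_bins - 1)
    weight_upper = pos - lower
    probs = [0.0] * num_bins
    probs[lower] += 1.0 - weight_upper             # mass on the bin below
    probs[upper] += weight_upper                   # mass on the bin above
    return probs


def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target_probs, logits))


def expected_value(logits, v_min, v_max):
    """Decode a scalar value estimate from categorical logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    step = (v_max - v_min) / (len(logits) - 1)
    return sum((e / total) * (v_min + i * step) for i, e in enumerate(exps))


def critic_input(state, task_id, task_embeddings):
    """Concatenate the state with the embedding row of its task (hypothetical layout)."""
    return list(state) + list(task_embeddings[task_id])


# Usage: a scalar target becomes a distribution, and the critic's logits
# are scored against it with cross-entropy instead of a squared error.
target = two_hot(0.25, v_min=0.0, v_max=1.0, num_bins=3)
loss = cross_entropy(target, logits=[0.0, 0.0, 0.0])
features = critic_input([1.0, 2.0], task_id=1, task_embeddings=[[0.1], [0.2]])
```

In a real agent the logits would come from a network applied to `features`, and the rows of `task_embeddings` would be trained jointly with that network; both are omitted here to keep the sketch dependency-free.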

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics