TD-MPC-Opt: Distilling Model-Based Multi-Task Reinforcement Learning Agents
Dmytro Kuzmenko, Nadiya Shvai

TL;DR
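This paper introduces a distillation method that compresses a large multi-task model-based reinforcement learning agent (317M parameters) into a compact 1M-parameter model. The distilled model reaches a normalized score of 28.45 on MT30, which the authors report as state of the art, and targets deployment in resource-constrained environments.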
Contribution
It presents a novel distillation technique for compressing large multi-task reinforcement learning models, enabling efficient deployment without significant performance loss.
Findings
The distilled 1M-parameter model achieves a normalized score of 28.45 on MT30.
FP16 post-training quantization reduces the distilled model's size by approximately 50%.
The distilled model outperforms the original 1M-parameter model in multi-task performance (28.45 vs. 18.93 normalized score).
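The roughly 50% size reduction follows directly from storage width: FP32 uses 4 bytes per weight and FP16 uses 2. The sketch below illustrates this with Python's standard `struct` module. It is not the authors' quantization pipeline; the helper `pack_weights` and the sample weights are hypothetical and chosen only for illustration.

```python
import struct

def pack_weights(weights, fmt):
    """Serialize floats with a struct format code: 'f' is FP32, 'e' is FP16."""
    return struct.pack(f"<{len(weights)}{fmt}", *weights)

# Hypothetical weights standing in for model parameters.
weights = [0.5, -1.25, 3.0, 0.1]

fp32_bytes = pack_weights(weights, "f")  # 4 bytes per weight
fp16_bytes = pack_weights(weights, "e")  # 2 bytes per weight

# Decode the FP16 bytes to see how much precision was lost.
restored = struct.unpack(f"<{len(weights)}e", fp16_bytes)

size_ratio = len(fp16_bytes) / len(fp32_bytes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"FP16 / FP32 size ratio: {size_ratio}")
print(f"largest rounding error: {max_error:.2e}")
```

Values such as 0.5 survive the conversion exactly, while 0.1 picks up a small rounding error, which is the precision cost paid for halving storage.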
Abstract
We present a novel approach to knowledge transfer in model-based reinforcement learning, addressing the critical challenge of deploying large world models in resource-constrained environments. Our method efficiently distills a high-capacity multi-task agent (317M parameters) into a compact model (1M parameters) on the MT30 benchmark, significantly improving performance across diverse tasks. Our distilled model achieves a state-of-the-art normalized score of 28.45, surpassing the original 1M parameter model score of 18.93. This improvement demonstrates the ability of our distillation technique to capture and consolidate complex multi-task knowledge. We further optimize the distilled model through FP16 post-training quantization, reducing its size by 50%. Our approach addresses practical deployment limitations and offers insights into knowledge representation in large world models,…
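The abstract does not spell out the distillation objective. One common form of teacher-student distillation, shown here purely as an assumption, adds an imitation term to the student's ordinary training loss so that the student's predictions track the teacher's. The function names, the choice of mean squared error, and the `distill_coef` weight below are all hypothetical and are not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def distillation_loss(student_pred, teacher_pred, ground_truth, distill_coef=0.5):
    """Combine the student's own task loss with a teacher-imitation term.

    distill_coef (hypothetical) sets how strongly the student is pulled
    toward the teacher's predictions relative to the ground truth.
    """
    task_loss = mse(student_pred, ground_truth)
    imitation_loss = mse(student_pred, teacher_pred)
    return task_loss + distill_coef * imitation_loss

# Toy predictions for two transitions (illustrative values only).
student = [1.0, 2.0]
teacher = [1.0, 3.0]
truth = [2.0, 2.0]

loss = distillation_loss(student, teacher, truth, distill_coef=0.5)
print(loss)
```

Setting `distill_coef` to zero recovers ordinary training of the small model, so the coefficient is the single knob that controls how much teacher knowledge is transferred in this sketch.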
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
