HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

Shengchao Hu; Ziqing Fan; Li Shen; Ya Zhang; Yanfeng Wang; Dacheng Tao

arXiv:2405.18080 · cs.LG · May 29, 2024 · 1 citation

TL;DR

HarmoDT introduces a meta-learning-based method that identifies task-specific parameter subspaces in multi-task offline reinforcement learning, improving policy performance across diverse tasks.

Contribution

It proposes a novel bi-level optimization framework to learn task-specific masks for better parameter sharing in multi-task decision transformers.
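
To make the idea of task-specific masks over shared parameters concrete, here is a minimal toy sketch in plain Python. It is not the authors' algorithm: the masks are fixed by hand rather than learned through the bi-level procedure, the quadratic losses stand in for a Decision Transformer objective, and every name (`task_loss`, `task_grad`, `train`, `targets`, `masks`) is hypothetical. It only illustrates how giving each task its own binary mask can remove gradient conflicts on the parameters that tasks disagree about.

```python
# Toy illustration (assumed setup, not the paper's method):
# one shared parameter vector, one binary mask per task.

def task_loss(theta, mask, target):
    """Quadratic loss on the masked (effective) parameters of one task."""
    return 0.5 * sum((m * w - t) ** 2 for w, m, t in zip(theta, mask, target))

def task_grad(theta, mask, target):
    """Gradient of task_loss w.r.t. theta; masked-out entries get zero gradient."""
    return [m * (m * w - t) for w, m, t in zip(theta, mask, target)]

def train(theta, masks, targets, lr=0.1, steps=200):
    """Plain gradient descent on the sum of the per-task masked losses."""
    theta = list(theta)
    for _ in range(steps):
        total = [0.0] * len(theta)
        for task, mask in masks.items():
            grad = task_grad(theta, mask, targets[task])
            total = [a + g for a, g in zip(total, grad)]
        theta = [w - lr * g for w, g in zip(theta, total)]
    return theta

# Two tasks agree on parameter 0 but want different values for parameters 1 and 2.
targets = {"A": [1.0, 2.0, 0.0], "B": [1.0, 0.0, -3.0]}

# Dense sharing: every task uses (and pulls on) every parameter.
dense_masks = {"A": [1, 1, 1], "B": [1, 1, 1]}

# Hand-picked task masks: parameter 0 is shared, 1 belongs to A, 2 belongs to B.
task_masks = {"A": [1, 1, 0], "B": [1, 0, 1]}

init = [0.0, 0.0, 0.0]
theta_dense = train(init, dense_masks, targets)
theta_masked = train(init, task_masks, targets)

dense_losses = {t: task_loss(theta_dense, dense_masks[t], targets[t]) for t in targets}
masked_losses = {t: task_loss(theta_masked, task_masks[t], targets[t]) for t in targets}

print("dense :", dense_losses)
print("masked:", masked_losses)
```

With dense sharing, the two tasks pull parameters 1 and 2 in opposite directions and both end up with a nonzero loss; with the per-task masks, each task trains only its own subspace plus the shared parameter, and both losses go to approximately zero. In HarmoDT the masks are learned rather than chosen by hand.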

Findings

01 Outperforms existing methods on benchmark tasks

02 Effectively manages task variability and conflicts

03 Demonstrates improved policy generalization

Abstract

The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and complexity pose significant challenges in policy formulation, necessitating judicious parameter sharing and management of conflicting gradients for optimal policy performance. In this work, we introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task. We approach this as a bi-level optimization problem, employing a meta-learning framework that leverages gradient-based techniques. The upper…

Code & Models

Repositories

charleshsc/HarmoDT (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections