Task-Aware Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

Ziqing Fan; Shengchao Hu; Yuhang Zhou; Li Shen; Ya Zhang; Yanfeng Wang; Dacheng Tao

arXiv:2411.01146·cs.LG·November 5, 2024

TL;DR

This paper introduces HarmoDT, a novel multi-task decision transformer that adaptively identifies optimal parameter subspaces for each task using bi-level optimization, improving offline reinforcement learning across diverse scenarios.

Contribution

HarmoDT employs a bi-level optimization framework to learn task-specific parameter masks, enabling effective multi-task learning without task identifiers, and introduces a group-wise variant for better generalization.
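To make the idea of task-specific parameter masks concrete, here is a minimal sketch, not the authors' implementation. It assumes a flat shared parameter vector and fixed binary masks, and it shows only an inner update step in which each task's gradient touches its own subspace; how the masks themselves are learned in the bi-level scheme is not shown. The task names `reach` and `push`, the function names, and all numeric values are hypothetical.

```python
import numpy as np


def apply_task_mask(shared_params, mask):
    """Return the parameters one task sees: shared weights gated by its binary mask."""
    return shared_params * mask


def masked_sgd_step(shared_params, grads_by_task, masks_by_task, lr=0.1):
    """One SGD step in which each task's gradient only updates its masked subspace.

    Gradients are masked per task, summed, and averaged over the number of tasks.
    """
    total = np.zeros_like(shared_params)
    for task, grad in grads_by_task.items():
        total += grad * masks_by_task[task]
    return shared_params - lr * total / len(grads_by_task)


# Hypothetical toy setup: 4 shared parameters, 2 tasks with overlapping masks.
params = np.ones(4)
masks = {
    "reach": np.array([1.0, 1.0, 0.0, 0.0]),
    "push": np.array([0.0, 1.0, 1.0, 0.0]),
}
grads = {
    "reach": np.array([1.0, 1.0, 1.0, 1.0]),
    "push": np.array([2.0, 2.0, 2.0, 2.0]),
}

new_params = masked_sgd_step(params, grads, masks, lr=0.1)
reach_view = apply_task_mask(params, masks["reach"])
```

In this toy setup the last parameter belongs to no mask and is never updated, while the second parameter is shared by both tasks and receives both gradients. That overlap is where conflicting gradients can arise.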

Findings

1. Achieved 8% improvement in task-provided settings
2. Attained 5% gain in task-agnostic scenarios
3. Realized 10% enhancement in unseen task settings

Abstract

The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and complexity pose significant challenges in policy formulation, necessitating judicious parameter sharing and management of conflicting gradients for optimal policy performance. Furthermore, identifying the optimal parameter subspace for each task often necessitates prior knowledge of the task identifier during inference, limiting applicability in real-world scenarios with variable task content and unknown current tasks. In this work, we introduce the Harmony Multi-Task Decision Transformer…

Code & Models

Repositories

charleshsc/HarmoDT (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings