Multi-Objective Decision Transformers for Offline Reinforcement Learning
Abdelghani Ghanem, Philippe Ciblat, Mounir Ghogho

TL;DR
This paper reformulates offline reinforcement learning as a multi-objective sequence modeling task using transformers. It aims to make better use of the transformer's attention mechanism and to fix a weakness in the trajectory representation, improving policy performance on benchmark tasks.
Contribution
The paper introduces a multi-objective formulation and action space regions to make better use of transformer attention and to address trajectory representation issues in offline RL.
Findings
Improved transformer attention utilization in offline RL.
Enhanced performance on D4RL locomotion benchmarks.
Outperforms or matches state-of-the-art methods.
Abstract
Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling task, where the sole aim is to predict actions based on prior context using the transformer architecture. However, the limitation of this single task learning approach is its potential to undermine the transformer model's attention mechanism, which should ideally allocate varying attention weights across different tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem, where the prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which could generate inaccuracies when modeling the…
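As a rough illustration of the multi-objective idea described in the abstract, the sketch below combines separate prediction errors for actions, states, and returns into one training loss. It is not the authors' implementation: the function names, the use of mean squared error, and the equal default weights are all assumptions made for this example, since the summary does not specify the paper's actual loss terms or weighting.

```python
from typing import Dict, Optional, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_objective_loss(
    preds: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-objective MSE terms.

    The objective names and weights are hypothetical; equal weights
    are assumed when none are given.
    """
    if weights is None:
        weights = {name: 1.0 for name in preds}
    return sum(weights[name] * mse(preds[name], targets[name]) for name in preds)


# Toy predictions for one timestep: action, next state, and return-to-go.
preds = {"action": [0.5, -0.5], "state": [1.0, 2.0, 3.0], "return": [10.0]}
targets = {"action": [0.0, 0.0], "state": [1.0, 2.0, 3.0], "return": [12.0]}

total = multi_objective_loss(preds, targets)
action_only = multi_objective_loss(
    preds, targets, weights={"action": 1.0, "state": 0.0, "return": 0.0}
)
```

Setting the state and return weights to zero recovers the single-task, action-only objective that the abstract contrasts against, which makes the difference between the two formulations easy to see in a toy setting.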
Taxonomy
Topics: Reinforcement Learning in Robotics
