Graph Decision Transformer

Shengchao Hu; Li Shen; Ya Zhang; Dacheng Tao

arXiv:2303.03747·cs.LG·March 8, 2023·5 cites

Open Access · 4 Reviews

TL;DR

The paper introduces Graph Decision Transformer (GDT), an offline RL method that models sequences as causal graphs to better capture dependencies, improving performance on image-based tasks.

Contribution

GDT is the first offline RL approach to incorporate causal graph modeling with graph transformers, enhancing dependency learning and performance.

Findings

1. GDT matches or surpasses state-of-the-art offline RL methods.
2. GDT effectively models causal and temporal dependencies.
3. GDT performs well on image-based Atari and OpenAI Gym tasks.

Abstract

Offline reinforcement learning (RL) is a challenging task, whose objective is to learn policies from static trajectory data without interacting with the environment. Recently, offline RL has been viewed as a sequence modeling problem, where an agent generates a sequence of subsequent actions based on a set of static transition experiences. However, existing approaches that use transformers to attend to all tokens naively can overlook the dependencies between different tokens and limit long-term dependency learning. In this paper, we propose the Graph Decision Transformer (GDT), a novel offline RL approach that models the input sequence into a causal graph to capture potential dependencies between fundamentally different concepts and facilitate temporal and causal relationship learning. GDT uses a graph transformer to process the graph inputs with relation-enhanced mechanisms, and an…
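To make the idea of modeling an input sequence as a causal graph concrete, here is a minimal sketch in plain Python. The helper names (`returns_to_go`, `build_causal_graph`) and the node labels are hypothetical. The edge set is an assumption chosen for illustration (an action depends on the current state and return-to-go; the next state depends on the previous state and action) and is not taken from the paper's definition.

```python
from typing import Dict, List, Tuple

# Hypothetical node label: (kind, timestep), where kind is
# "R" (return-to-go), "s" (state), or "a" (action).
Node = Tuple[str, int]


def returns_to_go(rewards: List[float]) -> List[float]:
    """Suffix sums of rewards: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def build_causal_graph(horizon: int) -> Dict[Node, List[Node]]:
    """Map every node to the nodes it is assumed to depend on.

    Assumed edges (illustrative only):
      a_t     <- s_t, R_t
      s_{t+1} <- s_t, a_t
      R_{t+1} <- R_t, s_t, a_t
    """
    parents: Dict[Node, List[Node]] = {}
    for t in range(horizon):
        if t == 0:
            # The first state and return-to-go have no parents.
            parents[("s", t)] = []
            parents[("R", t)] = []
        else:
            parents[("s", t)] = [("s", t - 1), ("a", t - 1)]
            parents[("R", t)] = [("R", t - 1), ("s", t - 1), ("a", t - 1)]
        parents[("a", t)] = [("s", t), ("R", t)]
    return parents


graph = build_causal_graph(horizon=2)
rtg = returns_to_go([1.0, 0.0, 2.0])
```

In a sketch like this, the parent lists play the role that the plain left-to-right token order plays in a standard sequence model: they state which tokens are allowed to inform which others.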

Peer Reviews

Decision·ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

The paper is well-structured and easy to follow. Experiments are conducted on two datasets and diverse tasks and consider taking advantage of visual inputs. The hyperparameters and training details are well documented.

Weaknesses

1. The intuition of the proposed graph representation is not clearly presented. Specifically, if the next state only depends on the current state and the action, which strictly follows the Markovian property, why would the current action be conditioned on both state and reward-to-go? Though the ablation on the graph representation (Fig. 3) is acknowledged, it does not strongly support the claim of the advantage of the current edge setting, considering the number of total possible settings. The a

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

- The consideration of explicit dependencies between tokens is intuitive and meaningful in modeling RL sequences, because state, action, and reward have different meanings. Closely related fields such as autonomous driving also use this concept when modeling different agents and objects on the road.
- The proposed method of modifying the query and key vectors in self-attention, in light of RL-related dependencies, is somewhat novel and interesting.
- The exploration on effectiveness of diffe

Weaknesses

1. **Clarity of the proposed method.** The proposed method is interesting and somewhat novel. However, Section 3.2 is not easy to follow.
   * I think there are some disconnections between Equations (1) and (2) and the actual attention mechanism, and in how the proposed modification is applied to the Transformer layers. Is the graph-based dependency considered in every Transformer layer, or is it only provided at the first layer?
   * Also, how the $r_* \to r_*$ is learned should be clearly described.
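For readers unfamiliar with the kind of modification under discussion, the sketch below shows one common way relation information can enter self-attention: a relation embedding is added to the query and to the key before the scaled dot product. This is an assumption made for illustration, not a reconstruction of the paper's Equations (1) and (2); the function name and the `rel_fwd` / `rel_bwd` arguments are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def add(u: Vector, v: Vector) -> Vector:
    return [x + y for x, y in zip(u, v)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_enhanced_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    rel_fwd: List[List[Vector]],  # rel_fwd[i][j]: assumed embedding of relation i -> j
    rel_bwd: List[List[Vector]],  # rel_bwd[i][j]: assumed embedding of relation j -> i
) -> List[Vector]:
    """Single-head attention with score (q_i + r_ij) . (k_j + r_ji) / sqrt(d)."""
    d = len(queries[0])
    outputs: List[Vector] = []
    for i, q in enumerate(queries):
        scores = [
            dot(add(q, rel_fwd[i][j]), add(k, rel_bwd[i][j])) / math.sqrt(d)
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


zero = [0.0, 0.0]
no_relation = [[zero, zero]]
out_plain = relation_enhanced_attention(
    [[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]], no_relation, no_relation
)
out_biased = relation_enhanced_attention(
    [[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]], [[zero, [5.0, 0.0]]], no_relation
)
```

With all relation embeddings set to zero this reduces to ordinary scaled dot-product attention, so the two identical keys receive equal weight. A non-zero embedding on one edge shifts weight toward that token. Whether such embeddings are applied in every layer or only the first is a design choice that this sketch leaves open.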

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. This paper employs a relation-enhanced mechanism to account for temporal and causal relationships, potentially making the current decision transformer more robust and effective at capturing relations between states, actions and rewards.
2. The patch transformer is introduced to gather fine-grained spatial information for visual inputs, such as those in the Atari environment.

Weaknesses

**Major Concerns:**
1. One of the significant concerns is the computational complexity of the suggested method. The model consists of various components, each adding layers of information, such as through input concatenation. This seems to result in information redundancy rather than improving the architectural design.
2. The empirical evidence presented does not sufficiently demonstrate the efficacy of the proposed method. For instance, in the Atari games results, CQL records best scores in 2

Reviewer 04 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- The writing is easy to follow.
- The paper provides rich ablation studies.

Weaknesses

- The effectiveness of the graph decision transformer is not so convincing. The performance of DT and GDT is quite close in D4RL (the gap is only about 3 points). Taking the performance variance into account, this is not a significant improvement. It looks like the main performance boost in GDT-plus comes from the Patch Transformer. This is not so cool as the causal relationship modeling is the main story of this project, and the results suggest that the proposed causal modeling method may not b

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Laplacian EigenMap · Linear Layer · Dropout · Layer Normalization · Laplacian Positional Encodings · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Position-Wise Feed-Forward Layer