PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

Hangyu Mao; Rui Zhao; Ziyue Li; Zhiwei Xu; Hao Chen; Yiqun Chen; Bin Zhang; Zhen Xiao; Junge Zhang; and Jiangjin Yin

arXiv:2312.15863·cs.LG·December 27, 2023·1 cites

TL;DR

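PDiT combines two Transformers for deep reinforcement learning: a perception Transformer that processes each observation at the patch level, and a decision-making Transformer that conditions on the history of desired returns, perception outputs, and actions. The authors report superior performance over strong baselines and explainable feature representations across online and offline RL settings.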

Contribution

The paper proposes the PDiT network, which interleaves a perception Transformer and a decision-making Transformer. The design applies to multiple deep RL settings: online and offline algorithms, with image, proprioception, or hybrid image-language observations.

Findings

01 Achieves superior performance over strong baselines.

02 Extracts explainable feature representations.

03 Applicable to online and offline RL with various observation types.

Abstract

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at the patch level, whereas the deciding one pays attention to the decision-making by conditioning on the history of the desired returns, the perceiver's outputs, and the actions. Such a network design is generally applicable to a lot of deep RL settings, e.g., both the online and offline RL algorithms under environments with either image observations, proprioception observations, or hybrid image-language observations. Extensive experiments show that PDiT can not only achieve superior…
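The data flow described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names (`patchify`, `perceive`, `build_decision_tokens`), the mean-pooling stand-in for the perceiving Transformer, and the per-step (return, percept, action) token ordering are all assumptions made for illustration; the abstract states only that the decider conditions on the history of these three quantities. The sketch also does not attempt to model how the two Transformers are interleaved inside the network.

```python
from typing import List, Sequence, Tuple

Patch = List[float]
Token = Tuple[str, float]


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Split an H x W grid into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order, each flattened to a list.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches: List[Patch] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[r][c]
                 for r in range(top, top + patch)
                 for c in range(left, left + patch)]
            )
    return patches


def perceive(patches: List[Patch]) -> float:
    """Hypothetical stand-in for the perceiving Transformer.

    A real perceiver would attend over patch embeddings; here we simply
    average all patch values to get one summary number per observation.
    """
    flat = [value for p in patches for value in p]
    return sum(flat) / len(flat)


def build_decision_tokens(
    returns_to_go: Sequence[float],
    observations: Sequence[Sequence[Sequence[float]]],
    actions: Sequence[float],
    patch: int,
) -> List[Token]:
    """Assemble the history the deciding Transformer would condition on.

    Assumed ordering per timestep: desired return, perceiver output, action.
    """
    tokens: List[Token] = []
    for rtg, obs, act in zip(returns_to_go, observations, actions):
        tokens.append(("return", rtg))
        tokens.append(("percept", perceive(patchify(obs, patch))))
        tokens.append(("action", act))
    return tokens


if __name__ == "__main__":
    # A toy 4 x 4 "image" observation, repeated for two timesteps.
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    history = build_decision_tokens([3.0, 2.0], [image, image], [1.0, 0.0], patch=2)
    for kind, value in history:
        print(kind, value)
```

In a real model, each token would be a learned embedding vector and a causal Transformer would predict the next action from this sequence; the scalar tokens here only show what is fed in and in what order.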

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Layer Normalization · Residual Connection · Byte Pair Encoding