PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin, Zhang, Zhen Xiao, Junge Zhang, and Jiangjin Yin

TL;DR
PDiT introduces a novel interleaving transformer architecture for deep reinforcement learning that enhances perception and decision-making processes, leading to superior performance and explainability across various environments.
Contribution
The paper proposes the PDiT network, interleaving perception and decision-making transformers, a versatile architecture applicable to multiple deep RL settings with improved results.
Findings
Achieves superior performance over strong baselines.
Extracts explainable feature representations.
Applicable to online and offline RL with various observation types.
Abstract
Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on \emph{the environmental perception} by processing the observation at the patch level, whereas the deciding one pays attention to \emph{the decision-making} by conditioning on the history of the desired returns, the perceiver's outputs, and the actions. Such a network design is generally applicable to a lot of deep RL settings, e.g., both the online and offline RL algorithms under environments with either image observations, proprioception observations, or hybrid image-language observations. Extensive experiments show that PDiT can not only achieve superior…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
MethodsMulti-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Layer Normalization · Residual Connection · Byte Pair Encoding
