Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning
Teng Yan, Zhendong Ruan, Yaobang Cai, Yu Han, Wenxian Li, Yang Zhang

TL;DR
This paper introduces Q-value Regularized Decision ConvFormer (QDC), a novel offline RL model that combines trajectory modeling with value maximization, leading to improved performance and trajectory stitching on benchmarks.
Contribution
QDC integrates Decision ConvFormer with a Q-value regularization term, enhancing trajectory consistency and decision quality in offline reinforcement learning.
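The idea of adding a Q-value term to a trajectory-modeling objective can be sketched as below. This summary does not give the paper's exact loss, so the form shown here (action mean squared error minus a scaled mean Q-value) is an assumption, and the names `qdc_style_loss` and `eta` are hypothetical. The inputs are plain lists standing in for the outputs of a policy and a critic.

```python
def qdc_style_loss(pred_actions, data_actions, q_values, eta=1.0):
    """Illustrative combined objective (assumed form, not the paper's).

    pred_actions, data_actions: lists of equal-length action vectors.
    q_values: one critic estimate Q(s, predicted action) per sample.
    eta: hypothetical weight on the Q-value regularization term.
    """
    # Trajectory-modeling part: mean squared error between the
    # predicted actions and the actions stored in the offline dataset.
    squared_errors = [
        (p - a) ** 2
        for pred, data in zip(pred_actions, data_actions)
        for p, a in zip(pred, data)
    ]
    bc_loss = sum(squared_errors) / len(squared_errors)

    # Value-maximization part: a higher mean Q-value lowers the loss,
    # which pushes predicted actions toward higher estimated return.
    mean_q = sum(q_values) / len(q_values)

    return bc_loss - eta * mean_q


loss = qdc_style_loss(
    pred_actions=[[0.0, 1.0], [1.0, 1.0]],
    data_actions=[[0.0, 0.0], [1.0, 1.0]],
    q_values=[2.0, 4.0],
    eta=0.5,
)
print(loss)
```

In this sketch, `eta` trades off staying close to the dataset actions against maximizing the critic's estimate. With `eta=0` it reduces to plain action regression.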
Findings
Outperforms existing methods on D4RL benchmarks.
Demonstrates superior trajectory stitching capabilities.
Achieves near-optimal performance across various environments.
Abstract
As a data-driven paradigm, offline reinforcement learning (offline RL) has been formulated as sequence modeling, where the Decision Transformer (DT) has demonstrated exceptional capabilities. Unlike previous reinforcement learning methods that fit value functions or compute policy gradients, DT conditions an autoregressive model on the expected return, past states, and past actions, using a causally masked Transformer to output the optimal action. However, because the returns sampled within a single trajectory are inconsistent with the optimal returns across multiple trajectories, it is challenging to set an expected return that yields the optimal action and to stitch together suboptimal trajectories. Compared with DT, Decision ConvFormer (DC) is easier to understand as a model of RL trajectories within a Markov Decision Process. We propose the Q-value Regularized Decision…
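The return signal that a DT-style model is conditioned on is usually computed as a return-to-go: the sum of rewards from the current step to the end of the trajectory. A minimal sketch, assuming per-step rewards in a plain list (the function name and the optional discount `gamma` are illustrative, not taken from the paper):

```python
def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at step t: rewards[t] + gamma * rewards[t+1] + ..."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the total from the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


rtg = returns_to_go([1.0, 0.0, 2.0])
print(rtg)  # [3.0, 2.0, 2.0]
```

Because each value is computed from a single logged trajectory, it reflects what that trajectory happened to achieve rather than the best return reachable from the same state, which is the inconsistency the abstract describes.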
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM
Methods: Sparse Evolutionary Training · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer
