Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning

Teng Yan; Zhendong Ruan; Yaobang Cai; Yu Han; Wenxian Li; Yang Zhang

arXiv:2409.08062·cs.LG·September 13, 2024

Open Access

TL;DR

This paper introduces Q-value Regularized Decision ConvFormer (QDC), a novel offline RL model that combines trajectory modeling with value maximization, leading to improved performance and trajectory stitching on benchmarks.

Contribution

QDC integrates Decision ConvFormer with a Q-value regularization term, enhancing trajectory consistency and decision quality in offline reinforcement learning.
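The idea of adding a Q-value regularization term to a trajectory model's training objective can be sketched roughly as follows. This page does not give the actual QDC loss, so the form below (a mean-squared action-prediction loss minus a weighted mean Q-value) is an assumption, and the names `q_regularized_loss`, `eta`, and `q_values` are hypothetical, chosen only for illustration.

```python
import numpy as np

def q_regularized_loss(pred_actions, target_actions, q_values, eta=1.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    pred_actions:   actions predicted by the sequence model, shape (N, action_dim)
    target_actions: dataset actions the model is trained to imitate, same shape
    q_values:       critic estimates Q(s, predicted action), shape (N,)
    eta:            hypothetical weight on the value-maximization term
    """
    pred_actions = np.asarray(pred_actions, dtype=float)
    target_actions = np.asarray(target_actions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)

    # Trajectory-modeling term: mean squared error against the dataset actions.
    action_loss = np.mean((pred_actions - target_actions) ** 2)

    # Value term: minimizing the loss pushes predicted actions toward higher Q.
    q_term = np.mean(q_values)

    return action_loss - eta * q_term


loss = q_regularized_loss(
    pred_actions=[[0.5, 0.0]],
    target_actions=[[0.0, 0.0]],
    q_values=[2.0],
    eta=0.1,
)
```

With `eta = 0` the sketch reduces to plain action regression; a larger `eta` trades imitation accuracy for higher estimated value.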

Findings

01. Outperforms existing methods on D4RL benchmarks.

02. Demonstrates superior trajectory stitching capabilities.

03. Achieves near-optimal performance across various environments.

Abstract

As a data-driven paradigm, offline reinforcement learning (offline RL) has been formulated as sequence modeling, where the Decision Transformer (DT) has demonstrated exceptional capabilities. Unlike previous reinforcement learning methods that fit value functions or compute policy gradients, DT conditions an autoregressive model on the expected returns, past states, and actions, using a causally masked Transformer to output the optimal action. However, because the returns sampled within a single trajectory are inconsistent with the optimal returns across multiple trajectories, it is difficult to choose an expected return that yields the optimal action and to stitch together suboptimal trajectories. Compared with DT, Decision ConvFormer (DC) is easier to understand as a model of RL trajectories within a Markov Decision Process. We propose the Q-value Regularized Decision…
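The return conditioning mentioned in the abstract is commonly implemented with returns-to-go: at each step, the sum of rewards from that step to the end of the trajectory. The sketch below assumes undiscounted rewards and uses a hypothetical helper name, `returns_to_go`; it is an illustration, not code from the paper.

```python
def returns_to_go(rewards):
    """Return the undiscounted sum of future rewards at each timestep."""
    rtg = []
    running = 0.0
    # Accumulate from the last step backwards, then restore time order.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


# A single logged trajectory only provides the returns it actually achieved.
sampled = returns_to_go([1.0, 0.0, 2.0])  # [3.0, 2.0, 2.0]
```

Because these targets come from one logged trajectory, they can fall below the best return reachable by combining segments of several trajectories, which is the inconsistency the abstract describes.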

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM

Methods: Sparse Evolutionary Training · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer