QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
Xing Lei, Jincheng Wang, Xuetao Zhang, Donglin Wang

TL;DR
QHyer is a novel offline goal-conditioned RL method that combines a state-conditioned Q-estimator with a hybrid attention-Mamba backbone to handle long-term dependencies and sparse rewards effectively.
Contribution
It introduces a flow-parameterized Q-estimator for better demonstration stitching and a gated hybrid attention-Mamba architecture for adaptive history compression.
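To make the idea of gating between an attention branch and a recurrent branch concrete, here is a toy sketch. It is not QHyer's architecture: it assumes scalar tokens, a single causal softmax attention head with query = key = value, a simple input-dependent linear recurrence standing in for a Mamba-style selective scan, and a per-step sigmoid gate that mixes the two outputs. All function names and parameters (`causal_attention`, `selective_scan`, `gated_hybrid`, `w_a`, `b_a`, `w_g`, `b_g`) are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def causal_attention(xs):
    """Toy single-head causal softmax attention over scalar tokens (query = key = value = x)."""
    outs = []
    for t in range(len(xs)):
        scores = [xs[t] * xs[s] for s in range(t + 1)]  # attend only to positions <= t
        m = max(scores)                                  # subtract max for stability
        weights = [math.exp(sc - m) for sc in scores]
        z = sum(weights)
        outs.append(sum(w * xs[s] for s, w in enumerate(weights)) / z)
    return outs

def selective_scan(xs, w_a=1.0, b_a=0.0):
    """Toy input-dependent linear recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t."""
    h, outs = 0.0, []
    for x in xs:
        a = sigmoid(w_a * x + b_a)  # retention factor chosen from the current input
        h = a * h + (1.0 - a) * x
        outs.append(h)
    return outs

def gated_hybrid(xs, w_g=1.0, b_g=0.0):
    """Per-step convex mix of the two branches: g_t * attention + (1 - g_t) * recurrence."""
    attn = causal_attention(xs)
    ssm = selective_scan(xs)
    out = []
    for x, a_out, s_out in zip(xs, attn, ssm):
        g = sigmoid(w_g * x + b_g)  # hypothetical gate computed from the current token
        out.append(g * a_out + (1.0 - g) * s_out)
    return out

print(gated_hybrid([0.5, -1.0, 2.0, 0.0]))
```

Pushing `b_g` strongly positive recovers the attention branch and strongly negative recovers the recurrent branch; a learned gate would sit between these extremes and could vary per time step.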
Findings
Achieves state-of-the-art results on non-Markovian datasets.
Effectively handles long-range dependencies and sparse rewards.
Validates versatility across diverse offline GCRL scenarios.
Abstract
Offline goal-conditioned RL (GCRL) learns goal-reaching policies from static datasets, but real-world datasets are often partially observable and history-dependent, exhibiting a mix of Markovian and non-Markovian structure that violates standard RL assumptions. History-aware sequence models such as Decision Transformer (DT) are a natural fit for long-term dependency modeling, yet pure attention is inefficient and brittle when handling local Markovian structure and long-range context simultaneously. Although recent hybrid architectures (e.g., LSDT) introduce local extractors to improve local dependency modeling, their fixed-window extraction cannot adapt its effective memory to varying dependency lengths in temporally heterogeneous settings, often truncating long-range context rather than compressing its content adaptively. Moreover, sequential offline GCRL faces a key bottleneck: under sparse…
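The contrast between truncating and compressing long-range context can be illustrated with a toy comparison. This is a sketch under stated assumptions, not LSDT's extractor or QHyer's method: a fixed-window mean stands in for a fixed-window local extractor, and a recurrence whose retention factor depends on the input (a hypothetical threshold-based gate, with made-up parameters `threshold` and `sharpness`) stands in for adaptive, selective compression.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fixed_window_summary(xs, k):
    """Mean of the last k inputs at each step; anything older than k steps is invisible."""
    outs = []
    for t in range(len(xs)):
        window = xs[max(0, t - k + 1): t + 1]
        outs.append(sum(window) / len(window))
    return outs

def selective_summary(xs, threshold=0.5, sharpness=20.0):
    """Recurrent summary h_t = a_t * h_{t-1} + (1 - a_t) * x_t with an input-dependent a_t.

    Salient inputs (|x| > threshold) drive a_t toward 0 and are written into the state;
    non-salient inputs drive a_t toward 1, so the state is carried forward almost unchanged.
    """
    h, outs = 0.0, []
    for x in xs:
        a = sigmoid(sharpness * (threshold - abs(x)))  # hypothetical retention gate
        h = a * h + (1.0 - a) * x
        outs.append(h)
    return outs

# One salient event followed by 20 uninformative steps.
xs = [1.0] + [0.0] * 20
print(fixed_window_summary(xs, k=4)[-1])  # the event has left the window
print(selective_summary(xs)[-1])          # the event is still encoded, close to 1.0
```

The window's effective memory is fixed at `k` regardless of content, whereas the gated recurrence decides per step how much history to keep, which is the kind of adaptivity the abstract argues fixed-window extraction lacks.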
