Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning
Giseung Park, Sungho Choi, Youngchul Sung

TL;DR
This paper introduces a blockwise sequential model learning architecture based on self-attention for partially observable reinforcement learning. Gradients are estimated with self-normalized importance sampling, which avoids complex blockwise input data reconstruction during model learning.
Contribution
The paper presents a novel blockwise sequential model with self-attention for better handling partial observability in reinforcement learning, improving over traditional RNN-based methods.
Findings
Significantly outperforms previous methods in various environments.
Efficient gradient estimation using self-normalized importance sampling.
Capable of detailed sequential learning in partially observable settings.
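The self-normalized importance sampling mentioned in the findings can be illustrated with a generic sketch. This is not the paper's gradient estimator; it only shows the basic idea of estimating an expectation under a target density known up to a constant, using samples from a different proposal. The function `snis_estimate`, the Gaussian target and proposal, and the sample count are all assumptions chosen for illustration.

```python
import math
import random


def snis_estimate(f, log_p_unnorm, log_q, samples):
    """Self-normalized importance sampling estimate of E_p[f(x)].

    log_p_unnorm: log of the target density, known only up to a constant.
    log_q: log of the proposal density that produced `samples`.
    """
    log_w = [log_p_unnorm(x) - log_q(x) for x in samples]
    # Subtract the max log-weight before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    # Dividing by the weight sum cancels the unknown normalizing constant.
    return sum(wi * f(x) for wi, x in zip(w, samples)) / total


# Hypothetical example: target is N(1, 1) up to a constant, proposal is N(0, 2^2).
def log_p_unnorm(x):
    return -0.5 * (x - 1.0) ** 2


def log_q(x):
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0 * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]
mean_estimate = snis_estimate(lambda x: x, log_p_unnorm, log_q, samples)
```

Because the weights are normalized by their own sum, the estimator needs only density ratios up to a constant, at the cost of a small bias that shrinks as the number of samples grows.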
Abstract
This paper proposes a new sequential model learning architecture to solve partially observable Markov decision problems. Rather than compressing sequential information at every timestep as in conventional recurrent neural network-based methods, the proposed architecture generates a latent variable in each data block with a length of multiple timesteps and passes the most relevant information to the next block for policy optimization. The proposed blockwise sequential model is implemented based on self-attention, making the model capable of detailed sequential learning in partially observable settings. The proposed model builds an additional learning network to efficiently implement gradient estimation by using self-normalized importance sampling, which does not require complex blockwise input data reconstruction during model learning. Numerical results show that the proposed method…
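The blockwise idea in the abstract can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' architecture: the attention uses identity projections with no learned weights, the per-block summary is a plain mean of the attended tokens, and the carry-over rule (prepending the previous summary as an extra token) is a placeholder for whatever mechanism the paper uses to select the most relevant information. All function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def split_into_blocks(sequence, block_len):
    """Cut a sequence into consecutive blocks; the last block may be shorter."""
    return [sequence[i:i + block_len] for i in range(0, len(sequence), block_len)]


def blockwise_summaries(sequence, block_len):
    """Return one summary vector per block, carrying the previous summary forward."""
    carry = None
    summaries = []
    for block in split_into_blocks(sequence, block_len):
        # Placeholder carry-over rule: the previous summary joins the block as a token.
        tokens = ([carry] if carry is not None else []) + block
        attended = self_attention(tokens)
        d = len(attended[0])
        # Placeholder summary: mean of the attended tokens in this block.
        carry = [sum(row[j] for row in attended) / len(attended) for j in range(d)]
        summaries.append(carry)
    return summaries


# Hypothetical trajectory of 10 two-dimensional observations, block length 4.
observations = [[float(t), 1.0] for t in range(10)]
summaries = blockwise_summaries(observations, block_len=4)
```

The contrast with a recurrent model is that information is compressed once per block rather than at every timestep, so attention inside a block can relate any pair of timesteps directly.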
Taxonomy
Topics: Reinforcement Learning in Robotics · Fault Detection and Control Systems · Gaussian Processes and Bayesian Inference
