TL;DR
This paper introduces DFBT, a belief estimation method for RL with delays that forecasts states directly from delayed observations, reducing compounding errors and improving performance over existing recursive methods.
Contribution
The paper proposes DFBT, a belief forecasting transformer that directly predicts states, significantly reducing compounding errors and enhancing RL performance with delays.
Findings
DFBT reduces the compounding prediction errors that recursive forecasting incurs in RL with delays.
DFBT outperforms SOTA methods on MuJoCo benchmarks.
DFBT improves learning efficiency through multi-step forecasting.
Abstract
Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. This can cause the accumulation of compounding errors. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states step-by-step. We theoretically demonstrate that DFBT greatly reduces compounding errors of existing recursively forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy. DFBT's capability to forecast state sequences…
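The contrast the abstract draws between recursive and direct forecasting can be illustrated with a toy sketch. This is not the paper's transformer: the 1-D linear dynamics, the fixed per-call `BIAS` error model, and all function names are hypothetical, chosen only to show why feeding predictions back into a one-step model lets errors accumulate, while a forecaster that predicts each horizon from the delayed observation incurs its model error only once per prediction.

```python
# Toy 1-D illustration. The dynamics, the error model, and every name here
# are hypothetical assumptions and are not taken from the paper.

A, B = 0.9, 0.5  # assumed true linear dynamics: s' = A*s + B*a
BIAS = 0.1       # assumed error added by every call to a learned model


def true_step(s, a):
    """One step of the (assumed) true environment dynamics."""
    return A * s + B * a


def true_rollout(s0, actions):
    """Ground-truth states after 1..len(actions) steps."""
    states, s = [], s0
    for a in actions:
        s = true_step(s, a)
        states.append(s)
    return states


def recursive_forecast(s0, actions):
    """Call an imperfect one-step model repeatedly, feeding each
    prediction back in, so earlier errors propagate forward."""
    preds, s = [], s0
    for a in actions:
        s = true_step(s, a) + BIAS  # imperfect one-step prediction
        preds.append(s)
    return preds


def direct_forecast(s0, actions):
    """Predict every horizon from the delayed observation in one pass,
    so each prediction carries the assumed model error only once."""
    return [s + BIAS for s in true_rollout(s0, actions)]


if __name__ == "__main__":
    s0, actions = 1.0, [0.2] * 8  # a delay of 8 steps
    truth = true_rollout(s0, actions)
    rec = recursive_forecast(s0, actions)
    direct = direct_forecast(s0, actions)
    for k, (t, r, d) in enumerate(zip(truth, rec, direct), start=1):
        print(f"horizon {k}: recursive error {abs(r - t):.3f}, "
              f"direct error {abs(d - t):.3f}")
```

Under these assumptions the recursive error grows with the horizon, while the direct error stays at the assumed per-call bias. A real direct forecaster will not have identical error at every horizon; the equal-bias assumption only isolates the compounding effect.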
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Byte Pair Encoding
