Combating the Compounding-Error Problem with a Multi-step Model
Kavosh Asadi, Dipendra Misra, Seungchan Kim, Michael L. Littman

TL;DR
This paper introduces a multi-step model in model-based reinforcement learning to directly predict outcomes of action sequences, reducing error accumulation and improving planning accuracy.
Contribution
It proposes a novel multi-step model that directly predicts multi-action outcomes, addressing the compounding-error problem in traditional one-step models.
Findings
Multi-step models improve value-function estimation.
Multi-step models lead to better action selection.
Theoretical and empirical evidence supports the multi-step approach.
Abstract
Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state---a one-step model. This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction errors can get magnified, leading to unacceptable inaccuracy. This compounding-error problem plagues planning and undermines model-based reinforcement learning. In this paper, we address the compounding-error problem by introducing a multi-step model that directly outputs the outcome of executing a sequence of actions. Novel theoretical and empirical results indicate that the multi-step model is more conducive to efficient value-function estimation, and it yields better action selection…
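The compounding-error problem described in the abstract can be illustrated with a toy sketch. This is not the paper's model or experimental setup: the 1-D additive dynamics, the constant per-prediction bias of 0.1, and all function names (`true_step`, `one_step_model`, `multi_step_model`, `rollout`) are assumptions made purely for illustration. The sketch compares composing a slightly wrong one-step model with itself against a hypothetical multi-step model that makes a single prediction for the whole action sequence.

```python
from typing import Callable, Sequence

BIAS = 0.1  # assumed per-prediction model error (hypothetical)


def true_step(state: float, action: float) -> float:
    """Toy ground-truth dynamics (an assumption): the action is added to the state."""
    return state + action


def one_step_model(state: float, action: float) -> float:
    """Learned one-step model, assumed to be off by BIAS on every prediction."""
    return true_step(state, action) + BIAS


def rollout(step: Callable[[float, float], float],
            state: float,
            actions: Sequence[float]) -> float:
    """Compose a one-step function with itself along an action sequence."""
    for action in actions:
        state = step(state, action)
    return state


def multi_step_model(state: float, actions: Sequence[float]) -> float:
    """Hypothetical multi-step model: one direct prediction for the whole
    sequence, assumed to carry the same BIAS exactly once."""
    return rollout(true_step, state, actions) + BIAS


actions = [1.0, -0.5, 2.0, 0.5, 1.0]
truth = rollout(true_step, 0.0, actions)

# Error of the composed one-step model grows with the number of steps.
composed_error = abs(rollout(one_step_model, 0.0, actions) - truth)
# Error of the direct multi-step prediction is incurred only once.
direct_error = abs(multi_step_model(0.0, actions) - truth)

print(f"composed one-step error: {composed_error:.3f}")
print(f"direct multi-step error: {direct_error:.3f}")
```

In this toy setting the composed error scales with the sequence length, while the direct prediction pays the bias once. Real learned models have state-dependent errors, so the gap in practice depends on the environment and on how well each model can be trained.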
Taxonomy
Topics: Software Reliability and Analysis Research · Risk and Safety Analysis · Machine Learning and Algorithms
