TL;DR
DeepSynth introduces an automata synthesis method to automatically uncover task structure, significantly improving deep RL training efficiency in environments with sparse, non-Markovian rewards.
Contribution
It presents a novel automata synthesis algorithm that guides deep RL with interpretable high-level task structures, enhancing learning in complex environments.
Findings
Reduces policy training iterations by two orders of magnitude.
Improves scalability in high-dimensional, sparse reward settings.
Successfully applied to Atari's Montezuma's Revenge.
Abstract
This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton so that the generation of a control policy by deep RL is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse non-Markovian rewards. We have evaluated DeepSynth's performance in a set of experiments that includes…
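The state-enrichment step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's synthesis algorithm: the automaton below is written by hand rather than synthesised from traces, and the `Automaton` class, the `product_state` helper, and the labels `"key"`, `"door"`, and `"empty"` are all hypothetical names chosen for illustration. The sketch only shows how pairing a low-level state with an automaton state can separate situations that look identical to the agent but differ in task progress.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple


@dataclass
class Automaton:
    """Deterministic automaton over high-level labels (hand-written, not synthesised)."""
    initial: int
    transitions: Dict[Tuple[int, str], int]
    state: int = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def reset(self) -> int:
        self.state = self.initial
        return self.state

    def step(self, label: str) -> int:
        # A label with no matching edge leaves the automaton state unchanged.
        self.state = self.transitions.get((self.state, label), self.state)
        return self.state


def product_state(env_state: Hashable, automaton_state: int) -> Tuple[Hashable, int]:
    """Enrich a low-level state with the current automaton state."""
    return (env_state, automaton_state)


# Hypothetical task: pick up a key (0 -> 1), then reach a door (1 -> 2).
automaton = Automaton(initial=0, transitions={(0, "key"): 1, (1, "door"): 2})

# Hypothetical trace of (low-level state, detected label) pairs.
trace: List[Tuple[Tuple[int, int], str]] = [
    ((0, 0), "empty"),
    ((0, 1), "door"),   # door reached before the key: no progress
    ((1, 1), "key"),
    ((0, 1), "door"),   # same low-level state, but now the task advances
]

automaton.reset()
visited = [product_state(s, automaton.step(label)) for s, label in trace]
print(visited)
```

In this toy trace the low-level state `(0, 1)` occurs twice, once with automaton state 0 and once with automaton state 2. A policy trained on the enriched state can treat the two visits differently, which is the sense in which the automaton makes a non-Markovian reward tractable for a standard deep RL learner.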