Episodic Reinforcement Learning with Expanded State-reward Space
Dayang Liang, Yaru Zhang, Yunlong Liu

TL;DR
This paper proposes an enhanced episodic control framework for deep reinforcement learning that incorporates historical states and returns into the state-reward space, improving sample efficiency and value estimation.
Contribution
It introduces a novel EC-based DRL method with expanded state-reward space, utilizing retrieved past states and returns to improve policy performance and reduce Q-value overestimation.
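The idea of retrieving past states and returns to expand the state-reward space can be illustrated with a minimal sketch. This is not the authors' implementation: the `EpisodicMemory` class, the nearest-neighbour lookup, the concatenation of the current and retrieved state, and the fixed `weight` that mixes the retrieved return into the reward are all assumptions chosen for illustration, since the summary does not say how the expansion is computed.

```python
import math


class EpisodicMemory:
    """Toy episodic memory (hypothetical): stores (state, return) pairs."""

    def __init__(self):
        self.states = []
        self.returns = []

    def add(self, state, ret):
        self.states.append(list(state))
        self.returns.append(float(ret))

    def retrieve(self, query):
        """Return the stored state closest to `query` and its recorded return."""
        if not self.states:
            raise ValueError("memory is empty")
        idx = min(
            range(len(self.states)),
            key=lambda i: math.dist(self.states[i], query),
        )
        return self.states[idx], self.returns[idx]


def expand(state, reward, memory, weight=0.1):
    """Build an expanded (state, reward) pair from a retrieved past experience.

    Assumption: the expanded state is the current state concatenated with the
    retrieved state, and the expanded reward adds a weighted retrieved return.
    """
    past_state, past_return = memory.retrieve(state)
    expanded_state = list(state) + past_state
    expanded_reward = reward + weight * past_return
    return expanded_state, expanded_reward


memory = EpisodicMemory()
memory.add([0.0, 0.0], 1.0)
memory.add([1.0, 1.0], 5.0)

expanded_state, expanded_reward = expand(
    [0.9, 1.2], reward=0.5, memory=memory, weight=0.1
)
```

In this sketch the query `[0.9, 1.2]` lies closest to the stored state `[1.0, 1.0]`, so that state and its return are what get folded into the expanded pair. A learned agent would consume `expanded_state` and `expanded_reward` in place of the raw transition values.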
Findings
Outperforms recent methods and baselines on Box2D and MuJoCo tasks.
Improves sample efficiency and value estimation accuracy.
Alleviates Q-value overestimation in DRL.
Abstract
Empowered by deep neural networks, deep reinforcement learning (DRL) has demonstrated tremendous empirical success in various domains, including games, health care, and autonomous driving. Despite these advances, DRL is still regarded as data-inefficient, because effective policies demand vast numbers of environment samples. Recently, episodic control (EC)-based model-free DRL methods have improved sample efficiency by recalling past experiences from episodic memory. However, existing EC-based methods can suffer from a misalignment between the state and reward spaces because they neglect the information-rich (past) retrieved states, which likely causes inaccurate value estimation and degraded policy performance. To tackle this issue, we introduce an efficient EC-based DRL framework with expanded state-reward space, where the expanded states used…
Taxonomy
Topics: Digital Mental Health Interventions
