Reinforcement Learning with a Focus on Adjusting Policies to Reach Targets
Akane Tsuboya, Yu Kono, Tatsuji Takahashi

TL;DR
This paper introduces a deep reinforcement learning method that prioritizes reaching a predefined aspiration level over maximizing expected return. It adjusts its degree of exploration according to how much of the target has been achieved, and shows potential for adapting to non-stationary environments.
Contribution
The proposed method prioritizes target achievement over return maximization and adjusts the degree of exploration according to the proportion of target achievement, with the aim of reducing exploration and learning costs.
Findings
Achieves returns equal to or greater than those of standard methods on a motion control task and a navigation task.
Flexibly adjusts the exploration scope according to the proportion of target achievement.
Shows potential for adapting to non-stationary environments.
Abstract
The objective of a reinforcement learning agent is to discover better actions through exploration. However, typical exploration techniques aim to maximize rewards, often incurring high costs in both exploration and learning. We propose a novel deep reinforcement learning method that prioritizes achieving an aspiration level over maximizing expected return. The method flexibly adjusts the degree of exploration based on the proportion of target achievement. In experiments on a motion control task and a navigation task, it achieved returns equal to or greater than those of standard methods. The analysis showed two things: the method flexibly adjusts the exploration scope, and it has the potential to enable the agent to adapt to non-stationary environments. These findings indicate that the method may be effective in improving exploration…
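To make the idea of tying exploration to target achievement concrete, here is a minimal sketch on a Bernoulli bandit. It is not the authors' algorithm, which this page does not specify. The linear rule in `exploration_rate`, the `eps_max` parameter, the running-average estimate of achievement, and the bandit setting are all assumptions chosen for illustration.

```python
import random


def exploration_rate(recent_return, aspiration, eps_max=0.5):
    """Hypothetical rule: explore less as the achieved return nears the aspiration level."""
    if aspiration <= 0:
        raise ValueError("this sketch assumes a positive aspiration level")
    # Proportion of the target achieved, clipped to [0, 1].
    achieved = min(max(recent_return / aspiration, 0.0), 1.0)
    return eps_max * (1.0 - achieved)


def run_bandit(arm_means, aspiration, steps=2000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit with an aspiration-driven epsilon (illustrative only)."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n   # sample-average reward estimate per arm
    avg_reward = 0.0     # running average reward, used as the achieved return
    for t in range(1, steps + 1):
        eps = exploration_rate(avg_reward, aspiration)
        if rng.random() < eps:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        avg_reward += (reward - avg_reward) / t
    return values, avg_reward, exploration_rate(avg_reward, aspiration)


if __name__ == "__main__":
    values, avg_reward, final_eps = run_bandit([0.2, 0.5, 0.8], aspiration=0.7)
    print(values, avg_reward, final_eps)
```

Under this rule, exploration falls to zero once the running average meets the aspiration level, and rises again if the average later drops below it, for example after the reward distribution changes.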
Taxonomy
Topics: Supply Chain and Inventory Management · Innovation Diffusion and Forecasting
