Enhancing reinforcement learning by a finite reward response filter with a case study in intelligent structural control
Hamid Radmard Rahmani, Carsten Koenke, Marco A. Wiering

TL;DR
This paper introduces an enhanced Q-learning method with a reflexive response filter to address delays in reward effects, demonstrating improved performance in earthquake-resistant structural control tasks.
Contribution
The paper proposes a novel reflexive gamma-function to improve reinforcement learning in delayed reward scenarios, specifically applied to structural earthquake control.
Findings
Enhanced method outperforms standard Q-learning in all delay scenarios.
Significantly improves stability in delayed reward environments.
Effective in reducing structural vibrations during earthquakes.
Abstract
In many reinforcement learning (RL) problems, an action taken by the agent needs some time to reach its maximum effect on the environment, so the agent receives the reward corresponding to that action after a delay, called the action-effect delay. Such delays reduce the performance of the learning algorithm and increase the computational cost, because the agent values immediate rewards more than the future reward that is more closely related to the action taken. This paper addresses this issue by introducing an applicable enhanced Q-learning method in which, at the beginning of the learning phase, the agent takes a single action and builds a function that reflects the environment's response to that action, called the reflexive γ-function. During the training phase, the agent uses the reflexive γ-function to update the Q-values. We…
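The two phases described in the abstract can be illustrated with a minimal tabular sketch. The abstract is truncated before it gives the actual update rule, so everything here is an assumption made for illustration: the function names (`build_response_weights`, `weighted_reward`, `q_update`), the peak normalization of the recorded response, the use of the normalized response as weights on future rewards, and the bootstrap term are all hypothetical and are not taken from the paper.

```python
def build_response_weights(response):
    """Turn the recorded response to one probe action into weights in [0, 1].

    Assumption: the response is normalized by its peak magnitude, so the
    time step at which the action has its maximum effect gets weight 1.
    """
    magnitudes = [abs(x) for x in response]
    peak = max(magnitudes)
    if peak == 0:
        raise ValueError("probe action produced no measurable response")
    return [m / peak for m in magnitudes]


def weighted_reward(rewards, t, weights):
    """Weighted sum of the rewards from step t onward.

    The window is truncated at the end of the episode; zip stops at the
    shorter of the two sequences.
    """
    window = rewards[t:t + len(weights)]
    return sum(w * r for w, r in zip(weights, window))


def q_update(q_table, state, action, target_reward, boot_state,
             alpha=0.1, discount=0.9):
    """Standard tabular Q-learning step with target_reward as the reward term.

    Assumption: the bootstrap state is the state reached after the
    response window; the paper may define this differently.
    """
    td_target = target_reward + discount * max(q_table[boot_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table


# Hypothetical probe response: the effect peaks two steps after the action.
weights = build_response_weights([0.0, 0.5, 2.0, 1.0])
rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
target = weighted_reward(rewards, 1, weights)

q_table = [[0.0, 0.0] for _ in range(3)]  # 3 states, 2 actions
q_update(q_table, state=0, action=1, target_reward=target, boot_state=2)
```

In this sketch the reward observed at the peak of the response contributes most to the update, while the immediate reward (weight 0 here) contributes nothing, which is one way to counter the bias toward immediate rewards that the abstract describes.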
Taxonomy
Topics: Elevator Systems and Control · Adaptive Dynamic Programming Control · Traffic control and management
Methods: Q-Learning
