Boosting Soft Q-Learning by Bounding
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni

TL;DR
This paper shows that, in soft Q-learning, any value function estimate can be used to derive double-sided bounds on the optimal value function, and that these bounds lead to ways of boosting training performance in reinforcement learning tasks.
Contribution
The paper presents a new bounding framework for soft Q-learning that enables better value function estimates and introduces an alternative Q-function update method.
Findings
Bounds improve training efficiency
Alternative Q-update boosts performance
Validated experimentally on RL tasks
Abstract
An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance, which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
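To make the idea of double-sided bounds concrete, here is a minimal tabular sketch. It is not taken from the paper. It assumes a uniform prior policy, an inverse temperature `beta`, and a small random MDP, and it uses a generic bound that follows from two properties of the soft Bellman operator T: it is monotone, and shifting Q by a constant c shifts TQ by gamma * c. With Delta = TQ - Q, these give TQ + gamma * min(Delta) / (1 - gamma) <= Q* <= TQ + gamma * max(Delta) / (1 - gamma). Whether this coincides with the exact bounds derived in the paper is not confirmed by this page. All function names (`soft_value`, `soft_backup`, `bounds_from_estimate`, `make_random_mdp`, `solve`) are hypothetical.

```python
import math
import random


def soft_value(q_row, beta):
    """V(s) = (1/beta) * log mean_a exp(beta * Q(s, a)), assuming a uniform prior policy."""
    m = max(q_row)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(beta * (q - m)) for q in q_row) / len(q_row)) / beta


def soft_backup(Q, R, P, gamma, beta):
    """Apply the soft Bellman operator once to a tabular Q (list of rows, one per state)."""
    V = [soft_value(row, beta) for row in Q]
    n_s, n_a = len(R), len(R[0])
    return [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
             for a in range(n_a)] for s in range(n_s)]


def bounds_from_estimate(Q, R, P, gamma, beta):
    """Lower and upper bounds on Q* computed from an arbitrary estimate Q."""
    TQ = soft_backup(Q, R, P, gamma, beta)
    deltas = [TQ[s][a] - Q[s][a] for s in range(len(Q)) for a in range(len(Q[0]))]
    lo_shift = gamma * min(deltas) / (1.0 - gamma)
    hi_shift = gamma * max(deltas) / (1.0 - gamma)
    lower = [[x + lo_shift for x in row] for row in TQ]
    upper = [[x + hi_shift for x in row] for row in TQ]
    return lower, upper


def make_random_mdp(n_s, n_a, rng):
    """Random rewards in [-1, 0] and random normalized transition probabilities."""
    R = [[rng.uniform(-1.0, 0.0) for _ in range(n_a)] for _ in range(n_s)]
    P = []
    for _ in range(n_s):
        per_action = []
        for _ in range(n_a):
            w = [rng.random() + 0.01 for _ in range(n_s)]
            total = sum(w)
            per_action.append([x / total for x in w])
        P.append(per_action)
    return R, P


def solve(R, P, gamma, beta, iters=2000):
    """Approximate Q* by iterating the soft backup from zero (a gamma-contraction)."""
    Q = [[0.0] * len(R[0]) for _ in range(len(R))]
    for _ in range(iters):
        Q = soft_backup(Q, R, P, gamma, beta)
    return Q


rng = random.Random(0)
gamma, beta = 0.9, 2.0
R, P = make_random_mdp(4, 3, rng)
q_star = solve(R, P, gamma, beta)

# An arbitrary, deliberately poor estimate still yields valid bounds.
q_guess = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(4)]
lower, upper = bounds_from_estimate(q_guess, R, P, gamma, beta)

holds = all(lower[s][a] - 1e-9 <= q_star[s][a] <= upper[s][a] + 1e-9
            for s in range(4) for a in range(3))
print("bounds hold:", holds)
```

The gap between `upper` and `lower` is gamma * (max(Delta) - min(Delta)) / (1 - gamma), so the bounds tighten as the estimate's soft Bellman residual becomes more uniform across state-action pairs.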
Taxonomy
Topics: Fault Detection and Control Systems · Face and Expression Recognition
