Boosting Soft Q-Learning by Bounding

Jacob Adamczyk; Volodymyr Makarenko; Stas Tiomkin; Rahul V. Kulkarni

arXiv:2406.18033 · cs.LG · June 27, 2024

TL;DR

This paper shows that, in soft Q-learning, any value function estimate can be used to derive double-sided (upper and lower) bounds on the optimal value function, and it uses these bounds to improve training efficiency and performance on reinforcement learning tasks.

Contribution

The paper presents a bounding framework for soft Q-learning that turns any value function estimate into upper and lower bounds on the optimal value function, and it uses this framework to motivate an alternative Q-function update.
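For intuition, one standard way such double-sided bounds can arise is sketched below. It relies only on two properties of the soft Bellman operator \mathcal{T}: it is monotone, and shifting Q by a constant c shifts \mathcal{T}Q by \gamma c. The notation (inverse temperature \beta, prior policy \pi_0, Bellman residual \Delta) is assumed here rather than taken from the paper, and the paper's own theorem may be stated differently.

```latex
% Assumed notation: inverse temperature \beta, prior policy \pi_0, discount \gamma \in [0,1)
(\mathcal{T}Q)(s,a) = r(s,a) + \frac{\gamma}{\beta}\,\mathbb{E}_{s' \sim p(\cdot \mid s,a)} \log \mathbb{E}_{a' \sim \pi_0(\cdot \mid s')}\, e^{\beta Q(s',a')},
\qquad
\Delta(s,a) = (\mathcal{T}Q)(s,a) - Q(s,a)

% Since \mathcal{T}Q \ge Q + \inf\Delta, monotonicity and \mathcal{T}(Q + c) = \mathcal{T}Q + \gamma c give, by induction on n,
\mathcal{T}^{n}Q \;\ge\; \mathcal{T}Q + \inf\Delta\,\left(\gamma + \gamma^{2} + \dots + \gamma^{n-1}\right)
% and symmetrically with \sup\Delta and \le. Letting n \to \infty, where \mathcal{T}^{n}Q \to Q^*:
(\mathcal{T}Q)(s,a) + \frac{\gamma\,\inf\Delta}{1-\gamma} \;\le\; Q^*(s,a) \;\le\; (\mathcal{T}Q)(s,a) + \frac{\gamma\,\sup\Delta}{1-\gamma}
```

The interval shrinks to a point exactly when the residual \Delta is constant, and in particular when Q = Q^*.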

Findings

1. Bounds improve training efficiency
2. An alternative Q-update boosts performance
3. Validated experimentally on RL tasks

Abstract

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
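The abstract's central claim, that any value function estimate yields two-sided bounds on the optimal soft Q-function, can be checked numerically on a small tabular problem. The sketch below is illustrative and is not the authors' implementation. It assumes a finite MDP, a uniform prior policy, an inverse temperature `beta`, and the contraction-based bound Q* ∈ [TQ + γ·min Δ/(1−γ), TQ + γ·max Δ/(1−γ)], which may differ in form from the bounds derived in the paper. The clipping step at the end is a hypothetical illustration of how bounds could enter an update, not necessarily the update the paper proposes. All function and variable names are hypothetical.

```python
import numpy as np


def soft_value(q, beta, prior):
    """V(s) = (1/beta) * log sum_a prior(a|s) * exp(beta * Q(s, a)), via a stable log-sum-exp."""
    z = beta * q + np.log(prior)
    z_max = z.max(axis=1)
    return (z_max + np.log(np.exp(z - z_max[:, None]).sum(axis=1))) / beta


def soft_bellman(q, r, p, gamma, beta, prior):
    """(TQ)(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')."""
    return r + gamma * (p @ soft_value(q, beta, prior))


def double_sided_bounds(q, r, p, gamma, beta, prior):
    """Elementwise lower/upper bounds on Q* from any estimate q (contraction-based sketch)."""
    tq = soft_bellman(q, r, p, gamma, beta, prior)
    delta = tq - q  # soft Bellman residual at every (s, a)
    lower = tq + gamma * delta.min() / (1.0 - gamma)
    upper = tq + gamma * delta.max() / (1.0 - gamma)
    return lower, upper


def soft_value_iteration(r, p, gamma, beta, prior, tol=1e-10, max_iter=100_000):
    """Reference Q*: apply the soft Bellman operator until the sup-norm change is below tol."""
    q = np.zeros_like(r)
    for _ in range(max_iter):
        q_next = soft_bellman(q, r, p, gamma, beta, prior)
        if np.abs(q_next - q).max() < tol:
            return q_next
        q = q_next
    return q


# Hypothetical random MDP: 5 states, 3 actions, uniform prior policy.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, beta = 5, 3, 0.9, 2.0
r = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))
p = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(s' | s, a), shape (S, A, S)
prior = np.full((n_states, n_actions), 1.0 / n_actions)

q_star = soft_value_iteration(r, p, gamma, beta, prior)

# Bounds from an imperfect estimate, standing in for a value function from past experience.
q_prior = q_star + rng.normal(scale=0.1, size=q_star.shape)
lower, upper = double_sided_bounds(q_prior, r, p, gamma, beta, prior)

# Hypothetical use in an update: clip a soft Bellman target (here from a zero-initialized Q)
# into [lower, upper]. Since Q* lies in the box, clipping never moves an entry away from Q*.
q_current = np.zeros_like(r)
target = soft_bellman(q_current, r, p, gamma, beta, prior)
clipped_target = np.clip(target, lower, upper)
print("max |target - Q*|:        ", np.abs(target - q_star).max())
print("max |clipped target - Q*|:", np.abs(clipped_target - q_star).max())
```

The second printed error is no larger than the first (up to the tolerance of the reference solution), because clipping onto an interval that contains Q* can only move each entry toward it. Passing `q_star` itself to `double_sided_bounds` collapses the interval to numerically a single point, since the residual then vanishes.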

Code & Models

Repositories

jacobha/rlc-softqbounding · PyTorch · Official

Taxonomy

Topics: Fault Detection and Control Systems · Face and Expression Recognition