Decoupling regularization from the action space

Sobhan Mohammadpour; Emma Frejinger; Pierre-Luc Bacon

arXiv:2406.05953 · cs.LG · January 10, 2025

TL;DR

This paper highlights the issue of over-regularization in entropy-regularized reinforcement learning when action spaces vary, and proposes static and dynamic temperature solutions to maintain consistent regularization, improving performance across tasks.

Contribution

It introduces methods to decouple regularization from action space size in entropy-regularized RL, addressing a key limitation in state-dependent action spaces.
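As a rough illustration of what decoupling from action space size can mean, the sketch below uses the fact that the Shannon entropy of a policy over n discrete actions is at most log n, so a fixed temperature gives a maximum entropy bonus that grows with n. Dividing a base coefficient by that range keeps the maximum bonus constant. The helper names and the plain division are assumptions made for illustration, based on the range normalization described in the reviews; this is not the authors' implementation.

```python
import math


def max_entropy(n_actions: int) -> float:
    """Largest Shannon entropy (in nats) of a policy over n_actions discrete actions."""
    return math.log(n_actions)


def normalized_temperature(alpha: float, n_actions: int) -> float:
    """Hypothetical helper: divide a base coefficient alpha by the entropy range.

    Assumption: the regularizer's range for n discrete actions is log(n).
    """
    if n_actions < 2:
        raise ValueError("entropy range is zero for fewer than two actions")
    return alpha / max_entropy(n_actions)


if __name__ == "__main__":
    alpha = 0.5  # illustrative value, not taken from the paper
    for n in (2, 10, 1000):
        # With a fixed temperature, the largest possible bonus grows with log(n).
        fixed_bonus = alpha * max_entropy(n)
        # With the normalized temperature, the largest possible bonus stays at alpha.
        decoupled_bonus = normalized_temperature(alpha, n) * max_entropy(n)
        print(n, round(fixed_bonus, 3), round(decoupled_bonus, 3))
```

Because the normalization depends only on the number of available actions, the same helper could be called per state when action spaces are state-dependent.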

Findings

01 Improved performance on DeepMind control suite

02 Effective in static and dynamic temperature regimes

03 Enhanced biological sequence design results

Abstract

Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that such changes can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, maintaining a consistent level of regularization regardless of how many actions are involved, to avoid over-regularization. Although the problem can be avoided by introducing a task-specific temperature parameter, doing so is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a…
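The contrast drawn in the abstract can be seen in a small one-step example. For entropy regularization with temperature tau, the regularized optimal value of a state is the standard soft maximum tau * log(sum(exp(q / tau))), whereas the unregularized value is max(q). Adding copies of a poor action leaves the hard maximum unchanged but raises the soft value. The function names and Q-values below are hypothetical and chosen only for illustration.

```python
import math


def hard_value(q_values):
    """Unregularized one-step value: the best available Q-value."""
    return max(q_values)


def soft_value(q_values, tau):
    """Entropy-regularized one-step value: tau * logsumexp(q / tau).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


if __name__ == "__main__":
    tau = 1.0
    few_actions = [1.0, 0.0]               # one good action, one poor action
    many_actions = [1.0] + [0.0] * 99      # same good action, 99 poor actions

    # The hard maximum ignores how many poor actions exist.
    print(hard_value(few_actions), hard_value(many_actions))
    # The soft value grows as poor actions are added.
    print(soft_value(few_actions, tau), soft_value(many_actions, tau))
```

Two states that differ only in how many low-value actions they offer therefore receive different regularized values under one shared temperature, which is the inconsistency the abstract describes for state-dependent action spaces.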

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. The paper motivates the problem with standard regularization techniques well in Sections 2-5.
2. The empirical results are quite compelling. The proposed method, while simple, succeeds in various DM env tasks where standard methods fail. More convincingly, the proposed method allows SQL to succeed on a drug discovery task, whereas prior attempts were known to be too unstable.

Weaknesses

1. The method proposed in the paper is very simple, and involves normalizing the standard temperature by the range the regularization objective can take (e.g. minimum possible entropy). While I do not think that the simplicity of the approach should detract from the novelty of the paper, I am not convinced that the better experimental results are actually due to the proposed range-normalization, and not simply that the regularization can now be state-dependent. To my knowledge, state-dependent t

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

As the authors note, regularized RL is widely applicable in both control and IRL, so improving its sensitivity to hyperparameters could be helpful for a variety of applications. The experimental results show that the proposed method for setting the regularization coefficient seems to work quite well in practice, particularly when the action space does not have a standard scale.

Weaknesses

While the paper is promising, I worry that in its current form it is not ready for publication at a top conference. First, the contribution is relatively small, since it is already known how to choose the entropy coefficient for many environments (e.g., via the SAC rule of using $\bar{H}=-n$) and it often needs to be tuned regardless, depending on the scale of the reward function. While I still think the ideas here are useful, the writing, theory, and experiments should be of very high quality t

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

- The paper is overall well organized and written. Illustrative examples and discussions make for smooth reading.
- The proposed solutions to decoupling regularization and temperature adjustment generalize to a family of commonly used regularizers.
- The experiments cover toy examples, popular DMC environments, and a drug design problem. Notably, unprecedented results in the domain of drug design are achieved.

Weaknesses

- I think the key hyperparameter $\alpha$ needs more discussion. It would be helpful to analyze the (empirical) effects of different values of $\alpha$ and to recommend values or a strategy for selecting them.
- The experiment section lacks sufficient detail, and the missing details are not provided in the appendix either. I recommend that the authors add the key details of the experiments.

Code & Models

Repositories

SobhanMP/decoupled-soft-RL · JAX · Official

Videos

Decoupling regularization from the action space · SlidesLive

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced Numerical Analysis Techniques