Reward-Punishment Reinforcement Learning with Maximum Entropy

Jiexin Wang; Eiji Uchibe

arXiv:2405.11784 · cs.LG · September 16, 2024 · 1 citation

TL;DR

This paper presents softDMP, a reinforcement learning algorithm that enhances reward-punishment learning by integrating entropy optimization, leading to improved sample efficiency and robustness in navigation tasks.

Contribution

The paper introduces softDMP, a novel method that smooths traditional operators in reward-punishment RL and addresses data collection issues for better learning stability.

Findings

01 softDMP improves sample efficiency in discrete MDPs.

02 The probabilistic classifier effectively separates roll-outs for reward and punishment updates.

03 softDMP demonstrates superior performance in Turtlebot 3 maze navigation tasks.

Abstract

We introduce the "soft Deep MaxPain" (softDMP) algorithm, which integrates the optimization of long-term policy entropy into reward-punishment reinforcement learning objectives. Our motivation is to enable a smoother variation of the operators used to update action values, beyond the traditional "max" and "min" operators, with the goal of enhancing sample efficiency and robustness. We also address two unresolved issues from the previous Deep MaxPain method. First, we investigate how the negated ("flipped") pain-seeking sub-policy, derived from the punishment action value, collaborates with the "min" operator to effectively learn the punishment module, and how softDMP's smooth learning operator provides insights into the "flipping" trick. Second, we tackle the challenge of data collection for learning the punishment module to mitigate inconsistencies arising from…
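
The idea of smoothing the "max" and "min" operators can be illustrated with a log-sum-exp operator, which is the standard form that maximum-entropy objectives lead to. The sketch below is a generic illustration and not the paper's implementation: the function names (soft_max, soft_min, boltzmann_policy), the temperature parameter, and the example values are all assumptions introduced here.

```python
import math

def soft_max(q_values, temperature):
    """Log-sum-exp smoothing of max (hypothetical helper, not from the paper).

    Approaches max(q_values) as temperature -> 0 and is always >= max(q_values).
    """
    m = max(q_values)  # subtract the max for numerical stability
    total = sum(math.exp((q - m) / temperature) for q in q_values)
    return m + temperature * math.log(total)

def soft_min(q_values, temperature):
    """Smooth counterpart of min, obtained by negating the inputs to soft_max."""
    return -soft_max([-q for q in q_values], temperature)

def boltzmann_policy(q_values, temperature):
    """Softmax distribution over actions, the policy that pairs with soft_max."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative action values only; these numbers are made up.
reward_q = [1.0, 2.0, 3.0]
punish_q = [0.5, 2.5, 1.0]

smooth_reward_target = soft_max(reward_q, temperature=0.5)
smooth_punish_target = soft_min(punish_q, temperature=0.5)

# A "flipped" distribution in this sketch: negate the punishment values so that
# actions with lower punishment value receive higher probability.
flipped_policy = boltzmann_policy([-q for q in punish_q], temperature=0.5)
```

A lower temperature makes both operators approach the hard "max" and "min", while a higher temperature spreads weight across actions. How softDMP combines these quantities is specified in the paper itself, not here.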

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies