Learning Without Time-Based Embodiment Resets in Soft-Actor Critic
Homayoon Farrahi, A. Rupam Mahmood

TL;DR
This paper develops a continuing version of the Soft Actor-Critic algorithm to enable reinforcement learning without environment resets, addressing exploration challenges and improving long-term performance in robotic tasks.
Contribution
The paper introduces a continuing version of the SAC algorithm for learning without episode terminations and demonstrates how to recover performance when embodiment resets are not used.
Findings
With simple modifications to the reward functions of existing tasks, continuing SAC performs as well as or better than episodic SAC.
Embodiment resets aid exploration and improve learning speed.
Increasing policy entropy can recover performance without resets.
Abstract
When creating new reinforcement learning tasks, practitioners often accelerate the learning process by incorporating into the task several accessory components, such as breaking the environment interaction into independent episodes and frequently resetting the environment. Although they can enable the learning of complex intelligent behaviors, such task accessories can result in unnatural task setups and hinder long-term performance in the real world. In this work, we explore the challenges of learning without episode terminations and robot embodiment resets using the Soft Actor-Critic (SAC) algorithm. To learn without terminations, we present a continuing version of the SAC algorithm and show that, with simple modifications to the reward functions of existing tasks, continuing SAC can perform as well as or better than episodic SAC while reducing the sensitivity of performance to the…
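To make the episodic versus continuing distinction concrete, here is a minimal sketch of the soft TD target used to train SAC critics. It assumes the standard SAC target with a minimum over two Q-estimates, and it assumes that the continuing variant simply never masks bootstrapping at a terminal flag. The function name `soft_td_target`, its parameters, and the `continuing` switch are hypothetical and chosen for illustration; the paper's actual formulation may differ.

```python
import numpy as np

def soft_td_target(reward, next_q1, next_q2, next_log_prob, done,
                   gamma=0.99, alpha=0.2, continuing=False):
    """Soft TD target for SAC critics (illustrative sketch).

    continuing=False: episodic form, bootstrapping is cut where done == 1.
    continuing=True: assumed continuing form, always bootstrap from the
    next state and ignore the done flag.
    """
    reward = np.asarray(reward, dtype=float)
    # Soft state value: minimum of two Q-estimates plus the entropy bonus.
    soft_value = (np.minimum(next_q1, next_q2)
                  - alpha * np.asarray(next_log_prob, dtype=float))
    # Episodic SAC zeroes the bootstrap term at termination.
    mask = 1.0 if continuing else 1.0 - np.asarray(done, dtype=float)
    return reward + gamma * mask * soft_value

# Two transitions; the second is flagged as terminal.
r = np.array([1.0, 1.0])
q1 = np.array([2.0, 2.0])
q2 = np.array([3.0, 1.0])
logp = np.array([-1.0, -1.0])
done = np.array([0, 1])

episodic = soft_td_target(r, q1, q2, logp, done, gamma=0.5, alpha=0.5)
continuing = soft_td_target(r, q1, q2, logp, done, gamma=0.5, alpha=0.5,
                            continuing=True)
```

In this sketch the two targets differ only on the transition flagged as terminal. The temperature `alpha` weights the entropy term, so a larger value rewards more stochastic policies, which is one plausible way to read the finding that increasing policy entropy helps when resets are removed.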
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Embodied and Extended Cognition
