Off-Policy Actor-Critic with Sigmoid-Bounded Entropy for Real-World Robot Learning
Xiefeng Wu, Mingyu Hu, Shu Zhang

TL;DR
This paper introduces SigEnt-SAC, a low-cost off-policy RL method with a sigmoid-bounded entropy term, enabling real-world robot learning from minimal data and demonstrating success in real robotic tasks.
Contribution
The paper proposes SigEnt-SAC, a novel off-policy actor-critic algorithm with a sigmoid-bounded entropy term that improves stability and efficiency in real-world robot learning from limited data.
Findings
SigEnt-SAC reduces Q-function oscillations.
Achieves a 100% success rate faster than baselines.
Learns successful policies with minimal real-world interactions.
Abstract
Deploying reinforcement learning in the real world remains challenging due to sample inefficiency, sparse rewards, and noisy visual observations. Prior work leverages demonstrations and human feedback to improve learning efficiency and robustness. However, offline-to-online methods need large datasets and can be unstable, while VLA-assisted RL relies on large-scale pretraining and fine-tuning. As a result, a low-cost real-world RL method with minimal data requirements has yet to emerge. We introduce SigEnt-SAC, an off-policy actor-critic method that learns from scratch using a single expert trajectory. Our key design is a sigmoid-bounded entropy term that prevents negative-entropy-driven optimization toward out-of-distribution actions and reduces Q-function oscillations. We benchmark SigEnt-SAC on D4RL tasks against representative baselines. Experiments show that SigEnt-SAC…
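Illustrative sketch
The abstract does not give the exact form of the sigmoid-bounded entropy term, so the following is only a minimal sketch of the general idea, not the authors' formulation. In standard SAC the per-action entropy bonus is -alpha * log pi(a|s); for continuous actions the log-density can exceed zero, which makes that bonus negative and unbounded. The hypothetical `bounded_entropy_bonus` below passes the negative log-density through a sigmoid so the bonus stays between 0 and alpha. The function names, the default `alpha = 0.2`, and the specific choice `alpha * sigmoid(-log_prob)` are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sac_entropy_bonus(log_prob: float, alpha: float = 0.2) -> float:
    """Standard SAC entropy bonus: -alpha * log pi(a|s).

    Unbounded: a sharply peaked policy (log_prob > 0) gives a negative bonus.
    """
    return -alpha * log_prob


def bounded_entropy_bonus(log_prob: float, alpha: float = 0.2) -> float:
    """Hypothetical bounded variant (an assumption, not the paper's formula).

    Squashes -log_prob through a sigmoid, so the bonus lies in [0, alpha]
    and can never become an arbitrarily large negative quantity.
    """
    return alpha * sigmoid(-log_prob)


if __name__ == "__main__":
    # Compare both bonuses across low-density and high-density actions.
    for log_prob in (-5.0, 0.0, 5.0):
        print(log_prob, sac_entropy_bonus(log_prob), bounded_entropy_bonus(log_prob))
```

In this sketch the bounded bonus is still monotonically decreasing in the log-density, so lower-density actions still receive a larger bonus, but its magnitude is capped in both directions.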
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
