Harnessing Bounded-Support Evolution Strategies for Policy Refinement
Ethan Hirschowitz, Fabio Ramos

TL;DR
This paper introduces Triangular-Distribution ES (TD-ES), a policy refinement method that pairs bounded triangular noise with a rank-based estimator. Applied after initial PPO training, it raises robotic manipulation success rates by 26.5% relative to PPO and reduces variance.
Contribution
It presents a new bounded, antithetic triangular perturbation approach for evolution strategies, enhancing policy refinement in robotics with stable, parallelizable, and compute-efficient updates.
Findings
Raises success rates by 26.5% relative to PPO
Reduces variance in policy updates
Enables robust late-stage policy refinement
Abstract
Improving competent robot policies with on-policy RL is often hampered by noisy, low-signal gradients. We revisit Evolution Strategies (ES) as a policy-gradient proxy and localize exploration with bounded, antithetic triangular perturbations, suitable for policy refinement. We propose Triangular-Distribution ES (TD-ES), which pairs bounded triangular noise with a centered-rank finite-difference estimator to deliver stable, parallelizable, gradient-free updates. In a two-stage pipeline (PPO pretraining followed by TD-ES refinement), this preserves early sample efficiency while enabling robust late-stage gains. Across a suite of robotic manipulation tasks, TD-ES raises success rates by 26.5% relative to PPO and greatly reduces variance, offering a simple, compute-light path to reliable refinement.
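To make the update described in the abstract concrete, here is a minimal NumPy sketch of a single refinement step that uses bounded, antithetic triangular perturbations and a centered-rank finite-difference estimate. It is an illustration under stated assumptions, not the authors' implementation: the names `centered_ranks`, `td_es_step`, `evaluate`, `num_pairs`, `width`, and `lr`, the symmetric triangular distribution on [-width, width], and the normalization by `num_pairs * width` are all hypothetical choices, since the abstract does not specify these details.

```python
import numpy as np


def centered_ranks(returns):
    """Map returns to evenly spaced ranks in [-0.5, 0.5] (lowest return -> -0.5)."""
    returns = np.asarray(returns, dtype=float)
    ranks = np.empty(returns.size, dtype=float)
    ranks[np.argsort(returns)] = np.arange(returns.size)
    return ranks / (returns.size - 1) - 0.5


def td_es_step(theta, evaluate, rng, num_pairs=16, width=0.05, lr=0.1):
    """One ES update with bounded antithetic triangular noise (illustrative only).

    theta:    flat parameter vector, e.g. the weights of a PPO-pretrained policy
    evaluate: callable mapping a parameter vector to a scalar return (assumed)
    """
    # Bounded noise: every coordinate lies in [-width, width], with mode 0.
    eps = rng.triangular(-width, 0.0, width, size=(num_pairs, theta.size))

    # Antithetic pairs: evaluate theta + eps and theta - eps for each draw.
    r_plus = np.array([evaluate(theta + e) for e in eps])
    r_minus = np.array([evaluate(theta - e) for e in eps])

    # Centered-rank transform over all 2 * num_pairs returns, then a
    # finite-difference weight per pair.
    ranks = centered_ranks(np.concatenate([r_plus, r_minus]))
    weights = ranks[:num_pairs] - ranks[num_pairs:]

    # Rank-weighted search direction; the scaling is an assumed design choice.
    direction = (weights[:, None] * eps).sum(axis=0) / (num_pairs * width)
    return theta + lr * direction


if __name__ == "__main__":
    # Toy stand-in for a policy return: negative squared distance to a target.
    rng = np.random.default_rng(0)
    target = np.zeros(5)
    theta = np.full(5, 0.5)
    for _ in range(300):
        theta = td_es_step(theta, lambda p: -np.sum((p - target) ** 2), rng)
    print(np.linalg.norm(theta - target))
```

Because the perturbations are bounded, no candidate evaluated in a step moves any parameter more than `width` from the current policy, which matches the localized exploration the abstract describes. The per-candidate evaluations are independent of one another, so in practice they could be run in parallel.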
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
