Harnessing Bounded-Support Evolution Strategies for Policy Refinement

Ethan Hirschowitz; Fabio Ramos

arXiv:2511.09923 · cs.LG · November 17, 2025

TL;DR

This paper introduces Triangular-Distribution ES (TD-ES), a policy refinement method that pairs bounded, antithetic triangular noise with a centered-rank finite-difference estimator. Applied after PPO pretraining on robotic manipulation tasks, it raises success rates by 26.5% relative to PPO and greatly reduces variance.

Contribution

A bounded, antithetic triangular perturbation scheme for evolution strategies that gives stable, parallelizable, gradient-free updates for refining robot policies at low compute cost.

Findings

01  Raises success rates by 26.5% relative to PPO

02  Greatly reduces variance compared with PPO

03  Enables robust late-stage policy refinement after PPO pretraining

Abstract

Improving competent robot policies with on-policy RL is often hampered by noisy, low-signal gradients. We revisit Evolution Strategies (ES) as a policy-gradient proxy and localize exploration with bounded, antithetic triangular perturbations, which are well suited to policy refinement. We propose Triangular-Distribution ES (TD-ES), which pairs bounded triangular noise with a centered-rank finite-difference estimator to deliver stable, parallelizable, gradient-free updates. In a two-stage pipeline (PPO pretraining followed by TD-ES refinement), this preserves early sample efficiency while enabling robust late-stage gains. Across a suite of robotic manipulation tasks, TD-ES raises success rates by 26.5% relative to PPO and greatly reduces variance, offering a simple, compute-light path to reliable refinement.
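The abstract names three ingredients: bounded triangular noise, antithetic (mirrored) perturbations, and a centered-rank finite-difference estimator. The sketch below shows one way these could fit together in a single update step. It is an illustration under stated assumptions, not the authors' implementation: the function names, population size, perturbation width, learning rate, and the toy quadratic objective are all hypothetical, and the estimator is left unnormalized so that the step size absorbs the noise scale. In the setting the paper describes, `fitness` would instead be the return from rolling out a PPO-pretrained policy.

```python
import numpy as np


def centered_ranks(values):
    """Replace raw scores by their ranks, rescaled to lie in [-0.5, 0.5]."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(values.size, dtype=float)
    ranks[np.argsort(values)] = np.arange(values.size)
    return ranks / (values.size - 1) - 0.5


def td_es_step(theta, fitness, rng, pop_pairs=16, width=0.05, lr=0.1):
    """One ES update with bounded, antithetic triangular perturbations.

    All hyperparameter defaults are illustrative assumptions.
    """
    # Symmetric triangular noise on [-1, 1] with mode 0: every perturbed
    # parameter stays within `width` of its current value.
    eps = rng.triangular(-1.0, 0.0, 1.0, size=(pop_pairs, theta.size))

    # Antithetic pairs: evaluate each direction and its mirror image.
    f_plus = np.array([fitness(theta + width * e) for e in eps])
    f_minus = np.array([fitness(theta - width * e) for e in eps])

    # Rank all 2 * pop_pairs scores together, then split them back up.
    utilities = centered_ranks(np.concatenate([f_plus, f_minus]))
    u_plus, u_minus = utilities[:pop_pairs], utilities[pop_pairs:]

    # Finite-difference direction from rank differences (unnormalized).
    direction = ((u_plus - u_minus)[:, None] * eps).mean(axis=0)
    return theta + lr * direction


def toy_return(theta):
    """Hypothetical stand-in for an episode return; maximized at theta = 0."""
    return -float(np.sum(theta ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.ones(5)
    for _ in range(300):
        theta = td_es_step(theta, toy_return, rng)
    print(toy_return(theta))
```

Because the update uses only ranks, its magnitude does not depend on the scale of the returns, and because the noise has bounded support, no candidate ever lands more than `width` away from the current parameters in any coordinate. The fitness evaluations inside the step are independent of each other, which is what makes this kind of update easy to parallelize.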

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning