Improving Policy Optimization via $\varepsilon$-Retrain

Luca Marzari; Priya L. Donti; Changliu Liu; Enrico Marchesini

arXiv:2406.08315 · cs.AI · April 15, 2025


TL;DR

This paper introduces $\varepsilon$-retrain, an exploration strategy for policy optimization that restarts training in regions of the state space where the agent violated a behavioral preference, improving both performance and sample efficiency.

Contribution

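The paper proposes an iterative procedure that collects retrain areas (regions where the agent violated a behavioral preference) and restarts training there. It pairs this with formal verification of neural networks to provably quantify how closely the trained policy adheres to the preference.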
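As a rough illustration of what "provably quantify" can mean, the sketch below uses interval bound propagation (IBP), a simple sound but incomplete technique swapped in here for illustration; it is not necessarily the verifier used in the paper. It assumes a ReLU policy network given as a list of hypothetical `(W, b)` layer pairs and a hypothetical preference "action 0 scores at least as high as action 1". The state box is split into equal cells, and each cell is labeled as provably satisfying, provably violating, or undecided; the first two fractions are therefore lower bounds on the true volume fractions.

```python
import itertools

import numpy as np


def affine_bounds(W, b, lo, hi):
    """Tight per-coordinate bounds of y = W @ x + b over the box lo <= x <= hi."""
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    y_center = W @ center + b
    y_radius = np.abs(W) @ radius
    return y_center - y_radius, y_center + y_radius


def margin_bounds(layers, lo, hi):
    """Sound (possibly loose) bounds on out[0] - out[1] for a ReLU MLP."""
    for W, b in layers[:-1]:
        lo, hi = affine_bounds(W, b, lo, hi)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    W, b = layers[-1]
    # Fold the subtraction into the output layer, which is tighter than
    # subtracting two independently bounded outputs.
    w_diff = (W[0] - W[1])[None, :]
    b_diff = np.array([b[0] - b[1]])
    m_lo, m_hi = affine_bounds(w_diff, b_diff, lo, hi)
    return m_lo[0], m_hi[0]


def quantify_preference(layers, lo, hi, splits=8):
    """Grid the state box into equal cells and classify each one.

    Returns volume fractions: 'safe' lower-bounds where out[0] >= out[1]
    holds, 'violated' lower-bounds where it fails, 'unknown' is undecided.
    """
    edges = [np.linspace(l, h, splits + 1) for l, h in zip(lo, hi)]
    counts = {"safe": 0, "violated": 0, "unknown": 0}
    for idx in itertools.product(range(splits), repeat=len(lo)):
        cell_lo = np.array([edges[d][i] for d, i in enumerate(idx)])
        cell_hi = np.array([edges[d][i + 1] for d, i in enumerate(idx)])
        m_lo, m_hi = margin_bounds(layers, cell_lo, cell_hi)
        if m_lo >= 0.0:
            counts["safe"] += 1
        elif m_hi < 0.0:
            counts["violated"] += 1
        else:
            counts["unknown"] += 1
    total = splits ** len(lo)
    return {k: v / total for k, v in counts.items()}


# Hypothetical 2-D state, 2-action policy with one hidden ReLU layer.
layers = [(np.eye(2), np.zeros(2)), (np.eye(2), np.zeros(2))]
result = quantify_preference(layers, np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
print(result)
```

Finer grids (larger `splits`) shrink the undecided fraction at exponential cost in the state dimension, which is one reason dedicated verifiers use smarter splitting and tighter relaxations than plain IBP.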

Findings

01

Significant performance improvements across multiple tasks.

02

Enhanced sample efficiency demonstrated in experiments.

03

Provable quantification of behavioral adherence via formal verification.

Abstract

We present $\varepsilon$-retrain, an exploration strategy encouraging a behavioral preference while optimizing policies with monotonic improvement guarantees. To this end, we introduce an iterative procedure for collecting retrain areas: parts of the state space where an agent did not satisfy the behavioral preference. Our method switches between the typical uniform restart state distribution and the retrain areas using a decaying factor $\varepsilon$, allowing agents to retrain on situations where they violated the preference. We also employ formal verification of neural networks to provably quantify the degree to which agents adhere to these behavioral preferences. Experiments over hundreds of seeds across locomotion, power network, and navigation tasks show that our method yields agents that exhibit significant performance and sample efficiency improvements.
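A minimal sketch of the restart logic described in the abstract, assuming a continuous state space bounded by a box, a simulator that can be reset to an arbitrary state, retrain areas represented as axis-aligned boxes of a hypothetical fixed `radius` around violating states, and a multiplicative decay schedule for $\varepsilon$. The class `RetrainRestart` and the parameters `eps_decay` and `eps_min` are illustrative assumptions, not values or names from the paper, whose own procedure for building and maintaining retrain areas may differ.

```python
import numpy as np


class RetrainRestart:
    """Hypothetical restart sampler mixing uniform restarts with retrain areas."""

    def __init__(self, state_low, state_high, radius=0.05, eps=1.0,
                 eps_decay=0.995, eps_min=0.05, seed=0):
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.radius = radius
        self.eps, self.eps_decay, self.eps_min = eps, eps_decay, eps_min
        self.areas = []  # list of (area_low, area_high) boxes
        self.rng = np.random.default_rng(seed)

    def _covered(self, state):
        """True if the state already lies inside a stored retrain area."""
        return any(np.all(lo <= state) and np.all(state <= hi)
                   for lo, hi in self.areas)

    def add_violation(self, state):
        """Record a box around a state where the behavioral preference failed."""
        state = np.asarray(state, dtype=float)
        if self._covered(state):
            return
        lo = np.maximum(state - self.radius, self.low)
        hi = np.minimum(state + self.radius, self.high)
        self.areas.append((lo, hi))

    def sample_initial_state(self):
        """With probability eps restart inside a retrain area, else uniformly."""
        if self.areas and self.rng.random() < self.eps:
            lo, hi = self.areas[self.rng.integers(len(self.areas))]
        else:
            lo, hi = self.low, self.high
        return self.rng.uniform(lo, hi)

    def decay(self):
        """Shrink eps after each training iteration, down to eps_min."""
        self.eps = max(self.eps_min, self.eps * self.eps_decay)


sampler = RetrainRestart(state_low=[-1.0, -1.0], state_high=[1.0, 1.0])
sampler.add_violation([0.5, 0.5])       # preference violated near (0.5, 0.5)
start = sampler.sample_initial_state()  # eps is 1.0, so drawn inside that area
sampler.decay()
```

In a training loop one would call `add_violation` whenever a rollout breaks the preference, draw each episode's start with `sample_initial_state`, and call `decay` once per iteration, so restarts drift back toward the uniform distribution as $\varepsilon$ shrinks.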


Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Formal Methods in Verification