Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
Daniel Russo

TL;DR
This paper proves that success conditioning exactly solves a trust-region optimization problem: it maximizes policy improvement subject to a divergence constraint whose radius is set automatically by the data. This makes it a conservative update rule and clarifies both why it works and where it falls short.
Contribution
It formally characterizes success conditioning as an exact trust-region optimization, clarifying its theoretical properties and implications for policy improvement.
Findings
Success conditioning maximizes policy improvement within a $\chi^2$ divergence constraint.
It guarantees that performance does not degrade, and its failures are detectable because the policy barely changes when it fails.
Return thresholding can enhance improvement but may misalign with true objectives.
Abstract
A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success…
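The collect, filter, imitate loop described above can be sketched in the simplest possible setting: a single state with a handful of actions, each with a fixed success probability. This setting, and every name in the block (`success_condition`, `imitate_successes`, `summarize`), is an assumption made for illustration and is not taken from the paper. In this one-step case, imitating successful samples converges to the reweighted policy pi_plus(a) = pi(a) q(a) / V, and the relative improvement, the $\chi^2$ divergence from the old policy, and the squared coefficient of variation of q under pi (used here as a stand-in for action-influence) coincide. The paper's general multi-state definitions may differ in detail.

```python
import numpy as np


def success_condition(pi, q):
    """Exact success-conditioned policy in a one-state problem (illustrative).

    pi: action probabilities of the current policy (assumed strictly positive).
    q:  success probability of each action.
    Returns pi_plus(a) = pi(a) * q(a) / V, where V is the current success rate.
    """
    pi = np.asarray(pi, dtype=float)
    q = np.asarray(q, dtype=float)
    v = float(pi @ q)
    if v <= 0.0:
        raise ValueError("success rate is zero; conditioning on success is undefined")
    return pi * q / v


def imitate_successes(pi, q, n_samples, rng):
    """Sample actions, keep the successful ones, and return their empirical frequencies."""
    pi = np.asarray(pi, dtype=float)
    q = np.asarray(q, dtype=float)
    actions = rng.choice(len(pi), size=n_samples, p=pi)
    succeeded = rng.random(n_samples) < q[actions]
    counts = np.bincount(actions[succeeded], minlength=len(pi))
    return counts / counts.sum()


def summarize(pi, q):
    """Compute the three quantities that coincide in the one-state case."""
    pi = np.asarray(pi, dtype=float)
    q = np.asarray(q, dtype=float)
    v = float(pi @ q)
    pi_plus = success_condition(pi, q)
    v_plus = float(pi_plus @ q)
    return {
        # (V_plus - V) / V
        "relative_improvement": (v_plus - v) / v,
        # chi^2(pi_plus || pi) = sum_a (pi_plus(a) - pi(a))^2 / pi(a)
        "chi2": float(np.sum((pi_plus - pi) ** 2 / pi)),
        # Var_pi(q) / V^2: a one-step reading of action-influence (assumption)
        "action_influence": float(pi @ (q - v) ** 2) / v**2,
    }


if __name__ == "__main__":
    pi = [0.5, 0.3, 0.2]   # hypothetical current policy
    q = [0.2, 0.5, 0.9]    # hypothetical per-action success probabilities
    print(summarize(pi, q))
    print(success_condition(pi, q))
    print(imitate_successes(pi, q, 200_000, np.random.default_rng(0)))
```

If all actions have the same success probability, all three quantities are zero and the policy does not move, which is the observable failure mode mentioned in the findings.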
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Causal Inference Techniques · Game Theory and Applications
