Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

Daniel Russo

arXiv:2601.18175·cs.AI·January 27, 2026

TL;DR

This paper proves that success conditioning is equivalent to solving a trust-region optimization problem: it maximizes policy improvement subject to a $\chi^{2}$ divergence constraint whose radius is determined by the data. The result explains why the updates are conservative and clarifies when the method is effective and when it is not.

Contribution

The paper formally characterizes success conditioning as the exact solution of a trust-region optimization problem, clarifying its theoretical properties and its implications for policy improvement.

Findings

01

Success conditioning maximizes policy improvement within a $\chi^{2}$ divergence constraint.

02

It guarantees that performance does not degrade, and its failures are detectable: when it yields little improvement, the policy itself barely changes.

03

Return thresholding can enhance improvement but may misalign with true objectives.
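
The misalignment risk in finding 03 can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a single-state problem with two hypothetical actions ("safe" and "risky"), made-up return distributions, and a made-up threshold, and it defines success as the return reaching that threshold. Conditioning on success raises the success rate but lowers the expected return.

```python
import numpy as np

# Hypothetical single-state problem (all numbers are illustrative assumptions).
pi = np.array([0.5, 0.5])                      # behaviour policy over [safe, risky]
returns = [np.array([1.0]), np.array([0.0, 100.0])]   # possible returns per action
probs = [np.array([1.0]), np.array([0.5, 0.5])]       # their probabilities
threshold = 0.5                                # "success" means return >= threshold


def conditioned_policy(pi, returns, probs, threshold):
    """Reweight each action by its probability of reaching the threshold."""
    p_success = np.array(
        [p[r >= threshold].sum() for r, p in zip(returns, probs)]
    )
    weights = pi * p_success
    return weights / weights.sum(), p_success


# Expected return of each action under its assumed return distribution.
mean_return = np.array([float(r @ p) for r, p in zip(returns, probs)])

pi_plus, p_success = conditioned_policy(pi, returns, probs, threshold)

success_before = float(pi @ p_success)
success_after = float(pi_plus @ p_success)
return_before = float(pi @ mean_return)
return_after = float(pi_plus @ mean_return)

print("success rate:", success_before, "->", success_after)
print("expected return:", return_before, "->", return_after)
```

Here the thresholded objective favours the safe action, because it always clears the threshold, even though the risky action has the higher mean return. Whether this matters in practice depends on how closely the threshold tracks the objective one actually cares about.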

Abstract

A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $χ^{2}$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success…
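
The three-way identity described in the abstract can be checked numerically in the simplest case. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses a single-state (bandit) problem with made-up success probabilities, takes exact success conditioning to mean reweighting the behaviour policy by each action's success probability, and uses the variance of those probabilities under the behaviour policy, normalised by the squared success rate, as a stand-in for action-influence. The function names are hypothetical.

```python
import numpy as np


def success_condition(pi, q):
    """Return the success-conditioned policy pi(a) * q(a) / v and the success rate v.

    pi: behaviour policy over actions; q: per-action success probability.
    """
    pi = np.asarray(pi, dtype=float)
    q = np.asarray(q, dtype=float)
    v = float(pi @ q)                 # success rate of the behaviour policy
    return pi * q / v, v


def chi2_divergence(p_new, p_old):
    """Chi-squared divergence of p_new from p_old (p_old must be positive)."""
    return float(np.sum((p_new - p_old) ** 2 / p_old))


# Illustrative numbers only (assumptions, not data from the paper).
pi = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.9])

pi_plus, v = success_condition(pi, q)
v_plus = float(pi_plus @ q)           # success rate of the conditioned policy

relative_improvement = (v_plus - v) / v
policy_change = chi2_divergence(pi_plus, pi)
# Assumed stand-in for action-influence: Var_pi(q) / v^2.
action_influence = float(pi @ (q - v) ** 2) / v ** 2

print(relative_improvement, policy_change, action_influence)
```

In this toy setting the three printed quantities coincide, which matches the shape of the identity stated above. If every action had the same success probability, all three would be zero and the conditioned policy would equal the behaviour policy.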

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Causal Inference Techniques · Game Theory and Applications