Algorithm-Relative Trajectory Valuation in Policy Gradient Control

Shihao Li; Jiachen Li; Jiamin Xu; Christopher Martin; Wei Li; Dongmei Chen

arXiv:2511.07878 · cs.LG · November 12, 2025

Open Access

TL;DR

This paper shows that the value of a training trajectory in policy-gradient control depends on the learning algorithm: under vanilla REINFORCE, gradient variance drives trajectory value, while under stabilization (state whitening or Fisher preconditioning) information content dominates.

Contribution

It proves a variance-mediated mechanism that explains why stabilization techniques change the sign of the relationship between Persistence of Excitation (PE) and trajectory value in policy-gradient methods, establishing that trajectory valuation is algorithm-relative.

Findings

01

For fixed energy, higher Persistence of Excitation (PE) yields lower gradient variance.

02

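Stabilization (state whitening or Fisher preconditioning) neutralizes the variance channel, flipping the correlation between PE and marginal value from negative ($r \approx -0.38$ under vanilla REINFORCE) to positive ($r \approx +0.29$).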

03

Shapley scores identify toxic subsets, while decision-aligned Leave-One-Out scores complement Shapley for pruning.

Abstract

We study how trajectory value depends on the learning algorithm in policy-gradient control. Using Trajectory Shapley in an uncertain LQR, we find a negative correlation between Persistence of Excitation (PE) and marginal value under vanilla REINFORCE ($r \approx -0.38$). We prove a variance-mediated mechanism: (i) for fixed energy, higher PE yields lower gradient variance; (ii) near saddles, higher variance increases escape probability, raising marginal contribution. When stabilized (state whitening or Fisher preconditioning), this variance channel is neutralized and information content dominates, flipping the correlation positive ($r \approx +0.29$). Hence, trajectory value is algorithm-relative. Experiments validate the mechanism and show decision-aligned scores (Leave-One-Out) complement Shapley for pruning, while Shapley identifies toxic subsets.
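To make the two scoring schemes named in the abstract concrete, here is a minimal Python sketch of permutation-sampling Shapley values and Leave-One-Out scores over a set of trajectories. It is an illustration under stated assumptions, not the paper's implementation: the function names, the sampling scheme, and `toy_utility` are hypothetical. In the paper's setting the utility of a subset would presumably be the performance of a policy trained on those trajectories, which is exactly where the dependence on the learning algorithm enters; a simple additive utility stands in for it here.

```python
import random
from typing import Callable, FrozenSet, List

# Hypothetical type: maps a subset of trajectory indices to a scalar utility.
Utility = Callable[[FrozenSet[int]], float]


def shapley_monte_carlo(n: int, utility: Utility,
                        num_permutations: int = 200, seed: int = 0) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the n trajectories."""
    rng = random.Random(seed)
    values = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset: List[int] = []
        prev = utility(frozenset())
        for i in order:
            subset.append(i)
            cur = utility(frozenset(subset))
            values[i] += cur - prev  # marginal contribution of trajectory i
            prev = cur
    return [v / num_permutations for v in values]


def leave_one_out(n: int, utility: Utility) -> List[float]:
    """Score each trajectory by the utility lost when it alone is removed."""
    full = frozenset(range(n))
    u_full = utility(full)
    return [u_full - utility(full - {i}) for i in range(n)]


# Toy stand-in utility (an assumption for illustration): purely additive,
# with trajectory 2 acting as a harmful ("toxic") sample.
weights = [1.0, 2.0, -0.5]


def toy_utility(subset: FrozenSet[int]) -> float:
    return sum(weights[i] for i in subset)


shap = shapley_monte_carlo(len(weights), toy_utility, num_permutations=50)
loo = leave_one_out(len(weights), toy_utility)
```

For an additive utility like this one, both scores coincide with the per-trajectory weights. The two scores diverge only when trajectories interact, which is the regime where the abstract reports that Leave-One-Out and Shapley play complementary roles.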

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Adaptive Dynamic Programming Control