Parameter Critic: a Model Free Variance Reduction Method Through Imperishable Samples
Juan Cervino, Harshat Kumar, Alejandro Ribeiro

TL;DR
This paper introduces the parameter critic, a novel reinforcement learning method that uses a function approximator to maintain sample validity across policy updates, improving sample efficiency and robustness.
Contribution
The paper proposes the parameter critic, a model-free approach that learns the relationship between policy parameters and expected rewards, enhancing sample reuse and stability.
Findings
Outperforms gradient-free parameter-space exploration methods in convergence.
Solves the cartpole problem using the learned parameter-reward relationship.
Demonstrates robustness to noise through convergence analysis.
Abstract
We consider the problem of finding a policy that maximizes an expected reward along the trajectory of an agent that interacts with an unknown environment. This framework, commonly referred to as reinforcement learning, suffers from the need for a large number of samples at each step of the learning process. To address this, we introduce the parameter critic, a formulation that allows samples to remain valid even when the parameters of the policy change. In particular, we propose using a function approximator to directly learn the relationship between the policy parameters and the expected cumulative reward. Through a convergence analysis, we show that the parameter critic outperforms gradient-free parameter-space exploration techniques because it is robust to noise. Empirically, we show that our method solves the cartpole problem, which corroborates our claim, as the agent can successfully learn an…
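The core idea above, fitting a map from policy parameters to expected cumulative reward and reusing every stored sample after the parameters change, can be illustrated with a minimal sketch. It is not the authors' implementation. The quadratic least-squares critic, the Gaussian parameter-space exploration, the noisy toy objective standing in for cartpole returns, and all names (noisy_return, quad_features, critic_gradient, parameter_critic_ascent, TARGET) are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy problem (assumption, not from the paper): the expected
# cumulative reward of parameters theta is a concave quadratic with optimum
# TARGET, observed only through additive noise.
TARGET = np.array([1.0, -2.0])


def noisy_return(theta, rng, noise=0.2):
    """Stand-in for one noisy rollout return of the policy with parameters theta."""
    return -np.sum((theta - TARGET) ** 2) + noise * rng.standard_normal()


def quad_features(thetas):
    """Map each parameter vector to [1, theta, theta_i * theta_j for i <= j]."""
    thetas = np.atleast_2d(thetas)
    n, d = thetas.shape
    i, j = np.triu_indices(d)
    return np.hstack([np.ones((n, 1)), thetas, thetas[:, i] * thetas[:, j]])


def critic_gradient(w, theta):
    """Gradient w.r.t. theta of the fitted quadratic critic w . phi(theta)."""
    d = theta.shape[0]
    grad = w[1:1 + d].copy()  # linear terms
    for k, (i, j) in enumerate(zip(*np.triu_indices(d))):
        c = w[1 + d + k]  # coefficient of theta_i * theta_j
        if i == j:
            grad[i] += 2.0 * c * theta[i]
        else:
            grad[i] += c * theta[j]
            grad[j] += c * theta[i]
    return grad


def parameter_critic_ascent(rollout, theta0, iters=50, per_iter=20,
                            sigma=0.5, lr=0.1, seed=0):
    """Hypothetical parameter-critic loop: explore, refit critic, ascend it."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    buf_theta, buf_return = [], []
    for _ in range(iters):
        # Explore in parameter space around the current theta. Each sample is
        # labelled by the parameters that produced it, so it stays in the
        # buffer and remains usable after theta moves.
        for _ in range(per_iter):
            probe = theta + sigma * rng.standard_normal(theta.shape)
            buf_theta.append(probe)
            buf_return.append(rollout(probe, rng))
        # Refit the critic by least squares on every sample collected so far.
        X = quad_features(np.array(buf_theta))
        w, *_ = np.linalg.lstsq(X, np.array(buf_return), rcond=None)
        # Take a gradient-ascent step on the fitted critic.
        theta = theta + lr * critic_gradient(w, theta)
    return theta, len(buf_return)


theta, n_samples = parameter_critic_ascent(noisy_return, np.zeros(2))
print(theta, n_samples)
```

Because the critic is refit on the entire buffer, returns gathered at early parameter values still inform the gradient late in training, whereas a finite-difference style gradient-free method typically relies only on the returns collected around the current parameters. The quadratic features are a simplifying choice for this toy objective; a neural network or other regressor could play the role of the function approximator when the reward landscape is not close to quadratic.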
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
