Reliable Policy Iteration: Performance Robustness Across Architecture and Environment Perturbations
S.R. Eshwar, Aniruddha Mukherjee, Kintan Saha, Krishna Agarwal, Gugan Thoppe, Aditya Gopalan, Gal Dalal

TL;DR
This paper evaluates the robustness of Reliable Policy Iteration (RPI) on two classical control tasks, CartPole and Inverted Pendulum, under changes to neural network and environment parameters. Compared with other deep reinforcement learning algorithms, RPI reaches high performance early and holds it more consistently.
Contribution
The paper provides empirical evidence that RPI maintains performance robustness across different neural network and environment perturbations, highlighting its potential as a reliable RL method.
Findings
RPI reaches near-optimal performance early in training.
RPI sustains high performance despite perturbations.
RPI outperforms DQN, Double DQN, DDPG, TD3, and PPO in robustness.
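The findings above describe two behaviours: reaching near-optimal return early, and holding it afterwards. As a rough illustration of how a reader might quantify these from a per-episode return curve, here is a minimal sketch. The function names, the moving-average window, the threshold, and the two return sequences are all hypothetical and synthetic; none of them come from the paper, which may measure robustness differently.

```python
# Hypothetical helpers for summarising a training curve (not the paper's metrics).

def moving_average(returns, window):
    """Trailing mean over `window` episodes, starting at index window - 1."""
    return [
        sum(returns[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(returns))
    ]

def episodes_to_threshold(returns, threshold, window=3):
    """First episode index whose trailing-window mean reaches `threshold`, or None."""
    for offset, avg in enumerate(moving_average(returns, window)):
        if avg >= threshold:
            return offset + window - 1
    return None

def fraction_sustained(returns, threshold, start):
    """Share of episodes from `start` onward whose return stays at or above `threshold`."""
    tail = returns[start:]
    if not tail:
        return 0.0
    return sum(r >= threshold for r in tail) / len(tail)

# Synthetic curves for illustration only (assumed maximum return of 200).
stable = [50, 120, 180, 200, 200, 200, 200, 200, 200, 200]
unstable = [50, 120, 180, 200, 200, 200, 90, 200, 60, 200]

for name, curve in [("stable", stable), ("unstable", unstable)]:
    start = episodes_to_threshold(curve, threshold=190)
    print(name, start, fraction_sustained(curve, 190, start))
```

Both synthetic curves cross the threshold at the same episode, but only the first one stays there. Separating the two numbers keeps "learns fast" from being conflated with "stays learned".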
Abstract
In a recent work, we proposed Reliable Policy Iteration (RPI), which restores policy iteration's monotonicity-of-value-estimates property to the function approximation setting. Here, we assess the robustness of RPI's empirical performance on two classical control tasks -- CartPole and Inverted Pendulum -- under changes to neural network and environmental parameters. Relative to DQN, Double DQN, DDPG, TD3, and PPO, RPI reaches near-optimal performance early and sustains this policy as training proceeds. Because deep RL methods are often hampered by sample inefficiency, training instability, and hyperparameter sensitivity, our results highlight RPI's promise as a more reliable alternative.
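For readers unfamiliar with the monotonicity property the abstract refers to, the sketch below shows it in the classical tabular setting, where exact policy iteration yields value estimates that never decrease from one iteration to the next. This is standard tabular policy iteration, not RPI: the abstract does not spell out RPI's update, so nothing here should be read as the authors' algorithm. The three-state MDP, its rewards, and all names are made up for illustration.

```python
# Classical tabular policy iteration on a tiny hypothetical MDP (not from the paper).
# P[s][a] is a list of (probability, next_state, reward) triples.
GAMMA = 0.9
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(0.9, 2, 2.0), (0.1, 1, 0.0)]},
    2: {0: [(1.0, 2, 1.0)], 1: [(1.0, 0, 5.0)]},
}

def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation; 2000 sweeps at GAMMA = 0.9 is effectively exact."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: q_value(V, s, policy[s]) for s in P}
    return V

def improve(V):
    """Greedy policy with respect to V (ties go to the lower action index)."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}

def policy_iteration(policy, max_iters=100):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    history = []
    for _ in range(max_iters):
        V = evaluate(policy)
        history.append(V)
        new_policy = improve(V)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, history

policy, history = policy_iteration({s: 0 for s in P})
for k, V in enumerate(history):
    print(k, {s: round(v, 2) for s, v in V.items()})
```

Each printed row is the value function of one iterate, and every state's value is at least as large as in the previous row. With function approximation this guarantee generally fails, which is the gap the abstract says RPI addresses.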
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
