Approximating Martingale Process for Variance Reduction in Deep Reinforcement Learning with Large State Space
Charlie Ruan

TL;DR
This paper extends the Approximating Martingale Process (AMP) for variance reduction in reinforcement learning to large, uncertain state spaces, exemplified by ride-hailing systems, integrating it with Proximal Policy Optimization.
Contribution
It generalizes AMP for large, uncertain state spaces in RL and demonstrates its application in ride-hailing systems with PPO.
Findings
AMP effectively reduces variance in large state space RL scenarios.
Application of AMP with PPO improves policy optimization in ride-hailing systems.
The work demonstrates the feasibility of applying AMP in complex, real-world RL environments.
Abstract
The Approximating Martingale Process (AMP) has been proven effective for variance reduction in reinforcement learning (RL) in specific cases such as Multiclass Queueing Networks. In those cases, however, the state space is relatively small and all possible state transitions can be enumerated. In this paper, we consider systems in which the state space is large and state transitions are uncertain, thereby extending AMP into a general variance-reduction method for RL. Specifically, we investigate the application of AMP to ride-hailing systems such as Uber, where Proximal Policy Optimization (PPO) is used to optimize the policy for matching drivers and customers.
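To make the variance-reduction idea concrete, the following is a minimal sketch of a martingale-based control variate in the small-state-space setting the abstract mentions, where every transition out of a state can be iterated through. It is not the paper's implementation: the 3-state chain, its costs, the discount factor, and all function names are hypothetical and chosen only for illustration. Given an approximate value function h, the term h(s') - E[h(s') | s] has zero conditional mean, so subtracting its discounted sum from the sampled return leaves the expectation unchanged while, for a reasonable h, shrinking the variance.

```python
import random
import statistics

# Hypothetical 3-state Markov chain under a fixed policy (not from the paper).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
COST = [0.0, 1.0, 4.0]
GAMMA = 0.9
HORIZON = 50


def approximate_value(sweeps=5):
    """Crude value estimate h obtained from a few value-iteration sweeps."""
    h = [0.0] * len(COST)
    for _ in range(sweeps):
        h = [COST[s] + GAMMA * sum(p * h[j] for j, p in enumerate(P[s]))
             for s in range(len(COST))]
    return h


def expected_next(h, s):
    """E[h(s') | s], computed by iterating over every transition out of s."""
    return sum(p * h[j] for j, p in enumerate(P[s]))


def rollout(h, rng, start=0):
    """Simulate one episode; return (plain return, martingale-adjusted return)."""
    s, plain, martingale = start, 0.0, 0.0
    for t in range(HORIZON):
        plain += GAMMA ** t * COST[s]
        s_next = rng.choices(range(len(COST)), weights=P[s])[0]
        # Zero-mean martingale difference: realised h minus its conditional mean.
        martingale += GAMMA ** (t + 1) * (h[s_next] - expected_next(h, s))
        s = s_next
    return plain, plain - martingale


def compare(episodes=2000, seed=0):
    """Estimate mean and variance of both estimators on the same episodes."""
    rng = random.Random(seed)
    h = approximate_value()
    samples = [rollout(h, rng) for _ in range(episodes)]
    plain = [p for p, _ in samples]
    adjusted = [a for _, a in samples]
    return {
        "mean_plain": statistics.mean(plain),
        "mean_adjusted": statistics.mean(adjusted),
        "var_plain": statistics.variance(plain),
        "var_adjusted": statistics.variance(adjusted),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4f}")
```

In this toy chain `expected_next` is exact because the transition matrix is known and tiny. In a large state space with uncertain transitions, such as the ride-hailing setting the paper targets, that conditional expectation cannot be enumerated and would itself have to be approximated, which is the gap the abstract describes.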
Taxonomy
Topics: Electric Vehicles and Infrastructure · Transportation and Mobility Innovations · Age of Information Optimization
Methods: Approximating Martingale Process
