Approximating Martingale Process for Variance Reduction in Deep Reinforcement Learning with Large State Space

Charlie Ruan

arXiv:2211.15886·cs.LG·November 30, 2022

TL;DR

This paper extends the Approximating Martingale Process (AMP) for variance reduction in reinforcement learning to large, uncertain state spaces, exemplified by ride-hailing systems, integrating it with Proximal Policy Optimization.

Contribution

It generalizes AMP for large, uncertain state spaces in RL and demonstrates its application in ride-hailing systems with PPO.

Findings

01. AMP effectively reduces variance in RL scenarios with large state spaces.

02. Applying AMP together with PPO improves policy optimization in ride-hailing systems.

03. The paper demonstrates the feasibility of AMP in complex, real-world RL environments.

Abstract

Approximating Martingale Process (AMP) has proven effective for variance reduction in reinforcement learning (RL) in specific cases such as Multiclass Queueing Networks. In those cases, however, the state space is relatively small and all possible state transitions can be enumerated. In this paper, we consider systems whose state space is large and whose state transitions are uncertain, thereby extending AMP into a more general variance-reduction method for RL. Specifically, we investigate the application of AMP to ride-hailing systems such as Uber, where Proximal Policy Optimization (PPO) is used to optimize the policy for matching drivers and customers.
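To make the variance-reduction idea concrete, here is a minimal sketch of a martingale-style control variate on a toy problem. It is not the paper's implementation: the 5-state Markov reward process, the function name `simulate`, the perturbed value function `zeta`, and all parameter values are assumptions chosen for illustration. The sketch subtracts the zero-mean term gamma^(t+1) * (zeta(s') - E[zeta(s') | s]) from each step of a Monte Carlo return. Computing E[zeta(s') | s] here enumerates every transition, which is exactly the step the abstract notes is only practical when the state space is small.

```python
import numpy as np

def simulate(n_episodes=2000, horizon=50, gamma=0.9, noise=0.1, seed=0):
    """Return (plain, corrected) discounted-return estimates per episode."""
    rng = np.random.default_rng(seed)
    n = 5
    # Toy Markov reward process (hypothetical, not the ride-hailing model):
    # half the mass is uniform, half moves to the next state cyclically.
    P = 0.5 * np.full((n, n), 1.0 / n) + 0.5 * np.roll(np.eye(n), 1, axis=1)
    r = np.arange(n, dtype=float)

    # Exact discounted value function, then a perturbed copy standing in
    # for a learned approximation zeta.
    V = np.linalg.solve(np.eye(n) - gamma * P, r)
    zeta = V + noise * rng.normal(size=n)
    P_zeta = P @ zeta  # E[zeta(s') | s], enumerating all transitions

    cum = P.cumsum(axis=1)
    s = np.zeros(n_episodes, dtype=int)  # every episode starts in state 0
    plain = np.zeros(n_episodes)
    correction = np.zeros(n_episodes)
    for t in range(horizon):
        plain += gamma**t * r[s]
        # Inverse-CDF sampling of the next state for all episodes at once.
        u = rng.random(n_episodes)
        s_next = np.minimum((u[:, None] > cum[s]).sum(axis=1), n - 1)
        # Martingale difference: zero conditional mean given the current state.
        correction += gamma**(t + 1) * (zeta[s_next] - P_zeta[s])
        s = s_next
    return plain, plain - correction

plain, corrected = simulate()
print("plain:     mean", plain.mean(), "variance", plain.var())
print("corrected: mean", corrected.mean(), "variance", corrected.var())
```

Both estimators target the same expected return, because the subtracted term has zero mean; the corrected one has lower variance whenever `zeta` is close to the true value function. In a large state space the product `P @ zeta` is not available in closed form and would have to be approximated, which is the setting the paper addresses.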

Taxonomy

Topics: Electric Vehicles and Infrastructure · Transportation and Mobility Innovations · Age of Information Optimization

Methods: Approximating Martingale Process · Proximal Policy Optimization