Using Common Random Numbers for Simulation-based Planning with Rollouts
Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan

TL;DR
This paper investigates the use of common random numbers in simulation-based planning with rollouts, demonstrating variance reduction and improved task performance in stochastic decision-making tasks.
Contribution
The paper provides a simple, provably effective recipe that uses common random numbers to reduce the variance of relative utility estimates when simulations invoke a rollout policy beyond some depth.
Findings
Variance reduction improves decision quality in synthetic tasks.
Application to pension-disbursement planning shows enhanced performance.
Deployment in Ludo with UCT demonstrates practical benefits.
Abstract
Simulation-based planning with rollouts is a widely deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT…
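Illustration
The general idea of common random numbers described in the abstract can be sketched on a toy problem. This is not the paper's recipe or its experimental setup: the simulator, the two actions, the noise model, and every name below (rollout_return, utility_gap, gap_variance) are hypothetical and chosen only for illustration. The sketch estimates the utility gap between two actions twice, once with independent random streams per action and once with the same stream reused for both actions, and compares how much the gap estimate varies across repeated planning runs.

```python
import random
import statistics


def rollout_return(action, rng, horizon=20):
    """Hypothetical toy simulator: return of one trajectory that starts with
    `action` and then follows a fixed rollout policy. All randomness is drawn
    from `rng`, so reusing a seed reproduces the same noise sequence."""
    total = 0.1 * action  # assumed: action 1 is slightly better on average
    for _ in range(horizon):
        # The noise scale depends mildly on the action, so common random
        # numbers reduce the variance of the gap but do not remove it.
        total += (1.0 + 0.1 * action) * rng.gauss(0.0, 1.0)
    return total


def utility_gap(num_sims, seed, common):
    """Estimate U(action 1) - U(action 0) from num_sims paired simulations."""
    master = random.Random(seed)
    diffs = []
    for _ in range(num_sims):
        seed0 = master.getrandbits(32)
        # With common random numbers both actions see the same seed;
        # otherwise each action gets its own independent seed.
        seed1 = seed0 if common else master.getrandbits(32)
        ret1 = rollout_return(1, random.Random(seed1))
        ret0 = rollout_return(0, random.Random(seed0))
        diffs.append(ret1 - ret0)
    return statistics.fmean(diffs)


def gap_variance(common, trials=200, num_sims=10):
    """Variance of the gap estimate across repeated planning runs."""
    gaps = [utility_gap(num_sims, seed, common) for seed in range(trials)]
    return statistics.pvariance(gaps)


var_independent = gap_variance(common=False)
var_crn = gap_variance(common=True)
print(f"independent streams: {var_independent:.4f}")
print(f"common random numbers: {var_crn:.4f}")
```

In this toy model the shared noise largely cancels in the difference, so the gap estimate is much less variable with common random numbers. A planner that picks the action with the higher estimated utility would therefore make the wrong choice less often for the same simulation budget.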
