Accelerating Monte-Carlo Tree Search on CPU-FPGA Heterogeneous Platform
Yuan Meng, Rajgopal Kannan, Viktor Prasanna

TL;DR
This paper presents a scalable CPU-FPGA system that accelerates Monte Carlo Tree Search by optimizing in-tree operations, achieving significant speedups and improved scalability over existing CPU-based methods.
Contribution
It introduces a novel decomposition of MCTS for CPU-FPGA platforms and hardware optimizations that substantially enhance performance and scalability.
Findings
Up to 35x speedup in in-tree operations.
3x higher overall system throughput.
Superior scalability compared to state-of-the-art CPU implementations.
Abstract
Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. The in-tree operations become a critical performance bottleneck in realizing parallel MCTS on CPUs. In this work, we develop a scalable CPU-FPGA system for Tree-Parallel MCTS. We propose a novel decomposition and mapping of MCTS data structure and computation onto CPU and FPGA to reduce communication and coordination. High scalability of our system is achieved by encapsulating in-tree operations in an SRAM-based FPGA accelerator. To lower the high data access latency and inter-worker synchronization overheads, we develop several hardware optimizations. We show that by using our accelerator, we obtain up to 35x speedup for in-tree operations, and 3x higher overall system throughput. Our CPU-FPGA system also achieves superior scalability with respect to the number of parallel…
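For readers unfamiliar with the term, "in-tree operations" in MCTS generally means the steps that read or update the shared search tree: selecting a path from the root to a leaf, and backing up a simulation result along that path. The Python sketch below illustrates those two steps in a generic, single-threaded form. It is not the paper's accelerator design: the `Node` class, the function names, the use of the standard UCT selection rule, and the exploration constant `c = 1.4` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; field names are illustrative only."""
    parent: Optional["Node"] = None
    children: Dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    total_value: float = 0.0


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """Standard UCT score (assumed policy); unvisited children come first."""
    if child.visits == 0:
        return math.inf
    mean = child.total_value / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_leaf(root: Node) -> Node:
    """In-tree selection: follow the best-scoring child until a leaf."""
    node = root
    while node.children:
        node = max(node.children.values(),
                   key=lambda ch, p=node: uct_score(p, ch))
    return node


def backup(leaf: Node, value: float) -> None:
    """In-tree backup: add the simulation result to every node on the path."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.total_value += value
        node = node.parent


# Usage: a root with two children, one simulation result backed up per child.
root = Node()
a = Node(parent=root)
b = Node(parent=root)
root.children = {0: a, 1: b}
backup(a, 1.0)
backup(b, 0.0)
best = select_leaf(root)  # `a`, since it has the higher mean value
```

In a tree-parallel setting, many workers would run `select_leaf` and `backup` on the same tree concurrently, which is where the data access latency and synchronization overheads mentioned in the abstract arise.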
Taxonomy
Topics: Graph Theory and Algorithms · Parallel Computing and Optimization Techniques · Algorithms and Data Compression
