Parallelizing a modern GPU simulator
Rodrigo Huerta, Antonio González

TL;DR
This paper introduces a simple, deterministic method for parallelizing the Accel-sim GPU simulator with OpenMP. It reduces simulation time by 5.8x on average with 16 threads without changing the simulation results, which enables faster and larger-scale architecture research.
Contribution
The paper presents a minimal-code-change parallelization approach for GPU simulation that achieves high speed-ups while maintaining deterministic and accurate results.
Findings
Average speed-up of 5.8x with 16 threads
Achieves up to 14x speed-up on some workloads
Reduces simulation time from over five days to less than 12 hours
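As a rough sanity check on these figures, the numbers can be turned into parallel efficiency and an implied speed-up. This is a back-of-the-envelope sketch, not a result from the paper. It assumes efficiency is defined as speed-up divided by thread count, that five days equals 120 hours, and that both simulation times refer to the same workload (the summary does not state this).

```latex
% Parallel efficiency E = S / p with p = 16 threads
\[
E_{\text{avg}} = \frac{5.8}{16} \approx 0.36,
\qquad
E_{\text{best}} = \frac{14}{16} \approx 0.88
\]
% Lower bound on the speed-up implied by "over five days" to "less than 12 hours"
\[
\frac{T_{\text{serial}}}{T_{\text{parallel}}} > \frac{120\ \text{h}}{12\ \text{h}} = 10
\]
```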
Abstract
Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This paper demonstrates that simulating some GPGPU workloads in a single-threaded state-of-the-art simulator such as Accel-sim can take more than five days. In this paper we present a simple approach to parallelize this simulator with minimal code changes by using OpenMP. Moreover, our parallelization technique is deterministic, so the simulator provides the same results for single-threaded and multi-threaded simulations. Compared to previous works, we achieve a higher speed-up, and, more importantly, the parallel simulation does not incur any inaccuracies. When we run the simulator with 16 threads, we achieve an average speed-up of 5.8x and reach 14x in…
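The following is a minimal sketch of one way an OpenMP loop over simulated cores can stay deterministic. It is not the authors' implementation and does not use Accel-sim code. `ToyCore`, `simulate`, and the digest that stands in for shared memory state are hypothetical names made up for illustration. The assumption is that each core touches only its own state during the parallel phase, and that anything shared is updated afterwards in a fixed core order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model of one GPU core; not an Accel-sim class.
struct ToyCore {
    std::uint64_t state = 0;            // private per-core state
    std::vector<std::uint64_t> outbox;  // requests produced in the current cycle

    void cycle() {
        // Arbitrary deterministic update (a linear congruential step).
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        outbox.clear();
        if ((state >> 60) & 1ULL) {
            outbox.push_back(state >> 32);  // sometimes emit a "memory request"
        }
    }
};

// Runs the toy simulation and returns a digest of the shared state.
// With parallel == false the OpenMP region runs on a single thread.
std::uint64_t simulate(int num_cores, int num_cycles, bool parallel) {
    assert(num_cores > 0 && num_cycles >= 0);

    std::vector<ToyCore> cores(num_cores);
    for (int i = 0; i < num_cores; ++i) {
        cores[i].state = static_cast<std::uint64_t>(i) + 1;
    }

    std::uint64_t shared_digest = 0;  // stands in for shared memory-system state

    for (int c = 0; c < num_cycles; ++c) {
        // Phase 1: each iteration reads and writes only cores[i],
        // so there are no data races between threads.
        #pragma omp parallel for schedule(static) if(parallel)
        for (int i = 0; i < num_cores; ++i) {
            cores[i].cycle();
        }

        // Phase 2: shared state is updated sequentially in core-index order,
        // so the result does not depend on thread count or scheduling.
        for (int i = 0; i < num_cores; ++i) {
            for (std::uint64_t req : cores[i].outbox) {
                shared_digest = shared_digest * 1099511628211ULL + req;
            }
        }
    }
    return shared_digest;
}
```

Compiled with an OpenMP flag such as `-fopenmp`, `simulate(8, 1000, true)` and `simulate(8, 1000, false)` should return the same value. Without the flag the pragma is ignored and the code runs serially. The design choice here is to keep the parallel region free of shared writes rather than to protect shared writes with locks or atomics, since lock acquisition order would vary from run to run.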
Taxonomy
Topics: Distributed and Parallel Computing Systems
