Parallelizing a modern GPU simulator

Rodrigo Huerta; Antonio González

arXiv:2502.14691·cs.DC·May 27, 2025

Open Access

TL;DR

This paper introduces a simple, deterministic OpenMP parallelization of the Accel-sim GPU simulator that cuts simulation time by 5.8x on average with 16 threads without sacrificing accuracy, enabling faster and larger-scale architecture research.

Contribution

The paper presents a minimal-code-change parallelization approach for GPU simulation that achieves high speed-ups while maintaining deterministic and accurate results.

Findings

1. Average speed-up of 5.8x with 16 threads
2. Achieves up to 14x speed-up on some workloads
3. Reduces simulation time from over five days to less than 12 hours
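
To put the figures above in context, the short sketch below computes parallel efficiency (speed-up divided by thread count) and the speed-up implied by two wall-clock times. The function names `parallel_efficiency` and `speedup_from_times` are hypothetical helpers introduced here for illustration; they do not come from the paper or from Accel-sim.

```cpp
#include <cassert>

// Hypothetical helper: parallel efficiency = speed-up / number of threads.
double parallel_efficiency(double speedup, int threads) {
    assert(threads > 0);
    return speedup / threads;
}

// Hypothetical helper: speed-up implied by two wall-clock times
// expressed in the same unit (for example, hours).
double speedup_from_times(double baseline, double parallel) {
    assert(parallel > 0.0);
    return baseline / parallel;
}
```

With the reported numbers, `parallel_efficiency(5.8, 16)` is about 0.36 and `parallel_efficiency(14.0, 16)` is 0.875. Taking five days as 120 hours, `speedup_from_times(120.0, 12.0)` gives 10; since the finding says "over five days" and "less than 12 hours", the speed-up for that workload is above 10x.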

Abstract

Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This paper demonstrates that simulating some GPGPU workloads in a single-threaded state-of-the-art simulator such as Accel-sim can take more than five days. In this paper we present a simple approach to parallelize this simulator with minimal code changes by using OpenMP. Moreover, our parallelization technique is deterministic, so the simulator provides the same results for single-threaded and multi-threaded simulations. Compared to previous works, we achieve a higher speed-up, and, more importantly, the parallel simulation does not incur any inaccuracies. When we run the simulator with 16 threads, we achieve an average speed-up of 5.8x and reach 14x in…
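
As a rough illustration of how an OpenMP loop can stay deterministic, the sketch below simulates a set of cores in two phases per cycle: cores advance in parallel using only private state, then their outputs are merged sequentially in a fixed core-ID order. This is a minimal sketch under assumptions, not the authors' implementation: `Core`, `step_core`, `simulate`, and the two-phase structure are hypothetical and are not taken from Accel-sim.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-core state; not taken from Accel-sim.
struct Core {
    std::uint64_t state = 0;            // private state, touched only by its owner
    std::vector<std::uint64_t> outbox;  // requests produced during the current cycle
};

// Advance one core by one cycle using only its private state.
inline void step_core(Core& c, std::uint64_t cycle) {
    // Unsigned arithmetic wraps around, which is well defined in C++.
    c.state = c.state * 6364136223846793005ULL + 1442695040888963407ULL + cycle;
    if ((c.state >> 60) < 4) {          // occasionally emit a toy "memory request"
        c.outbox.push_back(c.state);
    }
}

// Run the toy simulation and return a checksum of the shared state.
std::uint64_t simulate(int num_cores, int num_cycles) {
    assert(num_cores > 0 && num_cycles >= 0);
    std::vector<Core> cores(static_cast<std::size_t>(num_cores));
    for (int i = 0; i < num_cores; ++i) {
        cores[i].state = static_cast<std::uint64_t>(i) + 1;
    }

    std::uint64_t shared = 0;  // stands in for shared memory/interconnect state

    for (int cycle = 0; cycle < num_cycles; ++cycle) {
        // Phase 1: each iteration touches only cores[i], so iterations
        // may run on different threads without data races.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < num_cores; ++i) {
            step_core(cores[i], static_cast<std::uint64_t>(cycle));
        }

        // Phase 2: merge into shared state sequentially, in core-ID order,
        // so the result does not depend on thread count or scheduling.
        for (int i = 0; i < num_cores; ++i) {
            for (std::uint64_t req : cores[i].outbox) {
                shared = (shared ^ req) * 1099511628211ULL;
            }
            cores[i].outbox.clear();
        }
    }
    return shared;
}
```

Compiled with an OpenMP flag (for example `-fopenmp` on GCC or Clang), phase 1 runs across threads; without the flag the pragma is ignored and the same code runs serially. In both cases `simulate` returns the same checksum for the same arguments, because every write to shared state happens in the sequential phase.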

Taxonomy

Topics: Distributed and Parallel Computing Systems