Anatomy of the gem5 Simulator: AtomicSimpleCPU, TimingSimpleCPU, O3CPU, and Their Interaction with the Ruby Memory System
Johan Söderström (1), Yuan Yao (1) ((1) Uppsala University)

TL;DR
This paper provides a detailed analysis of gem5's CPU models and their interaction with the Ruby memory system, using profiling to identify bottlenecks and inform optimization efforts.
Contribution
It offers an anatomical overview of three major gem5 CPU models and introduces a profiling framework to analyze their performance and interactions with memory.
Findings
The Ruby memory subsystem dominates execution time in the AtomicSimpleCPU (AS) and TimingSimpleCPU (TS) models during instruction fetch.
The O3 CPU spends less of its time in Ruby and more on instruction construction and pipelining.
The profiling framework can be used to analyze other gem5 components or to develop new models.
Abstract
gem5 is a popular modular computer system simulator, widely used in computer architecture research and known for its long simulation times and steep learning curve. This report examines its three major CPU models, the AtomicSimpleCPU (AS CPU), the TimingSimpleCPU (TS CPU), and the out-of-order (O3) CPU, together with their interactions with the memory subsystem. We provide a detailed anatomical overview of each CPU's function call chains and show how gem5 divides its execution time among the simulated hardware layers. We perform our analysis using a lightweight profiler built on Linux's perf_event interface, with user-configurable options to target specific functions and examine their interactions in detail. By profiling each CPU across a wide selection of benchmarks, we identify their software bottlenecks. Our results show that the Ruby memory subsystem consistently accounts for the…
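As a rough illustration of what dividing execution time among simulated hardware layers can mean in practice, the sketch below attributes sampled call stacks to a layer based on the symbol of the innermost frame. It is not the authors' profiler and does not use perf_event; it only shows one possible post-processing step. The prefix table, the layer names, and the sample stacks are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical mapping from C++ symbol prefixes to simulated layers.
# The layer names and the choice of prefixes are assumptions for
# illustration, not taken from the paper.
LAYER_PREFIXES = [
    ("gem5::ruby::", "ruby"),
    ("gem5::o3::", "o3_cpu"),
    ("gem5::AtomicSimpleCPU", "simple_cpu"),
    ("gem5::TimingSimpleCPU", "simple_cpu"),
]


def classify(symbol):
    """Return the layer whose prefix matches the symbol, else 'other'."""
    for prefix, layer in LAYER_PREFIXES:
        if symbol.startswith(prefix):
            return layer
    return "other"


def layer_shares(samples):
    """Return the fraction of samples per layer.

    Each sample is a call stack (outermost frame first); the sample is
    attributed to the layer of its innermost (last) frame.
    """
    counts = Counter(classify(stack[-1]) for stack in samples if stack)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {layer: n / total for layer, n in counts.items()}


if __name__ == "__main__":
    # Made-up stacks for demonstration only; not measured data.
    demo = [
        ["main", "gem5::AtomicSimpleCPU::tick", "gem5::ruby::Sequencer::makeRequest"],
        ["main", "gem5::AtomicSimpleCPU::tick"],
        ["main", "gem5::o3::Fetch::tick"],
    ]
    for layer, share in sorted(layer_shares(demo).items()):
        print(f"{layer}: {share:.2f}")
```

Attributing each sample to its innermost frame gives "self" time per layer. Attributing it to every layer that appears anywhere in the stack would give inclusive time instead, which answers a different question.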
