Understanding Simulated Architecture via gem5 Call-Stack Profiling
Johan Söderström (1), Rashid Aligholipour (1), Yuan Yao (1) ((1) Uppsala University)

TL;DR
This paper introduces a lightweight call-stack profiling framework for gem5 that provides direct insights into simulated architecture behavior, revealing issues like cache deadlocks and inefficiencies.
Contribution
A novel, non-intrusive profiling method using Linux perf_event to analyze gem5's call-stacks, enabling detailed, hierarchical, and component-specific insights into simulation internals.
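One way to turn raw call-stack samples into component-specific numbers is to attribute each frame to the C++ class that encloses it. The minimal sketch below assumes the samples have already been collected and symbol-resolved into demangled names such as `gem5::BaseCache::recvTimingReq(gem5::Packet*)`. These names are illustrative only, and this is not the authors' implementation. Class extraction is a naive string split, so it assumes no template arguments that contain `::` or `(`.

```python
from collections import Counter


def component_of(symbol: str) -> str:
    """Map a demangled C++ symbol to its enclosing class or namespace.

    e.g. 'gem5::BaseCache::recvTimingReq(gem5::Packet*)' -> 'BaseCache'.
    Naive: assumes template arguments contain no '::' or '('.
    """
    name = symbol.split("(", 1)[0]   # drop the argument list
    parts = name.split("::")
    if len(parts) >= 2:
        return parts[-2]             # enclosing class (or namespace)
    return "<free function>"         # e.g. 'main'


def samples_per_component(stacks):
    """Inclusive count: number of samples in which each component appears on the stack."""
    counts = Counter()
    for stack in stacks:
        # Use a set so a component is counted at most once per sample,
        # even if several of its methods are on the same stack.
        counts.update({component_of(frame) for frame in stack})
    return counts
```

Counting each component at most once per sample gives an inclusive share ("fraction of simulation time spent somewhere inside this model"), which avoids double-counting deep call chains within a single component.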
Findings
Uncovered inefficiencies in TimingSimpleCPU caused by its lockup-cache model
Detected cache coherence deadlock and livelock issues in gem5 simulations
Provided detailed call-tree views of gem5's internal activity
Abstract
Understanding the behavior of simulated architectures in gem5 is critical for studying complex, deeply integrated computing systems. However, conventional analysis methods provide only an indirect view of the simulated system's internals. In this work, we show that call-stack profiling of gem5 itself offers a powerful yet underutilized perspective: the simulator's own call-stack directly reflects the activity of the simulated system, exposing insights that conventional statistics may overlook. Profiling gem5's call-stacks is challenging due to its highly layered and complex software design patterns. To address this, we introduce a specialized, lightweight profiling framework built on Linux's perf_event interface, which samples gem5's runtime call-stacks throughout the simulation, resolves symbols on the fly, and merges samples into a hierarchical call-tree representation supporting both…
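The "merge samples into a hierarchical call-tree" step can be illustrated with a prefix tree over call-stacks. The sketch below is a hypothetical illustration, not the framework's code. It assumes each sample is already symbol-resolved and ordered root-first. perf reports callchains leaf-first, so they would need to be reversed. Each node tracks inclusive samples (the frame is anywhere on this path) and exclusive samples (the frame was the leaf).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inclusive: int = 0  # samples whose stack passes through this node
    exclusive: int = 0  # samples whose stack ends at this node (leaf frame)
    children: dict = field(default_factory=dict)


def build_call_tree(stacks):
    """Merge root-first call-stacks into a call tree (prefix tree)."""
    root = Node("<root>")
    for stack in stacks:
        root.inclusive += 1
        node = root
        for frame in stack:
            # Identical prefixes share nodes, so repeated paths are merged.
            node = node.children.setdefault(frame, Node(frame))
            node.inclusive += 1
        node.exclusive += 1  # the last frame visited is the sampled leaf
    return root


def render(node, depth=0, lines=None):
    """Flatten the tree into indented lines, hottest children first."""
    if lines is None:
        lines = []
    lines.append(f"{'  ' * depth}{node.name} incl={node.inclusive} excl={node.exclusive}")
    for child in sorted(node.children.values(), key=lambda n: -n.inclusive):
        render(child, depth + 1, lines)
    return lines
```

For example, three samples in which two end in `Cache::recvTimingReq` and one ends in `CPU::fetch` (placeholder frame names) merge into a single `main → simulate` path, which then branches with inclusive counts 2 and 1. Sorting children by inclusive count puts the most active simulated components at the top of each level.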
