A Controlled Study of Memory Hierarchy Transitions in Quantum Circuit Simulation on Apple M4 Pro Unified Memory Architecture
Gyan Pratipat

TL;DR
This study analyzes state-vector quantum circuit simulation performance on the Apple M4 Pro Unified Memory Architecture (UMA). It finds that all gate implementations are memory-bound, that a reproducible 4.46× timing discontinuity occurs at the 28→29 qubit transition, and that irregular memory access patterns outperform streaming bandwidth predictions.
Contribution
It provides a hardware-characterization framework for quantum simulation workloads on UMA, highlighting the impact of memory access patterns and a reproducible timing discontinuity at the 28→29 qubit transition.
Findings
All gate implementations are memory-bound, with an arithmetic intensity of 0.38 FLOP/byte.
A reproducible 4.46× timing discontinuity occurs at 28→29 qubits.
Irregular memory access patterns outperform streaming bandwidth predictions.
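For a sense of scale at the 28→29 qubit transition, the state-vector footprint can be sketched as follows. This is a minimal sketch that assumes a dense state vector of 2^n double-precision complex amplitudes (16 bytes each). The precision and the helper name `statevector_bytes` are assumptions made for illustration and are not confirmed by the summary above, and the sketch makes no claim about what causes the discontinuity.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory needed for a dense state vector of 2**n_qubits amplitudes.

    bytes_per_amplitude=16 assumes complex128 (two 64-bit floats);
    this precision is an assumption, not a detail stated in the source.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


GIB = 2 ** 30  # bytes per gibibyte

# Footprint on either side of the reported 28 -> 29 qubit transition.
footprints_gib = {n: statevector_bytes(n) / GIB for n in (28, 29, 30)}

for n, size in footprints_gib.items():
    print(f"{n} qubits: {size:.0f} GiB")
```

Under these assumptions the footprint doubles with each added qubit, from 4 GiB at 28 qubits to 8 GiB at 29 and 16 GiB at 30.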
Abstract
State-vector quantum circuit simulation is memory-bandwidth bound, yet the interaction between memory hierarchy, access pattern, and hardware parallelism remains incompletely characterized. We address this using the Apple M4 Pro Unified Memory Architecture (UMA), where CPU and GPU share identical physical LPDDR5X DRAM (224 GB/s STREAM bandwidth for both), eliminating memory-technology and interconnect confounds. Using a thermally isolated, multi-trial methodology across 11 simulation backends on GHZ and QFT circuits from 3 to 30 qubits, we make three central contributions. First, a Roofline analysis confirms all gate implementations have an arithmetic intensity of 0.38 FLOP/byte, well below the ridge point for any plausible peak compute on modern hardware, establishing structural memory-boundedness. Second, we identify a reproducible 4.46× timing discontinuity at the…
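The Roofline bound mentioned in the abstract can be illustrated with a short calculation. The arithmetic intensity (0.38 FLOP/byte) and STREAM bandwidth (224 GB/s) are taken from the abstract. The peak compute value is a placeholder chosen only for illustration, and the function names are hypothetical.

```python
def attainable_gflops(intensity: float, bandwidth_gbs: float, peak_gflops: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


AI = 0.38      # FLOP/byte, from the abstract
BW = 224.0     # GB/s STREAM bandwidth, from the abstract
PEAK = 2000.0  # GFLOP/s, PLACEHOLDER value, not a measured or quoted figure

bound = attainable_gflops(AI, BW, PEAK)
ridge = ridge_point(PEAK, BW)
is_memory_bound = AI < ridge

print(f"bandwidth-limited bound: {bound:.2f} GFLOP/s")
print(f"ridge point for placeholder peak: {ridge:.2f} FLOP/byte")
print(f"memory-bound: {is_memory_bound}")
```

With these inputs the bandwidth-limited ceiling is 0.38 × 224 ≈ 85 GFLOP/s. A kernel at this intensity stays memory-bound for any peak above that ceiling, which is consistent with the abstract's claim of structural memory-boundedness.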
