PALMED: Throughput Characterization for Superscalar Architectures -- Extended Version
Nicolas Derumigny, Fabian Gruber, Théophile Bastian, Guillaume Iooss, Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello

TL;DR
This paper introduces Palmed, a tool that automatically constructs resource mappings for superscalar CPU architectures using runtime measurements, enabling accurate throughput modeling without hardware counters.
Contribution
Palmed provides a novel, resource-based abstraction for throughput characterization, simplifying port mapping problems and matching the accuracy of existing models.
Findings
Palmed achieves sub-10% mean square error in throughput prediction.
Resource mapping simplifies port assignment problems.
Palmed is effective for analyzing SPEC CPU 2017 benchmarks.
Abstract
In a super-scalar architecture, the scheduler dynamically assigns micro-operations (μOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into μOPs and lists for each μOP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the performance throughput of a sequence of instructions repeatedly executed as the core component of a loop. This paper introduces a dual equivalent representation: The resource mapping of an architecture is an abstract model where, to be executed, an instruction must use a set of abstract resources, themselves representing combinations of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is a more tractable problem and provides a simpler and equivalent model. This…
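To make the resource-mapping idea concrete, here is a minimal sketch of how throughput could be read off such a model. It is an illustration under stated assumptions, not Palmed's implementation. The two-port machine, the instruction names `A` and `B`, the resource names `r0` and `r01`, and the usage values are all hypothetical. The sketch assumes that each instruction consumes a fixed fraction of a cycle of every abstract resource it uses, and that the steady-state cost of a loop body is set by its most heavily loaded resource.

```python
from fractions import Fraction

# Hypothetical toy machine with two ports, p0 and p1 (not measured data).
#   Instruction A: one uOP that can only run on p0.
#   Instruction B: one uOP that can run on p0 or p1.
# Assumed dual resource mapping:
#   r0  stands for port p0 alone          (1 uOP per cycle)
#   r01 stands for the combination p0+p1  (2 uOPs per cycle)
# Each entry is the fraction of a cycle of that resource one instruction uses.
RESOURCE_USAGE = {
    "A": {"r0": Fraction(1), "r01": Fraction(1, 2)},
    "B": {"r01": Fraction(1, 2)},
}


def cycles_per_iteration(kernel, usage=RESOURCE_USAGE):
    """Steady-state cycles for one loop iteration.

    `kernel` maps instruction names to how often they occur in the loop body.
    The load on each resource is the sum of the usages of all instructions;
    the most loaded resource is the bottleneck.
    """
    load = {}
    for instr, count in kernel.items():
        for resource, cost in usage[instr].items():
            load[resource] = load.get(resource, Fraction(0)) + count * cost
    return max(load.values())


def throughput(kernel, usage=RESOURCE_USAGE):
    """Instructions retired per cycle in steady state."""
    return Fraction(sum(kernel.values())) / cycles_per_iteration(kernel, usage)


if __name__ == "__main__":
    for kernel in ({"A": 1, "B": 1}, {"A": 2, "B": 1}, {"A": 1, "B": 3}):
        print(kernel, cycles_per_iteration(kernel), throughput(kernel))
```

In this toy case the model needs no scheduling decision: a body with two `A` and one `B` is limited by `r0`, while a body with one `A` and three `B` is limited by `r01`. This matches what an optimal assignment of μOPs to the two ports would give.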
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Embedded Systems Design Techniques
