Garbage Collection for Multicore NUMA Machines
Sven Auhagen, Lars Bergstrom, Matthew Fluet, John Reppy

TL;DR
This paper introduces a garbage collector designed for multicore NUMA architectures. It improves the scalability of parallel functional language implementations by accounting for the heterogeneous memory hierarchy of these machines, where memory attached to a remote processor package is reached over a lower-bandwidth shared bus.
Contribution
The paper presents a garbage collector integrated with Manticore, a strict, parallel functional language implementation, and demonstrates improved scalability on high-core-count NUMA systems.
Findings
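Scales effectively on a 48-core AMD Opteron machine
Achieves better memory bandwidth utilization by favoring memory local to each processor package
Demonstrates improved scalability over traditional functional language implementations, which seldom scale well past 8 cores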
Abstract
Modern high-end machines feature multiple processor packages, each of which contains multiple independent cores and integrated memory controllers connected directly to dedicated physical RAM. These packages are connected via a shared bus, creating a system with a heterogeneous memory hierarchy. Since this shared bus has less bandwidth than the sum of the links to memory, aggregate memory bandwidth is higher when parallel threads all access memory local to their processor package than when they access memory attached to a remote package. This bandwidth limitation has traditionally limited the scalability of modern functional language implementations, which seldom scale well past 8 cores, even on small benchmarks. This work presents a garbage collector integrated with our strict, parallel functional language implementation, Manticore, and shows that it scales effectively on both a…
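The bandwidth argument in the abstract can be illustrated with a toy model. The sketch below is not from the paper: the function name, the parameters, and all numbers are hypothetical, and the model assumes that every access is served by some package's memory link while only remote accesses cross the shared bus.

```python
def aggregate_bandwidth(packages, local_link_gbs, bus_gbs, remote_fraction):
    """Toy upper bound on aggregate memory bandwidth in GB/s.

    Assumptions (illustrative only, not taken from the paper):
      - each of `packages` processor packages has one memory link
        with `local_link_gbs` of bandwidth;
      - all packages share one bus with `bus_gbs` of bandwidth;
      - a fraction `remote_fraction` of all traffic crosses that bus.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")

    # Every access, local or remote, is served by some memory link.
    link_limit = packages * local_link_gbs
    if remote_fraction == 0.0:
        return link_limit

    # Remote traffic (remote_fraction * total) must fit on the shared bus.
    bus_limit = bus_gbs / remote_fraction
    return min(link_limit, bus_limit)


if __name__ == "__main__":
    # Hypothetical machine: 4 packages, 10 GB/s per link, 8 GB/s shared bus.
    for r in (0.0, 0.25, 0.5):
        print(r, aggregate_bandwidth(4, 10.0, 8.0, r))
```

Under these made-up numbers, fully local access is limited only by the memory links, while sending half of all traffic to remote packages makes the shared bus the bottleneck. This is the effect a NUMA-aware collector tries to avoid.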
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed systems and fault tolerance · Advanced Data Storage Technologies
