System Description for a Scalable, Fault-Tolerant, Distributed Garbage Collector
N. Allen, T. Terriberry

TL;DR
This paper presents a scalable, fault-tolerant algorithm for collecting cyclic garbage in distributed systems. It reduces the number and size of network messages and avoids extensive synchronization between machines.
Contribution
It introduces a distributed garbage collection algorithm that combines back tracing with an explicit forward tracing stage, uses a tuned heuristic to reduce the total work done, and lets traces cooperate in a fault-tolerant way.
Findings
Reduces network message size and count
Achieves fault-tolerant cooperation between traces
Decreases total work through heuristics
Abstract
We describe an efficient and fault-tolerant algorithm for distributed cyclic garbage collection. The algorithm imposes few requirements on the local machines and allows for flexibility in the choice of local collector and distributed acyclic garbage collector to use with it. We have emphasized reducing the number and size of network messages without sacrificing the promptness of collection throughout the algorithm. Our proposed collector is a variant of back tracing to avoid extensive synchronization between machines. We have added an explicit forward tracing stage to the standard back tracing stage and designed a tuned heuristic to reduce the total amount of work done by the collector. Of particular note is the development of fault-tolerant cooperation between traces and a heuristic that aggressively reduces the set of suspect objects.
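As a rough illustration of the back tracing idea the abstract builds on, the sketch below walks references backwards from a suspect object and reports whether any root is reached. It is a single-process toy, not the paper's algorithm: the `referrers` map, the `roots` set, and the function name `back_trace` are all assumptions made for this example, and the forward tracing stage, the suspect heuristic, and the network messaging are left out.

```python
from collections import deque

def back_trace(suspect, referrers, roots):
    """Return True if `suspect` is reachable from a root.

    `referrers` maps each object to the objects that hold a reference to it
    (hypothetical structure; in a distributed setting parts of this map
    would live on other machines).
    """
    seen = {suspect}
    queue = deque([suspect])
    while queue:
        obj = queue.popleft()
        if obj in roots:
            return True  # a root refers (transitively) to the suspect
        for parent in referrers.get(obj, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False  # no root found: the suspect is garbage

# Toy graph: a, b, c form a cycle with no root; d is held by a root.
referrers = {
    "a": ["c"],
    "b": ["a"],
    "c": ["b"],
    "d": ["root", "c"],
}
roots = {"root"}

print(back_trace("a", referrers, roots))  # False
print(back_trace("d", referrers, roots))  # True
```

Because the trace starts at one suspect and only follows its referrers, it touches only the objects that could keep that suspect alive, which is why back tracing needs less coordination than a global forward mark phase.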
Taxonomy
Topics: Distributed systems and fault tolerance · Real-Time Systems Scheduling · Software System Performance and Reliability
