System Description for a Scalable, Fault-Tolerant, Distributed Garbage Collector

N. Allen; T. Terriberry

arXiv:cs/0207036 · cs.DC · May 23, 2007

TL;DR

This paper presents a scalable, fault-tolerant algorithm for collecting cyclic garbage in distributed systems. It reduces the number and size of network messages and avoids extensive synchronization between machines.

Contribution

It introduces a distributed garbage collection algorithm that augments back tracing with an explicit forward tracing stage, a tuned heuristic that reduces total work, and fault-tolerant cooperation between traces.

Findings

01 Reduces network message size and count

02 Achieves fault-tolerant cooperation between traces

03 Decreases total work through heuristics

Abstract

We describe an efficient and fault-tolerant algorithm for distributed cyclic garbage collection. The algorithm imposes few requirements on the local machines and allows for flexibility in the choice of local collector and distributed acyclic garbage collector to use with it. We have emphasized reducing the number and size of network messages without sacrificing the promptness of collection throughout the algorithm. Our proposed collector is a variant of back tracing to avoid extensive synchronization between machines. We have added an explicit forward tracing stage to the standard back tracing stage and designed a tuned heuristic to reduce the total amount of work done by the collector. Of particular note is the development of fault-tolerant cooperation between traces and a heuristic that aggressively reduces the set of suspect objects.
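To make the term "back tracing" in the abstract concrete, here is a minimal single-process sketch of the general technique: starting from a suspect object, follow references backwards and declare the suspect garbage only if no root is ever reached. It is not the authors' algorithm. The names `back_trace`, `EXAMPLE_REFERRERS`, and `EXAMPLE_ROOTS` are hypothetical, and the sketch assumes the full inverse reference graph is available in one dictionary, whereas in a distributed setting that information would be spread across machines. The forward tracing stage, the suspect-reduction heuristic, and the fault-tolerant cooperation between traces described in the abstract are not modeled.

```python
from collections import deque

# Hypothetical inverse reference graph: object -> objects that refer to it.
# "a", "b", "c" form a cycle that no root can reach; "d" and "e" are live.
EXAMPLE_REFERRERS = {
    "a": ["c"],
    "b": ["a"],
    "c": ["b"],
    "d": ["root"],
    "e": ["d", "c"],
}
EXAMPLE_ROOTS = {"root"}


def back_trace(suspect, referrers, roots):
    """Trace backwards from `suspect` through the inverse reference graph.

    Returns None if some root can reach the suspect (it is live).
    Otherwise returns the set of all objects that can reach the suspect,
    including the suspect itself; none of them is reachable from a root.
    """
    visited = {suspect}
    queue = deque([suspect])
    while queue:
        obj = queue.popleft()
        if obj in roots:
            return None  # a root reaches the suspect, so it is live
        for ref in referrers.get(obj, ()):
            if ref not in visited:
                visited.add(ref)
                queue.append(ref)
    return visited
```

In this toy graph, tracing back from `"a"` visits only the cycle `a`, `b`, `c` and finds no root, while tracing back from `"d"` or `"e"` reaches `"root"` and stops. The appeal of tracing backwards is that the work is bounded by the objects that can reach the suspect rather than by the whole heap.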

Taxonomy

Topics: Distributed systems and fault tolerance · Real-Time Systems Scheduling · Software System Performance and Reliability