Scalable String Reconciliation by Recursive Content-Dependent Shingling
Bowen Song, Ari Trachtenberg

TL;DR
This paper introduces Recursive Content-Dependent Shingling (RCDS), a string reconciliation protocol that aims to minimize communication and scales linearly with the edit distance between the remote strings. In the authors' experiments it outperforms Rsync on about 51% of the tested GitHub repositories, particularly those with frequent small updates.
Contribution
The paper presents the RCDS protocol, a string reconciliation method that is computationally practical for large strings, and compares it experimentally against Rsync on release revisions of GitHub repositories.
Findings
RCDS outperforms Rsync in reconciling release revisions for about 51% of the 5000 top starred git repositories on GitHub.
The protocol scales linearly with the edit distance between the remote strings.
RCDS is particularly effective for repositories with frequent small updates.
Abstract
We consider the problem of reconciling similar, but remote, strings with minimum communication complexity. This "string reconciliation" problem is a fundamental building block for a variety of networking applications, including those that maintain large-scale distributed networks and perform remote file synchronization. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol that is computationally practical for large strings and scales linearly with the edit distance between the remote strings. We provide comparisons to the performance of Rsync, one of the most popular file synchronization tools in active use. Our experiments show that, with minimal engineering, RCDS outperforms the heavily optimized Rsync in reconciling release revisions for about 51% of the 5000 top starred git repositories on GitHub. The improvement is particularly evident for repositories that…
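The abstract does not spell out how RCDS partitions a string, so the following is only a minimal single-level sketch of the general idea of content-dependent partitioning, not the authors' protocol. The function name `content_dependent_chunks`, the CRC32 hash, and the `window` and `modulus` parameters are all assumptions chosen for illustration. The point it shows is that when cut points depend only on nearby content, a local edit leaves most chunks identical, so only a few chunks would need to be communicated.

```python
import zlib


def content_dependent_chunks(text, window=4, modulus=8):
    """Split text after every position whose trailing `window` characters
    hash to 0 modulo `modulus` (hypothetical parameters, for illustration)."""
    chunks = []
    start = 0
    for i in range(window - 1, len(text)):
        # The cut decision depends only on the local window, not on position.
        h = zlib.crc32(text[i - window + 1 : i + 1].encode("utf-8"))
        if h % modulus == 0:
            chunks.append(text[start : i + 1])
            start = i + 1
    if start < len(text):
        chunks.append(text[start:])  # trailing remainder
    return chunks


# Two "remote" strings that differ by one small insertion.
old = "the quick brown fox jumps over the lazy dog " * 20
pos = len(old) // 2
new = old[:pos] + "EDIT" + old[pos:]

old_chunks = content_dependent_chunks(old)
new_chunks = content_dependent_chunks(new)

# Chunks of the new string that the holder of the old string lacks.
missing = set(new_chunks) - set(old_chunks)
print(len(new_chunks), "chunks in new string;", len(missing), "not present in old")
```

Because each cut depends only on the preceding `window` characters, every chunk that ends before the edit position is guaranteed to be unchanged, and chunk boundaries resynchronize shortly after the edit. A full protocol would additionally need a way to find which chunks differ without sending them all (for example, via set reconciliation) and, as the name RCDS suggests, some form of recursion; neither is attempted here.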
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Peer-to-Peer Network Technologies
