Biff (Bloom Filter) Codes : Fast Error Correction for Large Data Sets

Michael Mitzenmacher; George Varghese

arXiv:1208.0798·cs.DS·March 20, 2015·2 cites


Open Access

TL;DR

Biff codes are an error-correction method based on Bloom filters and designed for large data sets in cloud environments, offering linear-time encoding and decoding with space overhead proportional to the number of errors.

Contribution

Introduction of Biff codes, a simple, efficient error correction scheme for large data, utilizing Bloom filters and invertible Bloom lookup tables, suitable for cloud data reconciliation.

Findings

01

Decodes 1 million words with thousands of errors in under a second

02

Encoding time is linear in the message length; decoding time is linear in the message length plus the number of errors

03

Space overhead is proportional to the number of errors

Abstract

Large data sets are increasingly common in cloud and virtualized environments. For example, transfers of multiple gigabytes are commonplace, as are replicated blocks of such sizes. There is a need for fast error-correction or data reconciliation in such settings even when the expected number of errors is small. Motivated by such cloud reconciliation problems, we consider error-correction schemes designed for large data, after explaining why previous approaches appear unsuitable. We introduce Biff codes, which are based on Bloom filters and are designed for large data. For Biff codes with a message of length $L$ and $E$ errors, the encoding time is $O(L)$, decoding time is $O(L + E)$, and the space overhead is $O(E)$. Biff codes are low-density parity-check codes; they are similar to Tornado codes, but are designed for errors instead of erasures. Further, Biff codes are designed to be…
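
To make the idea in the abstract concrete, here is a minimal Python sketch of an invertible-Bloom-lookup-table style code. It is an illustration under stated assumptions, not the authors' implementation: the names `IBLT`, `pack`, `biff_encode`, and `biff_decode`, the choice of 3 hash functions, the SHA-256-based hashing, and the table size are all hypothetical. It also assumes the table itself arrives uncorrupted and that symbols fit in 32 bits. The sender inserts every (position, symbol) pair into a small table; the receiver deletes the pairs it actually received, so only the pairs at erroneous positions remain and can be peeled out.

```python
import hashlib
import random

K = 3  # number of hash functions per key (an assumed choice)

def _h(key: int, salt: int, mod: int) -> int:
    # Deterministic salted hash of an integer key, reduced modulo `mod`.
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % mod

class IBLT:
    def __init__(self, cells_per_table: int):
        self.m = cells_per_table
        n = K * cells_per_table          # K disjoint subtables
        self.count = [0] * n
        self.key_xor = [0] * n
        self.chk_xor = [0] * n

    def _toggle(self, key: int, sign: int) -> None:
        chk = _h(key, 255, 1 << 32)      # per-key checksum
        for i in range(K):
            c = i * self.m + _h(key, i, self.m)
            self.count[c] += sign
            self.key_xor[c] ^= key
            self.chk_xor[c] ^= chk

    def insert(self, key: int) -> None:
        self._toggle(key, +1)

    def delete(self, key: int) -> None:
        self._toggle(key, -1)

    def peel(self):
        # Repeatedly extract keys from cells that hold exactly one key.
        # This simple rescan is not linear time; a queue of candidate
        # cells would be needed for that.
        added, removed = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                sign = self.count[c]
                key = self.key_xor[c]
                if sign in (1, -1) and self.chk_xor[c] == _h(key, 255, 1 << 32):
                    (added if sign == 1 else removed).add(key)
                    self._toggle(key, -sign)
                    progress = True
        if any(self.count) or any(self.key_xor):
            raise ValueError("peeling stalled; table too small for the errors")
        return added, removed

def pack(pos: int, sym: int) -> int:
    # Combine position and symbol into one key (symbol assumed < 2**32).
    return (pos << 32) | sym

def biff_encode(message, cells_per_table: int) -> IBLT:
    table = IBLT(cells_per_table)
    for pos, sym in enumerate(message):
        table.insert(pack(pos, sym))
    return table

def biff_decode(received, table: IBLT):
    # Note: this consumes (mutates) the table.
    for pos, sym in enumerate(received):
        table.delete(pack(pos, sym))
    originals, _wrong = table.peel()     # only erroneous positions remain
    fixed = list(received)
    for key in originals:
        fixed[key >> 32] = key & 0xFFFFFFFF
    return fixed

if __name__ == "__main__":
    rng = random.Random(1)
    msg = [rng.randrange(1 << 16) for _ in range(2000)]
    table = biff_encode(msg, 200)
    garbled = list(msg)
    for p in rng.sample(range(len(msg)), 10):
        garbled[p] ^= 1                  # flip one bit in 10 symbols
    print(biff_decode(garbled, table) == msg)
```

Each error leaves two keys in the table (the original pair and the received pair), so the table size needs to scale with the number of errors rather than with the message length. The 600 cells used here are deliberately generous for a handful of errors; the sketch makes no claim about the sizing constants used in the paper.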


Taxonomy

Topics: Caching and Content Delivery · Carbon and Quantum Dots Applications · Advanced Data Storage Technologies