A Critical Analysis of Classifier Selection in Learned Bloom Filters
Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli, Raffaele Giancarlo, Marco Frasca

TL;DR
This paper presents a comprehensive analysis and methodology for evaluating Learned Bloom Filters, emphasizing the impact of classifier choice and data complexity on performance, and introduces software for designing optimized filters.
Contribution
It introduces the first in-depth assessment method for Learned Bloom Filters considering classifier and data complexity, along with supporting software for multi-criteria optimization.
Findings
Only two classifiers perform well across different data complexities.
The Sandwiched Learned Bloom Filter is the variant most robust to data and classifier variability.
The accompanying software effectively tests and compares different Learned Bloom Filter configurations.
Abstract
Learned Bloom Filters, i.e., models induced from data via machine learning techniques and solving the approximate set membership problem, have recently been introduced with the aim of enhancing the performance of standard Bloom Filters, with special focus on space occupancy. Unlike in the classical case, the "complexity" of the data used to build the filter might heavily impact its performance. Therefore, here we propose the first in-depth analysis, to the best of our knowledge, for the performance assessment of a given Learned Bloom Filter, in conjunction with a given classifier, on a dataset of a given classification complexity. Indeed, we propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters as a function of specific constraints on their multi-criteria nature (that is, constraints involving space efficiency, false…
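To make the structure discussed in the abstract concrete, the following is a minimal sketch of a basic Learned Bloom Filter: a classifier score with a threshold, backed by a small classical Bloom filter that stores the keys the classifier would miss. It is not the paper's software. The names `BloomFilter`, `LearnedBloomFilter`, and `toy_score`, the example keys, the threshold, and the backup false positive rate are all assumptions chosen for illustration, and `toy_score` is a hand-written stand-in for a trained classifier.

```python
import hashlib
import math


class BloomFilter:
    """Classical Bloom filter over strings, using double hashing."""

    def __init__(self, n_items, fpr):
        n_items = max(1, n_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))


class LearnedBloomFilter:
    """Classifier plus backup filter; keeps the no-false-negative guarantee."""

    def __init__(self, keys, score, threshold, backup_fpr=0.01):
        self.score = score
        self.threshold = threshold
        # Keys scored below the threshold would be false negatives of the
        # classifier, so they are stored in the backup Bloom filter.
        self.missed = [k for k in keys if score(k) < threshold]
        self.backup = BloomFilter(len(self.missed), backup_fpr)
        for k in self.missed:
            self.backup.add(k)

    def __contains__(self, key):
        return self.score(key) >= self.threshold or key in self.backup


def toy_score(key):
    # Hypothetical stand-in for a trained classifier (assumption, not from the paper).
    return 0.9 if key.startswith("evil") else 0.1


keys = ["evil-a.example", "evil-b.example", "shady.example", "odd.example"]
lbf = LearnedBloomFilter(keys, toy_score, threshold=0.5)

print(all(k in lbf for k in keys))    # every key is reported as present
print(len(lbf.missed))                # keys delegated to the backup filter
print("evil-unknown.example" in lbf)  # non-key accepted by the classifier
```

In this sketch the overall false positive rate depends both on the classifier (non-keys scored above the threshold) and on the backup filter, which is one way to see why classifier choice and data complexity matter for the space and accuracy trade-off the paper analyzes.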
Taxonomy
Topics: Caching and Content Delivery
Methods: None · Test · BLOOM
