How to Train Your Filter: Should You Learn, Stack or Adapt?
Diandre Miguel Sabale, Wolfgang Gatterbauer, Prashant Pandey

TL;DR
This paper comprehensively evaluates learned, stacked, and adaptive filters, revealing their trade-offs in false positive rates, robustness, and query latency across various workloads, guiding their appropriate application.
Contribution
It provides the first comparative analysis of these three filter paradigms across real-world datasets, clarifying their strengths, weaknesses, and suitable use cases.
Findings
Learned filters achieve up to 10^2× lower FPRs, but with high variance and slow query times.
Stacked filters reach up to 10^3× lower FPRs on skewed workloads, but require prior knowledge of the workload.
Adaptive filters are robust, achieving up to 10^3× lower FPRs without workload assumptions.
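For a sense of scale, the textbook analysis of a standard Bloom filter (not taken from this paper) relates the FPR to the space budget. The symbols below are assumptions introduced here for illustration: m is the number of bits, n the number of stored keys, ε the false positive rate with an optimal number of hash functions, and c the number of orders of magnitude by which the FPR is reduced.

```latex
\varepsilon \approx 2^{-(m/n)\ln 2}
\quad\Longrightarrow\quad
\frac{m}{n} \approx \frac{\log_2(1/\varepsilon)}{\ln 2} \approx 1.44\,\log_2(1/\varepsilon)

\Delta\!\left(\frac{m}{n}\right) \approx \frac{\log_2(10^{c})}{\ln 2} \approx 4.79\,c
\quad\text{bits per key}
```

Under these assumptions, a plain Bloom filter would need roughly 9.6 extra bits per key for a 10^2× lower FPR (c = 2) and roughly 14.4 extra bits per key for a 10^3× lower FPR (c = 3). This is only a rough yardstick for the improvements listed above, which come from using information beyond the input set rather than from added space.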
Abstract
Filters are ubiquitous in computer science, enabling space-efficient approximate membership testing. Since Bloom filters were introduced in 1970, decades of work have improved their space efficiency and performance. Recently, three new paradigms have emerged that offer orders-of-magnitude improvements in false positive rates (FPRs) by using information beyond the input set: (1) learned filters train a model to distinguish members from non-members, (2) stacked filters use negative workload samples to build cascading layers, and (3) adaptive filters update their internal representation in response to false positive feedback. Yet each paradigm targets specific use cases, introduces complex configuration tuning, and has been evaluated in isolation. This results in unclear trade-offs and a gap in understanding how these approaches compare and when each is most appropriate. This paper presents the first comprehensive…
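To make the "cascading layers" idea concrete, here is a minimal Python sketch of a three-layer stacked filter built from plain Bloom filters. It is a toy illustration under stated assumptions, not the implementation evaluated in the paper: the class names `BloomFilter` and `StackedFilter`, the SHA-256-based hashing, the layer sizes, and the sample keys are all hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; hash positions are derived from SHA-256 (an assumption)."""

    def __init__(self, num_bits, num_hashes, salt=""):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt                  # distinguishes the hash family per layer
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, key):
        for seed in range(self.num_hashes):
            data = f"{self.salt}:{seed}:{key}".encode()
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class StackedFilter:
    """Three alternating layers: members, known negatives, members again."""

    def __init__(self, positives, known_negatives, num_bits=1024, num_hashes=3):
        positives = list(positives)
        # Layer 1 stores every member.
        self.layer1 = BloomFilter(num_bits, num_hashes, salt="L1")
        for key in positives:
            self.layer1.add(key)
        # Layer 2 stores sampled negatives that layer 1 would wrongly accept.
        self.layer2 = BloomFilter(num_bits, num_hashes, salt="L2")
        for key in known_negatives:
            if key in self.layer1:
                self.layer2.add(key)
        # Layer 3 stores members that layer 2 would wrongly flag as negatives.
        self.layer3 = BloomFilter(num_bits, num_hashes, salt="L3")
        for key in positives:
            if key in self.layer2:
                self.layer3.add(key)

    def __contains__(self, key):
        if key not in self.layer1:
            return False          # definitely not a member
        if key not in self.layer2:
            return True           # passed layer 1, not a recorded negative
        return key in self.layer3  # ambiguous: the last layer decides


# Hypothetical workload: 200 members and 200 frequently queried non-members.
positives = [f"member-{i}" for i in range(200)]
known_negatives = [f"hot-negative-{i}" for i in range(200)]
stacked = StackedFilter(positives, known_negatives)

layer1_fp = sum(key in stacked.layer1 for key in known_negatives)
stacked_fp = sum(key in stacked for key in known_negatives)
print(f"false positives on known negatives: layer 1 alone={layer1_fp}, stacked={stacked_fp}")
```

In this sketch, every member still tests positive, and a sampled negative can only be a false positive if it slips through both layer 1 and layer 3. The comparison against layer 1 alone is not space-fair, since the stacked structure uses three bit arrays; a real evaluation would fix the total space budget. Negatives outside the sample get no such protection, which is why the approach depends on workload knowledge.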
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Data Quality and Management · Advanced Neural Network Applications
