Bloom Filter Encoding for Machine Learning

John Cartmell, Mihaela Cardei, and Ionut Cardei

arXiv:2512.19991 · cs.LG · May 11, 2026

TL;DR

This paper introduces a Bloom filter-based encoding method for data preprocessing in machine learning, offering memory efficiency and data obfuscation while maintaining model performance across diverse datasets.

Contribution

The authors propose a novel Bloom filter transform for data encoding that is simple, flexible, and effective across multiple data types and classifiers.

Findings

01

Models trained on Bloom filter encodings perform comparably to models trained on raw data.

02

The method reduces memory usage across datasets.

03

The encoding obfuscates original feature values while preserving similarity structure.

Abstract

We present a method that uses a Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact bit-array representation using hash-based encoding, producing a fixed-length feature space that reduces memory usage and obfuscates original feature values. The encoding does not rely on keyed hashing; however, a key can optionally be used to control the mapping and would be required to reproduce the representation. We evaluate the approach on six datasets spanning text, time-series, tabular, and image domains: SMS Spam Collection, ECG200, Adult 50K, CDC Diabetes, MNIST, and Fashion MNIST. Four classifiers are considered: Extreme Gradient Boosting, Deep Neural Networks, Convolutional Neural Networks, and Logistic Regression. Results show that models trained on Bloom filter encodings achieve performance comparable to models trained on raw data or standard…
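
The encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (bloom_encode, tabular_tokens, jaccard), the parameters (num_bits, num_hashes), the use of salted SHA-256 as the hash family, and the "name=value" tokenization are all assumptions made for illustration. The optional key is simply prefixed to the hash input here; a production design would more likely use a standard keyed construction such as HMAC.

```python
import hashlib
import numpy as np


def bloom_encode(tokens, num_bits=256, num_hashes=3, key=b""):
    """Encode an iterable of string tokens into a fixed-length 0/1 vector.

    Assumed construction: each token sets `num_hashes` bit positions,
    derived from SHA-256 salted with the hash index (and an optional key).
    """
    bits = np.zeros(num_bits, dtype=np.uint8)
    for token in tokens:
        for i in range(num_hashes):
            # The index byte i acts as a salt, giving num_hashes different hashes.
            digest = hashlib.sha256(key + bytes([i]) + token.encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % num_bits
            bits[position] = 1
    return bits


def tabular_tokens(row):
    """Hypothetical tokenization for a tabular sample: one 'name=value' token per feature."""
    return [f"{name}={value}" for name, value in row.items()]


def jaccard(x, y):
    """Jaccard similarity between two bit vectors (shared set bits / all set bits)."""
    return float((x & y).sum() / (x | y).sum())


# Text samples: two similar messages and one unrelated message.
a = bloom_encode("free prize call now".split())
b = bloom_encode("free prize text now".split())
c = bloom_encode("see you at lunch".split())

# A tabular sample encoded into the same fixed-length space.
row_vec = bloom_encode(tabular_tokens({"age": 39, "workclass": "Private"}))

# The same tokens under a key map to a different bit pattern.
a_keyed = bloom_encode("free prize call now".split(), key=b"secret")

print(jaccard(a, b), jaccard(a, c))
```

Every sample, regardless of how many tokens it has, maps to a vector of num_bits entries, which is what makes the feature space fixed-length. Samples that share tokens share set bits, so the similar pair (a, b) should score a higher Jaccard similarity than the unrelated pair (a, c), while the bit positions themselves do not directly reveal the original tokens.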
