BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning
Prajwal Singh, Gautam Vashishtha, Indra Deep Mastan, Shanmuganathan Raman

TL;DR
BloomCoreset introduces a Bloom filter-based sampling method that accelerates coreset selection in fine-grained self-supervised learning, maintaining high accuracy while drastically reducing sampling time.
Contribution
The paper presents a novel Bloom filter-based approach for fast coreset sampling in fine-grained SSL, significantly improving efficiency over existing methods.
Findings
Reduces sampling time by 98.5%
Incurs an accuracy trade-off of only 0.83%
Outperforms baseline sampling strategies
Abstract
The success of deep learning in supervised fine-grained recognition for domain-specific tasks relies heavily on expert annotations. The Open-Set for fine-grained Self-Supervised Learning (SSL) problem aims to enhance performance on downstream tasks by strategically sampling a subset of images (the Core-Set) from a large pool of unlabeled data (the Open-Set). In this paper, we propose a novel method, BloomCoreset, that significantly reduces sampling time from Open-Set while preserving the quality of samples in the coreset. To achieve this, we utilize Bloom filters as an innovative hashing mechanism to store both low- and high-level features of the fine-grained dataset, as captured by Open-CLIP, in a space-efficient manner that enables rapid retrieval of the coreset from the Open-Set. To show the effectiveness of the sampled coreset, we integrate the proposed method into the…
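The idea of storing target-dataset features in a Bloom filter and then testing Open-Set samples for membership can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes feature vectors have already been extracted (for example by Open-CLIP), and the sign-based quantizer `feature_key`, the helper `sample_coreset`, and the filter size and hash count are all hypothetical choices made for illustration only.

```python
import hashlib
from typing import Iterable, List, Sequence


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterable[int]:
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Odd step keeps the probes distinct when num_bits is a power of two.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def feature_key(feature: Sequence[float]) -> bytes:
    """Hypothetical quantizer: keep only the sign of each dimension."""
    return bytes(1 if x > 0 else 0 for x in feature)


def sample_coreset(
    target_feats: Sequence[Sequence[float]],
    open_set_feats: Sequence[Sequence[float]],
) -> List[int]:
    """Return indices of Open-Set samples whose key hits the filter."""
    bloom = BloomFilter()
    for feat in target_feats:
        bloom.add(feature_key(feat))
    return [
        i for i, feat in enumerate(open_set_feats)
        if feature_key(feat) in bloom
    ]


# Toy features standing in for precomputed embeddings (assumption).
target = [[0.2, -0.1, 0.5], [-0.3, 0.4, 0.1]]
open_set = [[0.9, -0.7, 0.1], [-1.0, -1.0, -1.0], [-0.2, 0.8, 0.3]]
coreset_indices = sample_coreset(target, open_set)
```

Each membership test costs a fixed number of hash probes regardless of how many target features were stored, which is the property that makes a Bloom filter attractive for scanning a large Open-Set. The trade-off is that false positives are possible, so a real pipeline would likely need a follow-up filtering step.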
Taxonomy
Topics: Caching and Content Delivery · Text and Document Classification Technologies · Network Packet Processing and Optimization
Methods: BLOOM
