Locality-Sensitive Hashing for Efficient Hard Negative Sampling in Contrastive Learning

Fabian Deuser; Philipp Hausenblas; Hannah Schieber; Daniel Roth; Martin Werner; Norbert Oswald

arXiv:2505.17844 · cs.CV · May 26, 2025

TL;DR

This paper introduces a GPU-efficient Locality-Sensitive Hashing method for fast and effective hard negative sampling in contrastive learning, improving performance with less computation.

Contribution

The paper proposes a novel LSH scheme for approximate nearest neighbor search tailored for contrastive learning, with theoretical analysis and empirical validation.
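For intuition about the kind of property such a theoretical analysis typically covers, here is the textbook guarantee for random-hyperplane (sign-projection) LSH. This family is assumed purely for illustration; this summary does not say whether the paper's scheme is this one or satisfies exactly this bound. Each bit records on which side of a random hyperplane a vector falls, and a code $H$ concatenates $b$ bits drawn with independent directions $\mathbf r_1, \dots, \mathbf r_b$:

```latex
\begin{aligned}
h_{\mathbf r}(\mathbf x) &= \mathbb{1}\!\left[\mathbf r^\top \mathbf x \ge 0\right],
\qquad \mathbf r \sim \mathcal N(\mathbf 0, I_d),\\
\Pr_{\mathbf r}\!\left[h_{\mathbf r}(\mathbf x) = h_{\mathbf r}(\mathbf y)\right] &= 1 - \frac{\theta(\mathbf x,\mathbf y)}{\pi},
\qquad \theta(\mathbf x,\mathbf y) = \arccos\frac{\mathbf x^\top \mathbf y}{\lVert\mathbf x\rVert\,\lVert\mathbf y\rVert},\\
\mathbb E\!\left[d_H\!\big(H(\mathbf x), H(\mathbf y)\big)\right] &= b\,\frac{\theta(\mathbf x,\mathbf y)}{\pi},
\qquad H(\mathbf x) = \big(h_{\mathbf r_1}(\mathbf x), \dots, h_{\mathbf r_b}(\mathbf x)\big).
\end{aligned}
```

Smaller angles therefore mean smaller expected Hamming distances, which is why near neighbors in Hamming space are natural candidates for hard negatives.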

Findings

1. Achieves performance comparable to or better than existing methods.
2. Requires significantly less computation.
3. Is effective across textual and visual datasets.

Abstract

Contrastive learning is a representation learning paradigm in which a neural network maps data elements to feature vectors. It improves the feature space by forming groups, each consisting of an anchor and examples that are either positive or negative depending on whether they share the anchor's class. Hard negative examples, which lie close to the anchor in the feature space but belong to a different class, improve learning performance. Efficiently finding high-quality hard negatives in large, high-dimensional datasets is computationally challenging. In this paper, we propose a GPU-friendly Locality-Sensitive Hashing (LSH) scheme that quantizes real-valued feature vectors into binary representations for approximate nearest neighbor search. We investigate its theoretical properties and evaluate it on several datasets from the textual and visual domains. Our approach achieves comparable or better performance while requiring…
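The abstract describes quantizing feature vectors into binary codes and searching those codes for near neighbors of an anchor. The sketch below is not the authors' scheme, whose details are not given here. It assumes classic random-hyperplane (sign-projection) LSH as a stand-in, and the helper names `make_hyperplanes`, `hash_vector`, `hamming`, and `hard_negatives` are hypothetical. Each vector is hashed to an integer bitmask, and for a given anchor the sketch returns the differently labeled examples whose codes are closest in Hamming distance.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in R^dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(x, planes):
    """Quantize a real-valued vector to an integer bitmask.

    Bit i is set when the dot product <x, planes[i]> is >= 0, i.e. when x lies
    on the non-negative side of hyperplane i (random-hyperplane LSH, assumed).
    """
    code = 0
    for i, normal in enumerate(planes):
        if sum(a * b for a, b in zip(x, normal)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count the bits in which two integer codes differ."""
    return bin(a ^ b).count("1")


def hard_negatives(anchor_idx, codes, labels, k):
    """Return up to k indices whose label differs from the anchor's,
    ordered by Hamming distance to the anchor's code (closest first)."""
    anchor_code = codes[anchor_idx]
    anchor_label = labels[anchor_idx]
    candidates = [i for i, lab in enumerate(labels) if lab != anchor_label]
    # Ties in Hamming distance are broken by index for a deterministic result.
    candidates.sort(key=lambda i: (hamming(codes[i], anchor_code), i))
    return candidates[:k]


# Toy usage: example 1 points almost the same way as anchor 0 but has another
# label, so it should rank as a much harder negative than example 2.
features = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [-1.0, -0.8, 0.2], [1.0, 0.8, 0.05]]
labels = [0, 1, 2, 0]
planes = make_hyperplanes(dim=3, n_bits=64, seed=42)
codes = [hash_vector(f, planes) for f in features]
print(hard_negatives(0, codes, labels, k=1))
```

On a GPU, the per-vector loops would typically become a single matrix multiplication followed by a sign test, and Hamming distances would be computed with XOR and popcount over packed bits; the pure-Python version only illustrates the logic.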
