Balancing clusters to reduce response time variability in large scale image search
Romain Tavenard (INRIA - IRISA), Laurent Amsaleg (INRIA - IRISA), Hervé Jégou (INRIA - IRISA)

TL;DR
This paper introduces a modified k-means clustering algorithm that balances cluster sizes to reduce response time variability in large-scale image search, improving consistency without compromising search quality.
Contribution
It proposes a novel modification to k-means that produces more balanced clusters, addressing response time variability in high-dimensional approximate nearest neighbor search.
Findings
Significantly reduces response time variance
Maintains high search quality
Effective on large-scale image descriptor datasets
Abstract
Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, in order to avoid exhaustive search, an index selects the few (or a single) clusters nearest to the query point. Clusters are often produced by the well-known k-means approach since it has several desirable properties. On the downside, it tends to produce clusters having quite different cardinalities. Imbalanced clusters negatively impact both the variance and the expectation of query response times. This paper proposes to modify k-means centroids to produce clusters with more comparable sizes without sacrificing the desirable properties. Experiments with a large scale collection of image descriptors show that our algorithm significantly reduces the variance of response times without seriously impacting the search quality.
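To make the abstract's claim about imbalance concrete, here is a minimal sketch of why uneven cluster sizes raise both the expectation and the variance of the per-query scan cost. It is not the paper's algorithm. It rests on two simplifying assumptions that the abstract does not state: each query visits exactly one cluster, and a cluster of size n_i is selected with probability n_i / N (queries are distributed like the indexed data). The function name `scan_cost_stats` and the example cluster sizes are hypothetical.

```python
from typing import Sequence, Tuple


def scan_cost_stats(cluster_sizes: Sequence[int]) -> Tuple[float, float, float]:
    """Return (expected, variance, imbalance) of vectors scanned per query.

    Assumptions (illustrative, not taken from the paper):
    - one cluster is visited per query;
    - cluster i is visited with probability n_i / N.
    """
    n_total = sum(cluster_sizes)
    k = len(cluster_sizes)
    probs = [n / n_total for n in cluster_sizes]

    # E[cost] = sum_i p_i * n_i, the mean number of vectors scanned.
    expected = sum(p * n for p, n in zip(probs, cluster_sizes))
    # Var[cost] = E[cost^2] - E[cost]^2.
    second_moment = sum(p * n * n for p, n in zip(probs, cluster_sizes))
    variance = second_moment - expected ** 2
    # k * sum_i p_i^2 equals 1 for perfectly balanced clusters
    # and grows as the sizes become more uneven.
    imbalance = k * sum(p * p for p in probs)
    return expected, variance, imbalance


# Same total size (1000 vectors, 4 clusters), different balance.
balanced = scan_cost_stats([250, 250, 250, 250])
skewed = scan_cost_stats([700, 100, 100, 100])
print("balanced:", balanced)
print("skewed:  ", skewed)
```

Under these assumptions, the balanced partition scans N / k vectors per query with zero variance, while the skewed partition scans more on average and with a large spread, because the largest cluster is also the one most often selected.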
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Image Retrieval and Classification Techniques
