BoomHQ: Learning to Boost Multiple Hybrid Queries on Vector DBMSs
Ermu Qiu, Tianyi Chen, Jun Gao, Xing Wei, Yaofeng Tu, Yinjun Han, and Yang Lin

TL;DR
BoomHQ is a learning-based framework that optimizes multiple hybrid vector and scalar queries in vector DBMSs by modeling attribute correlations and query patterns, achieving a 2x average speedup over state-of-the-art methods and peak speedups above 25x.
Contribution
BoomHQ introduces an autoencoder-based approach that models correlations between vector and scalar attributes, and it captures query patterns to optimize hybrid query execution.
Findings
Achieves 2x average speedup over state-of-the-art methods.
Demonstrates over 25x peak speedup for hybrid queries.
Shows robustness to data updates and consistency across systems.
Abstract
Hybrid queries, which combine vector nearest neighbor searches with scalar predicates, represent a fundamental challenge in managing vector databases. Existing methods often restrict the number of vector columns involved or the complexity of scalar predicates, thereby limiting their flexibility in handling diverse query patterns. Moreover, these approaches typically do not fully leverage the correlations between scalar and vector attributes, or the distributional patterns observed from query vector neighborhoods. To address these limitations, we introduce BoomHQ, a learning-based framework to boost multiple hybrid queries on vector DBMSs. First, BoomHQ models the correlation between vector and scalar attributes using an autoencoder-based architecture, which is also friendly to data updates. Second, BoomHQ captures prevailing query patterns, particularly using estimated selectivity of…
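To make the term "hybrid query" concrete, here is a minimal sketch of two common baseline execution strategies: filter first and then rank by distance, or rank first and then filter. It is not BoomHQ's method. The row layout (`id`, `vec`, `price`), the function names, and the brute-force scan are all assumptions chosen for illustration; a real vector DBMS would use an approximate index instead of sorting every row.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prefilter_knn(rows, query, predicate, k):
    """Apply the scalar predicate first, then rank the survivors by distance."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: l2(r["vec"], query))
    return [r["id"] for r in candidates[:k]]

def postfilter_knn(rows, query, predicate, k):
    """Rank by distance first, then filter, widening the fetch until k rows pass."""
    ranked = sorted(rows, key=lambda r: l2(r["vec"], query))
    fetch = k
    while True:
        hits = [r["id"] for r in ranked[:fetch] if predicate(r)]
        if len(hits) >= k or fetch >= len(ranked):
            return hits[:k]
        fetch *= 2  # too few rows passed the predicate: look further out

# Hypothetical toy table: one vector column and one scalar column.
rows = [
    {"id": 0, "vec": (0.0, 0.0), "price": 5},
    {"id": 1, "vec": (1.0, 0.0), "price": 50},
    {"id": 2, "vec": (0.0, 2.0), "price": 8},
    {"id": 3, "vec": (3.0, 3.0), "price": 3},
    {"id": 4, "vec": (0.5, 0.5), "price": 90},
]
query = (0.0, 0.0)
cheap = lambda r: r["price"] < 10  # the scalar predicate

print(prefilter_knn(rows, query, cheap, 2))
print(postfilter_knn(rows, query, cheap, 2))
```

With exact search both strategies return the same rows, but their costs differ. Pre-filtering is cheap when few rows satisfy the predicate, while post-filtering is cheap when most near neighbors already satisfy it. That cost gap is why the selectivity of the predicate, and its correlation with the vector neighborhood, matter to a hybrid query optimizer.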
