OneDB: A Distributed Multi-Metric Data Similarity Search System
Tang Qian, Yifan Zhu, Lu Chen, Xiangyu Ke, Jingwen Zhao, Tianyi Li, Yunjun Gao, Christian S. Jensen

TL;DR
OneDB is a scalable, multi-metric data retrieval system that efficiently unifies diverse multi-modal data types for accurate similarity search, outperforming existing solutions in accuracy and speed.
Contribution
The paper introduces OneDB, a novel distributed system that unifies multi-modal data using multi-metric models with innovations like smart pruning, two-layer indexing, and autotuning.
Findings
Achieves 12.63%-30.75% higher accuracy than state-of-the-art methods.
Speeds up search by a factor of 2.5-5.75 compared to existing solutions.
Demonstrates high scalability and effective parameter autotuning.
Abstract
Increasingly massive volumes of multi-modal data are being accumulated in many real-world settings, including in health care and e-commerce. This development calls for effective general-purpose data management solutions for multi-modal data. Such a solution must facilitate user-friendly and accurate retrieval of any multi-modal data according to diverse application requirements. Further, such a solution must be capable of efficient and scalable retrieval. To address this need, we present OneDB, a distributed multi-metric data similarity retrieval system. This system exploits the fact that data of diverse modalities, such as text, images, and video, can be represented as metric data. The system thus affords each data modality its own metric space with its own distance function and then uses a multi-metric model to unify multi-modal data. The system features several innovations: (i)…
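To make the multi-metric idea in the abstract concrete, here is a minimal sketch in which each modality keeps its own distance function and a query compares objects by a weighted sum of the per-modality distances. The weighted-sum combination, the choice of edit distance for text and Euclidean distance for vectors, and every name below (`multi_metric_distance`, `knn`, the `text` and `vec` fields) are assumptions made for illustration; the abstract does not specify them. This is not OneDB's API or implementation, and it uses a brute-force scan with none of the indexing, pruning, or autotuning the paper describes.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

# A per-modality distance function: takes two values of the same modality.
Metric = Callable[[Any, Any], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edit_distance(s: str, t: str) -> float:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return float(prev[-1])


def multi_metric_distance(q: Dict[str, Any], o: Dict[str, Any],
                          metrics: Dict[str, Metric],
                          weights: Dict[str, float]) -> float:
    """Assumed combination rule: weighted sum of per-modality distances."""
    return sum(weights[m] * metrics[m](q[m], o[m]) for m in metrics)


def knn(query: Dict[str, Any], objects: List[Dict[str, Any]],
        metrics: Dict[str, Metric], weights: Dict[str, float],
        k: int) -> List[Dict[str, Any]]:
    """Brute-force k-nearest-neighbor search under the combined distance."""
    ranked = sorted(
        objects,
        key=lambda o: multi_metric_distance(query, o, metrics, weights))
    return ranked[:k]
```

A caller would pass `metrics={"text": edit_distance, "vec": euclidean}` together with non-negative per-modality weights. Under this assumed rule, changing the weights changes which modality dominates the ranking, without altering the per-modality distance functions themselves.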
