NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex Natural Language Queries
Mahdi Erfanian, Abolfazl Asudeh

TL;DR
NeedleDB is an open-source database system that uses generative AI to improve image retrieval accuracy for complex natural language queries, outperforming existing contrastive-learning methods.
Contribution
NeedleDB introduces generative-AI visual query synthesis, which turns text-to-image retrieval into image-to-image search, and pairs it with a scalable architecture and a rank-fusion strategy that has provable error bounds.
Findings
Improves Mean Average Precision by up to 93% over baselines.
Achieves sub-second query latency on challenging benchmarks.
Provides a full-featured, modular system with CLI and Web UI.
Abstract
We demonstrate NeedleDB, an open-source, deployment-ready database system for answering complex natural language queries over image data. Unlike existing approaches that rely on contrastive-learning embeddings (e.g., CLIP), which degrade on compositional or nuanced queries, NeedleDB leverages generative AI to synthesize guide images that represent the query in the visual domain, transforming the text-to-image retrieval problem into a more tractable image-to-image search. The system aggregates nearest-neighbor results across multiple vision embedders using a weighted rank-fusion strategy grounded in a Monte Carlo estimator with provable error bounds. NeedleDB ships with a full-featured command-line interface (needlectl), a browser-based Web UI, and a modular microservice architecture backed by PostgreSQL and Milvus. On challenging benchmarks, it improves Mean Average Precision by up to…
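The abstract describes aggregating nearest-neighbor results across several vision embedders with a weighted rank-fusion strategy, but it does not give the formula. As a rough illustration only, the sketch below uses weighted reciprocal rank fusion, a common stand-in technique and not necessarily what NeedleDB implements. The function name `fuse_rankings`, the smoothing constant `k`, the embedder names, and the choice to average over guide images are all assumptions made for this example.

```python
from collections import defaultdict


def fuse_rankings(rankings, weights, k=60, top_n=10):
    """Weighted reciprocal rank fusion (illustrative, not NeedleDB's code).

    rankings: dict mapping an embedder name to a list of ranked result
              lists, one list per guide image (best match first).
    weights:  dict mapping an embedder name to its non-negative weight.
    k:        smoothing constant that damps the influence of top ranks.
    """
    scores = defaultdict(float)
    for embedder, per_guide_lists in rankings.items():
        if not per_guide_lists:
            continue
        weight = weights.get(embedder, 0.0)
        for ranked_ids in per_guide_lists:
            for rank, image_id in enumerate(ranked_ids, start=1):
                # Average the reciprocal-rank contribution over guide
                # images (an assumption of this sketch).
                scores[image_id] += weight / (k + rank) / len(per_guide_lists)
    # Sort by descending score; break ties by image id for determinism.
    ordered = sorted(scores, key=lambda image_id: (-scores[image_id], image_id))
    return ordered[:top_n]


if __name__ == "__main__":
    # Hypothetical embedders and image ids, for illustration only.
    example_rankings = {
        "embedder_a": [["a", "b", "c"], ["b", "a", "c"]],
        "embedder_b": [["b", "c", "a"]],
    }
    example_weights = {"embedder_a": 0.5, "embedder_b": 0.5}
    print(fuse_rankings(example_rankings, example_weights))
```

In this sketch, an embedder with a larger weight pulls the fused order toward its own ranking, and an image that ranks well under several embedders and guide images rises to the top.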
