INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn

TL;DR
INQUIRE introduces a challenging natural world text-to-image retrieval benchmark with a large dataset and expert-level queries, aiming to advance multimodal models for ecological research.
Contribution
The paper presents INQUIRE, a new benchmark with a large dataset and expert queries, to evaluate and improve multimodal models for ecological and biodiversity image retrieval.
Findings
Current models struggle with the benchmark: even the best score below 50% mAP@50.
Reranking the top retrievals with more powerful multimodal models improves retrieval performance.
The benchmark highlights the need for more nuanced multimodal understanding.
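To make the mAP@50 figure above concrete, here is a minimal sketch of how average precision at a cutoff k can be computed per query and then averaged. The function names and the normalization by min(k, number of relevant images) are assumptions chosen for illustration; the benchmark's exact definition may differ.

```python
def average_precision_at_k(ranked_relevance, num_relevant, k=50):
    """AP@k for a single query.

    ranked_relevance: list of bools in rank order (True = relevant image).
    num_relevant: total number of relevant images for the query in the dataset.
    Normalizing by min(k, num_relevant) is an assumption, not the paper's
    confirmed definition.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance[:k], start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant hit.
            precision_sum += hits / rank
    return precision_sum / min(k, num_relevant)


def mean_average_precision_at_k(per_query, k=50):
    """Mean of AP@k over queries; per_query is a list of
    (ranked_relevance, num_relevant) pairs."""
    scores = [average_precision_at_k(rel, n, k) for rel, n in per_query]
    return sum(scores) / len(scores)
```

For example, a query whose ranked results are relevant at positions 1 and 3, with two relevant images in total, gets AP@50 = (1/1 + 2/3) / 2.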
Abstract
We introduce INQUIRE, a text-to-image retrieval benchmark designed to challenge multimodal vision-language models on expert-level queries. INQUIRE includes iNaturalist 2024 (iNat24), a new dataset of five million natural world images, along with 250 expert-level retrieval queries. These queries are paired with all relevant images comprehensively labeled within iNat24, comprising 33,000 total matches. Queries span categories such as species identification, context, behavior, and appearance, emphasizing tasks that require nuanced image understanding and domain expertise. Our benchmark evaluates two core retrieval tasks: (1) INQUIRE-Fullrank, a full dataset ranking task, and (2) INQUIRE-Rerank, a reranking task for refining top-100 retrievals. Detailed evaluation of a range of recent multimodal models demonstrates that INQUIRE poses a significant challenge, with the best models failing to…
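The two tasks named in the abstract can be pictured as a two-stage pipeline: rank the whole collection, then reorder only the top 100 with a stronger scorer. The sketch below is illustrative only. Ranking by cosine similarity between embeddings is an assumption (the abstract does not say how the full ranking is produced), and `full_rank`, `rerank`, and `rerank_score` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def full_rank(query_vec, image_vecs):
    """Full-dataset ranking: order every image index by similarity to the
    query embedding (assumed scoring method), highest first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(image_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


def rerank(ranking, rerank_score, top_k=100):
    """Reranking: reorder only the first top_k results using a second,
    presumably more expensive scorer; the rest keep their order."""
    head, tail = ranking[:top_k], ranking[top_k:]
    head = sorted(head, key=lambda idx: -rerank_score(idx))
    return head + tail
```

A call such as `rerank(full_rank(q, images), scorer)` leaves everything beyond position 100 untouched, which is why a reranker can only help if the first stage already surfaced the relevant images.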
Taxonomy
Topics: Image Retrieval and Classification Techniques
