Neural Catalog: Scaling Species Recognition with Catalog of Life-Augmented Generation
Faizan Farooq Khan, Jun Chen, Youssef Mohamed, Chun-Mei Feng, Mohamed Elhoseiny

TL;DR
This paper introduces VR-RAG, a novel framework that combines structured encyclopedic knowledge with visual information to improve open-vocabulary species recognition, significantly outperforming previous models in bird classification tasks.
Contribution
We propose VR-RAG, a new retrieval-augmented generation method that integrates visual data with encyclopedic knowledge for enhanced species recognition.
Findings
VR-RAG improves recognition accuracy by 18% over state-of-the-art models.
The method effectively handles thousands of candidate species with high visual similarity.
Experiments demonstrate robustness across multiple bird classification benchmarks.
Abstract
Open-vocabulary species recognition is a major challenge in computer vision, particularly in ornithology, where new taxa are continually discovered. While benchmarks like CUB-200-2011 and Birdsnap have advanced fine-grained recognition under closed vocabularies, they fall short of real-world conditions. We show that current systems suffer a performance drop of over 30% in realistic open-vocabulary settings with thousands of candidate species, largely due to an increased number of visually similar and semantically ambiguous distractors. To address this, we propose Visual Re-ranking Retrieval-Augmented Generation (VR-RAG), a novel framework that links structured encyclopedic knowledge with recognition. We distill Wikipedia articles for 11,202 bird species into concise, discriminative summaries and retrieve candidates from these summaries. Unlike prior text-only approaches, VR-RAG…
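The abstract is cut off before it describes the re-ranking step, so the following is only a minimal sketch of a generic retrieve-then-visually-re-rank pipeline, not the authors' implementation. It assumes that a query image, each species summary, and a reference image per species have already been embedded as vectors. The function names, the species keys, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(query_vec, summary_vecs, image_vecs, k=2):
    """Hypothetical two-stage ranking.

    Stage 1 shortlists the k species whose summary embeddings are closest
    to the query. Stage 2 reorders that shortlist by similarity to each
    species' reference-image embedding (the assumed "visual re-ranking").
    """
    by_text = sorted(
        summary_vecs,
        key=lambda name: cosine(query_vec, summary_vecs[name]),
        reverse=True,
    )
    shortlist = by_text[:k]
    return sorted(
        shortlist,
        key=lambda name: cosine(query_vec, image_vecs[name]),
        reverse=True,
    )


# Toy embeddings (made up for illustration only).
query = [1.0, 0.0]
summaries = {
    "species_a": [0.9, 0.1],
    "species_b": [0.8, 0.2],
    "species_c": [0.0, 1.0],
}
images = {
    "species_a": [0.5, 0.5],
    "species_b": [1.0, 0.0],
    "species_c": [0.0, 1.0],
}

ranked = retrieve_then_rerank(query, summaries, images, k=2)
print(ranked)
```

In this toy case the text stage shortlists species_a and species_b, and the visual stage then moves species_b ahead of species_a because its reference image is closer to the query. A real system would replace the toy vectors with embeddings from a vision-language encoder and pass the re-ranked candidates to a generative model.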
Taxonomy
Methods: Sparse Evolutionary Training · Focus
