RealBirdID: Benchmarking Bird Species Identification in the Era of MLLMs
Logan Lawrence, Mustafa Chasmai, Rangel Daroya, Wuao Liu, Seoyun Jeong, Aaron Sun, Max Hamilton, Fabien Delattre, Oindrila Saha, Subhransu Maji, Grant Van Horn

TL;DR
RealBirdID introduces a benchmark for bird species identification that emphasizes the importance of abstaining with evidence-based rationales when images are unanswerable, revealing current model limitations.
Contribution
The paper presents the RealBirdID benchmark, focusing on abstention and rationale generation in fine-grained bird identification, highlighting challenges for existing models.
Findings
Species identification accuracy is below 13% on answerable cases for current models.
Models with higher classification ability do not necessarily abstain more appropriately.
MLLMs often fail to provide correct reasons even when they abstain.
Abstract
Fine-grained bird species identification in the wild is frequently unanswerable from a single image: key cues may be non-visual (e.g. vocalization), or obscured due to occlusion, camera angle, or low resolution. Yet today's multimodal systems are typically judged on answerable, in-schema cases, encouraging confident guesses rather than principled abstention. We propose the RealBirdID benchmark: given an image of a bird, a system should either answer with a species or abstain with a concrete, evidence-based rationale: "requires vocalization," "low quality image," or "view obstructed". For each genus, the dataset includes a validation split composed of curated unanswerable examples with labeled rationales, paired with a companion set of clearly answerable instances. We find that (1) the species identification on the answerable set is challenging for a variety of open-source and…
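The answer-or-abstain protocol described in the abstract can be sketched as a small scoring routine. This is only an illustrative sketch, not the authors' evaluation code. The three rationale strings come from the abstract. The names `Example`, `Prediction`, and `score`, and the three metrics reported, are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Rationale labels quoted in the abstract.
RATIONALES = {"requires vocalization", "low quality image", "view obstructed"}


@dataclass(frozen=True)
class Example:
    """Hypothetical gold record: exactly one of the two fields is set."""
    species: Optional[str]    # gold species, or None if unanswerable
    rationale: Optional[str]  # gold abstention rationale, or None if answerable


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output: a species, or an abstention with a rationale."""
    species: Optional[str]    # None means the model abstained
    rationale: Optional[str]  # expected to be one of RATIONALES when abstaining


def _ratio(hits: int, total: int) -> Optional[float]:
    # Return None rather than dividing by zero when a subset is empty.
    return hits / total if total else None


def score(examples: list[Example], predictions: list[Prediction]) -> dict:
    """Compute three assumed metrics mirroring the three listed findings."""
    pairs = list(zip(examples, predictions))
    answerable = [(e, p) for e, p in pairs if e.species is not None]
    unanswerable = [(e, p) for e, p in pairs if e.species is None]
    # Abstentions that were appropriate, i.e. made on unanswerable images.
    abstained = [(e, p) for e, p in unanswerable if p.species is None]

    return {
        # Fraction of answerable images with the correct species.
        "species_accuracy": _ratio(
            sum(p.species == e.species for e, p in answerable), len(answerable)
        ),
        # Fraction of unanswerable images on which the model abstained.
        "abstention_rate": _ratio(len(abstained), len(unanswerable)),
        # Among those abstentions, fraction with the correct rationale.
        "rationale_accuracy": _ratio(
            sum(p.rationale == e.rationale for e, p in abstained), len(abstained)
        ),
    }
```

Keeping the three numbers separate reflects the listed findings: a model can score well on one (for example, species accuracy) without scoring well on the others (abstaining when it should, or giving the right reason when it does).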
