Surely Large Multimodal Models (Don't) Excel in Visual Species Recognition?

Tian Liu; Anwesha Basu; James Caverlee; Shu Kong

arXiv:2512.15748 · cs.LG · December 19, 2025

TL;DR

This paper investigates how well Large Multimodal Models (LMMs) perform in visual species recognition (VSR). It finds that LMMs underperform simple few-shot learning (FSL) expert models, yet can effectively correct those models' predictions post hoc, which improves accuracy.

Contribution

The paper introduces a simple post-hoc correction method that leverages LMMs to re-rank FSL expert model predictions, significantly boosting accuracy without additional training.
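
The re-ranking idea can be illustrated with a minimal sketch. It assumes the FSL expert outputs one score per species and that the LMM is asked to choose among the expert's top-k candidates; the paper's actual prompting and selection details are not given here. The names `top_k_candidates`, `post_hoc_correct`, and `ask_lmm` are hypothetical, and `ask_lmm` stands in for whatever LMM call (with the image attached) is used in practice.

```python
from typing import Callable, Sequence


def top_k_candidates(
    scores: Sequence[float], class_names: Sequence[str], k: int = 5
) -> list[str]:
    """Return the k class names with the highest FSL scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [class_names[i] for i in order[:k]]


def post_hoc_correct(
    scores: Sequence[float],
    class_names: Sequence[str],
    ask_lmm: Callable[[list[str]], str],
    k: int = 5,
) -> str:
    """Let an LMM pick among the FSL expert's top-k candidates.

    ask_lmm is a hypothetical callable: it receives the candidate names
    and returns the name the LMM selects. If the reply does not match any
    candidate, fall back to the expert's top-1 prediction.
    """
    candidates = top_k_candidates(scores, class_names, k)
    answer = ask_lmm(candidates).strip().lower()
    for name in candidates:
        if name.lower() == answer:
            return name
    return candidates[0]  # fallback: keep the FSL expert's own prediction


if __name__ == "__main__":
    names = ["Corvus corax", "Corvus brachyrhynchos", "Corvus ossifragus"]
    fsl_scores = [0.1, 0.5, 0.4]
    # Stub LMM that always prefers the second candidate it is shown.
    print(post_hoc_correct(fsl_scores, names, lambda c: c[1], k=2))
```

Nothing is trained in this sketch: the expert's scores are only used to shortlist candidates, and the fallback keeps the expert's answer whenever the LMM reply cannot be matched to a candidate.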

Findings

01

LMMs underperform FSL expert models in VSR.

02

LMMs can effectively correct FSL predictions post-hoc.

03

Post-hoc correction (POC) improves accuracy by 6.4% across benchmarks.

Abstract

Visual Species Recognition (VSR) is pivotal to biodiversity assessment and conservation, evolution research, and ecology and ecosystem management. Training a machine-learned model for VSR typically requires vast amounts of annotated images. Yet, species-level annotation demands domain expertise, making it realistic for domain experts to annotate only a few examples. These limited labeled data motivate training an "expert" model via few-shot learning (FSL). Meanwhile, advanced Large Multimodal Models (LMMs) have demonstrated prominent performance on general recognition tasks. It is straightforward to ask whether LMMs excel in the highly specialized VSR task and whether they outshine FSL expert models. Somewhat surprisingly, we find that LMMs struggle in this task, despite using various established prompting techniques. LMMs even significantly underperform FSL expert models, which are…

Code & Models

Datasets

autoexpert-cvpr2026-workshop/ASA2026_dataset
dataset · 2 dl

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Species Distribution and Climate Change