SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition

Jingxiao Yang; DaLin He; Miao Pan; Kaixiang Yao; Ge Su; Wenqi Zhang; Yifeng Hu; Tangwei Li; Yuke Li; Xuhong Zhang

arXiv:2603.17729 · cs.CV · April 29, 2026

TL;DR

SARE is a training-free framework that adaptively combines retrieval and reasoning for fine-grained visual recognition, improving accuracy and efficiency by leveraging past failures without parameter updates.

Contribution

It introduces a cascaded, sample-wise adaptive reasoning approach with a self-reflective experience mechanism for training-free FGVR.

Findings

01. Achieves state-of-the-art performance on 14 datasets.

02. Reduces computational overhead compared to existing methods.

03. Effectively leverages past failures for improved inference.

Abstract

Recent advances in Large Vision-Language Models (LVLMs) have enabled training-free Fine-Grained Visual Recognition (FGVR). However, effectively exploiting LVLMs for FGVR remains challenging due to the inherent visual ambiguity of subordinate-level categories. Existing methods predominantly adopt either retrieval-oriented or reasoning-oriented paradigms to tackle this challenge, but both are constrained by two fundamental limitations: (1) they apply the same inference pipeline to all samples without accounting for uneven recognition difficulty, leading to suboptimal accuracy and efficiency; (2) they lack mechanisms to consolidate and reuse error-specific experience, which causes repeated failures on similar challenging cases. To address these limitations, we propose SARE, a Sample-wise Adaptive REasoning framework for training-free FGVR. Specifically, SARE adopts a cascaded…
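The abstract is cut off before it describes the cascade, so the following is only a generic sketch of what a sample-wise, confidence-gated cascade with a failure memory could look like. It is not the SARE implementation. Every name here (`CascadeRecognizer`, `retrieve`, `reason`, `threshold`, `experience`, and the toy stub functions) is hypothetical, and the threshold value is arbitrary. The idea it illustrates is that easy samples are answered by a cheap retrieval step, uncertain samples are escalated to a costlier reasoning step, and notes about past failures are stored as plain text and passed to the reasoning step, so no parameters are updated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CascadeRecognizer:
    """Illustrative confidence-gated cascade (assumed design, not the paper's)."""

    # Hypothetical fast path: returns (label, confidence in [0, 1]).
    retrieve: Callable[[str], Tuple[str, float]]
    # Hypothetical slow path: receives the sample plus stored failure notes.
    reason: Callable[[str, List[str]], str]
    # Assumed gate; the value 0.8 is arbitrary.
    threshold: float = 0.8
    # Text-only memory of past failures; no model weights are touched.
    experience: List[str] = field(default_factory=list)

    def predict(self, sample: str) -> Tuple[str, str]:
        """Return (label, stage), where stage names the path that answered."""
        label, confidence = self.retrieve(sample)
        if confidence >= self.threshold:
            return label, "retrieval"
        # Low confidence: escalate and hand over a copy of the stored notes.
        return self.reason(sample, list(self.experience)), "reasoning"

    def record_failure(self, note: str) -> None:
        """Store a note about a misrecognized case for later reuse."""
        self.experience.append(note)


# Toy stand-ins for a retriever and an LVLM call (purely illustrative).
def toy_retrieve(sample: str) -> Tuple[str, float]:
    return ("sparrow", 0.95) if sample == "easy" else ("sparrow", 0.40)


def toy_reason(sample: str, hints: List[str]) -> str:
    # Pretend that any stored hint lets the reasoner correct its answer.
    return "finch" if hints else "sparrow"


model = CascadeRecognizer(retrieve=toy_retrieve, reason=toy_reason)
easy = model.predict("easy")
hard_before = model.predict("hard")
model.record_failure("compare beak shape before deciding")
hard_after = model.predict("hard")
```

In this toy run the easy sample never reaches the reasoning step, while the hard sample is escalated and its answer changes once a failure note has been recorded. Keeping the memory as text handed to the reasoner is one simple way to reuse experience without training; how SARE actually gates samples and structures its experience is not recoverable from the truncated abstract.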
