FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition

Geng Li; Yuxin Peng

arXiv:2605.13193 · cs.CV · May 20, 2026

TL;DR

FIKA-Bench is a new benchmark for fine-grained knowledge acquisition, in which systems must search for and verify external evidence. Evaluations on it reveal significant limitations in current models.

Contribution

The paper presents FIKA-Bench, a high-quality, leakage-aware benchmark for fine-grained knowledge acquisition, and evaluates state-of-the-art models, highlighting their deficiencies in evidence retrieval and visual judgment.

Findings

01

The best model achieves only 25.1% accuracy.

02

Models struggle with entity retrieval and visual judgment.

03

Equipping models with tools alone is insufficient.

Abstract

Fine-grained recognition in everyday life is often not a closed-book classification problem: when encountering unfamiliar objects, humans actively search, compare visual details, and verify evidence before deciding. Existing benchmarks primarily evaluate visual recognition, leaving this ability to actively acquire external knowledge underexplored. We study fine-grained knowledge acquisition, where a system must seek, verify, and use external evidence to answer open-ended fine-grained recognition questions. We introduce FIKA-Bench, a leakage-aware, evidence-grounded collection of 311 instances drawn from public sources and real-life settings. To ensure high quality, every example is filtered against frontier closed-book models to remove memorized cases and audited to eliminate image-answer leakage, and only samples supported by verified evidence are retained. Our evaluation of the latest Large Multimodal Models…
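The curation protocol in the abstract can be read as three rejection tests applied to each candidate item. The sketch below illustrates that reading under loudly stated assumptions; it is not the authors' pipeline. Closed-book models are abstracted as plain Python functions, normalized exact match stands in for answer grading, image-answer leakage is approximated by checking whether the answer string appears in text readable from the image, and "verified evidence" is reduced to a snippet that mentions the answer. `Candidate`, `filter_candidates`, `toy_model`, and all field names are hypothetical. The paper's audit and verification steps are likely more careful than these string checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical record for one candidate item; field names are illustrative only.
@dataclass
class Candidate:
    question: str
    answer: str
    image_text: str = ""  # text readable in the image (e.g. a tag); an assumption
    evidence: List[str] = field(default_factory=list)  # retrieved evidence snippets

# A closed-book model is abstracted as a function: question -> answer string.
ClosedBookModel = Callable[[str], str]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())

def is_memorized(item: Candidate, models: List[ClosedBookModel]) -> bool:
    """True if any closed-book model already answers correctly without search."""
    target = normalize(item.answer)
    return any(normalize(model(item.question)) == target for model in models)

def leaks_answer(item: Candidate) -> bool:
    """True if the answer can be read directly off the image (crude proxy)."""
    return normalize(item.answer) in normalize(item.image_text)

def has_verified_evidence(item: Candidate) -> bool:
    """True if at least one evidence snippet mentions the answer (crude proxy)."""
    target = normalize(item.answer)
    return any(target in normalize(snippet) for snippet in item.evidence)

def filter_candidates(items: List[Candidate],
                      models: List[ClosedBookModel]) -> List[Candidate]:
    """Keep items that are not memorized, do not leak, and have evidence."""
    return [
        item for item in items
        if not is_memorized(item, models)
        and not leaks_answer(item)
        and has_verified_evidence(item)
    ]

# Toy closed-book "model" that only knows one answer (purely illustrative).
def toy_model(question: str) -> str:
    return "Golden Retriever" if "dog" in question else "unknown"

items = [
    # Dropped: the toy model answers it closed-book (memorized).
    Candidate("Which dog breed is this?", "Golden Retriever",
              evidence=["Reference page: Golden Retriever"]),
    # Dropped: the answer is printed on a tag visible in the image (leakage).
    Candidate("Which orchid cultivar is shown?", "Phalaenopsis Sogo Yukidian",
              image_text="Nursery tag: Phalaenopsis Sogo Yukidian",
              evidence=["Reference page: Phalaenopsis Sogo Yukidian"]),
    # Dropped: no supporting evidence was found.
    Candidate("Which moth species is this?", "Actias luna"),
    # Kept: passes all three checks.
    Candidate("Which mushroom species is this?", "Amanita muscaria",
              evidence=["Field guide entry: Amanita muscaria"]),
]

kept = filter_candidates(items, [toy_model])
print([c.answer for c in kept])  # ['Amanita muscaria']
```

The checks here simply follow the order in the abstract. A real pipeline might run the cheap leakage test before querying expensive closed-book models, since the three conditions are independent and the surviving set does not depend on their order.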

Code & Models

Datasets

oking0197/FIKA-Bench
dataset
