FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition
Geng Li, Yuxin Peng

TL;DR
FIKA-Bench is a benchmark for fine-grained knowledge acquisition that tests whether systems can search for and verify external evidence. It reveals significant limitations in current models.
Contribution
The paper presents FIKA-Bench, a high-quality, leakage-aware benchmark for fine-grained knowledge acquisition, and evaluates state-of-the-art models, highlighting their deficiencies in evidence retrieval and visual judgment.
Findings
The best model achieves only 25.1% accuracy.
Models struggle with entity retrieval and visual judgment.
Equipping models with tools alone is insufficient.
Abstract
Fine-grained recognition in everyday life is often not a closed-book classification problem: when encountering unfamiliar objects, humans actively search, compare visual details, and verify evidence before deciding. Existing benchmarks primarily evaluate visual recognition, leaving this active external knowledge acquisition ability underexplored. We study fine-grained knowledge acquisition, where a system must seek, verify, and use external evidence to answer open-ended fine-grained recognition questions. We introduce FIKA-Bench, a leakage-aware and evidence-grounded collection of 311 public-source and real-life instances. To ensure high quality, every example is filtered against frontier closed-book models to remove memorized cases and audited to eliminate image-answer leakage, retaining only samples supported by verified evidence. Our evaluation of the latest Large Multimodal Models…
