TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning
Maximilian von Klinski, Maximilian Schall

TL;DR
TaxonRL is a reinforcement learning method that rewards intermediate, hierarchical taxonomic predictions to make fine-grained visual reasoning interpretable, reaching 91.7% average accuracy on the Birds-to-Words species classification dataset.
Contribution
Proposes a reinforcement learning framework, based on Group Relative Policy Optimization with intermediate rewards, that decomposes fine-grained visual classification into hierarchical taxonomic predictions and produces interpretable reasoning traces.
Findings
Achieves 91.7% average accuracy on the Birds-to-Words dataset, exceeding human performance (77.3%).
Provides interpretable reasoning traces for model decisions.
Demonstrates strong cross-domain generalization to other species datasets.
Abstract
Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family. We introduce TaxonRL, a reinforcement learning approach using Group Relative Policy Optimization with intermediate rewards that decomposes the reasoning process into hierarchical taxonomic predictions. Our method incentivizes models to explicitly reason about species-level, genus-level, and family-level features before making final classifications. This structured approach is designed not only to boost accuracy but also to yield a transparent, verifiable decision-making process. On the challenging Birds-to-Words dataset, TaxonRL achieves 91.7% average accuracy, exceeding human performance (77.3%) while generating interpretable reasoning traces. We demonstrate strong cross-domain generalization,…
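The abstract names two ingredients: intermediate rewards at each taxonomic level, and Group Relative Policy Optimization, which scores each sampled response relative to the other responses drawn for the same prompt. The sketch below illustrates how those two pieces could fit together. It is not the authors' implementation: the `Taxon` class, the `LEVEL_WEIGHTS` and `FINAL_WEIGHT` values, and both function names are hypothetical, and the paper's actual reward design is not given in this summary.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Taxon:
    """Taxonomic labels parsed from one model response (hypothetical structure)."""
    family: str
    genus: str
    species: str


# Assumed weights for illustration only; the paper's values are not stated here.
LEVEL_WEIGHTS = {"family": 0.2, "genus": 0.3, "species": 0.5}
FINAL_WEIGHT = 1.0


def total_reward(pred: Taxon, gold: Taxon, final_correct: bool) -> float:
    """Sum partial credit for each correct taxonomic level plus the final answer."""
    intermediate = sum(
        weight
        for level, weight in LEVEL_WEIGHTS.items()
        if getattr(pred, level) == getattr(gold, level)
    )
    return intermediate + (FINAL_WEIGHT if final_correct else 0.0)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gold = Taxon("Paridae", "Parus", "Parus major")
    group = [
        (Taxon("Paridae", "Parus", "Parus major"), True),            # fully correct
        (Taxon("Paridae", "Parus", "Parus minor"), False),           # wrong species
        (Taxon("Paridae", "Cyanistes", "Cyanistes caeruleus"), False),  # wrong genus
        (Taxon("Corvidae", "Corvus", "Corvus corax"), False),        # wrong family
    ]
    rewards = [total_reward(pred, gold, ok) for pred, ok in group]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f} advantage={a:+.3f}")
```

Under these assumed weights, a response that gets the family and genus right but the species wrong still earns partial credit, so it receives a higher advantage than a response that misses the family entirely. That graded signal is the point of rewarding intermediate steps rather than only the final classification.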
Taxonomy
Topics: Species Distribution and Climate Change · Multimodal Machine Learning Applications · Biomedical Text Mining and Ontologies
