TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Maximilian von Klinski; Maximilian Schall

arXiv:2603.04380 · cs.CV · March 5, 2026

TL;DR

TaxonRL introduces a reinforcement learning method with intermediate rewards that guides a model through hierarchical taxonomic reasoning. It improves both accuracy and the transparency of decisions on fine-grained species classification tasks.

Contribution

The paper proposes a reinforcement learning framework built on Group Relative Policy Optimization (GRPO). It rewards intermediate family-, genus-, and species-level predictions so that fine-grained visual classification becomes interpretable.

Findings

01

Achieves 91.7% average accuracy on the Birds-to-Words dataset, exceeding human performance (77.3%).

02

Provides interpretable reasoning traces for model decisions.

03

Demonstrates strong cross-domain generalization to other species datasets.

Abstract

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family. We introduce TaxonRL, a reinforcement learning approach using Group Relative Policy Optimization with intermediate rewards that decomposes the reasoning process into hierarchical taxonomic predictions. Our method incentivizes models to explicitly reason about species-level, genus-level, and family-level features before making final classifications. This structured approach is designed not only to boost accuracy but also to yield a transparent, verifiable decision-making process. On the challenging Birds-to-Words dataset, TaxonRL achieves 91.7% average accuracy, exceeding human performance (77.3%) while generating interpretable reasoning traces. We demonstrate strong cross-domain generalization,…
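
The abstract combines two mechanisms: intermediate rewards at each taxonomic level and GRPO's group-relative scoring. The Python sketch below shows one way such a reward could feed a group-relative advantage. It rests on stated assumptions. The `TaxonPrediction` fields, the `LEVEL_WEIGHTS` values, the answer format, and exact-match scoring are all hypothetical illustrations and do not come from the paper's reward specification. The advantage uses one common form of GRPO normalization, (r_i - mean) / (std + eps), computed over the rewards of responses sampled for the same prompt.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class TaxonPrediction:
    """Hypothetical structured output parsed from one sampled reasoning trace."""
    family: str
    genus: str
    species: str
    answer: str  # final answer string; its format here is an assumption


# Hypothetical weights: partial credit per taxonomic level, with the
# final answer weighted most. They sum to 1.0 so rewards lie in [0, 1].
LEVEL_WEIGHTS = {"family": 0.125, "genus": 0.125, "species": 0.25, "answer": 0.5}


def intermediate_reward(pred: TaxonPrediction, gold: TaxonPrediction) -> float:
    """Add the weight of every field whose prediction matches the label
    (case-insensitive, ignoring surrounding whitespace)."""
    reward = 0.0
    for field, weight in LEVEL_WEIGHTS.items():
        predicted = getattr(pred, field).strip().lower()
        expected = getattr(gold, field).strip().lower()
        if predicted == expected:
            reward += weight
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward against the mean and
    population standard deviation of its group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: one labeled example and a group of three sampled traces.
gold = TaxonPrediction("Parulidae", "Setophaga", "petechia", "A")
group = [
    TaxonPrediction("Parulidae", "Setophaga", "petechia", "A"),  # all correct
    TaxonPrediction("Parulidae", "Setophaga", "coronata", "B"),  # family + genus
    TaxonPrediction("Tyrannidae", "Empidonax", "minimus", "B"),  # nothing correct
]
rewards = [intermediate_reward(p, gold) for p in group]
print(rewards)  # → [1.0, 0.25, 0.0]
print(group_relative_advantages(rewards))
```

Every sample in a group is scored against the same label. A trace that gets the family and genus right but misses the species therefore still earns partial credit. Its advantage then reflects how it compares with its siblings, not an absolute threshold. Intermediate rewards are meant to provide this kind of denser signal, where a single pass/fail reward would provide less.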

Taxonomy

Topics: Species Distribution and Climate Change · Multimodal Machine Learning Applications · Biomedical Text Mining and Ontologies