Taxonomy-Aware Representation Alignment for Hierarchical Visual Recognition with Large Multimodal Models
Hulingxiao He, Zhi Tan, Yuxin Peng

TL;DR
This paper introduces TARA, a method that improves hierarchical visual recognition in large multimodal models by aligning visual features with biological taxonomy knowledge, enhancing accuracy for known and novel categories.
Contribution
TARA injects taxonomic knowledge into LMMs by leveraging representations from biology foundation models (BFMs) that encode biological relationships through hierarchical contrastive learning, improving hierarchical consistency and recognition accuracy.
Findings
Enhances LMMs' hierarchical consistency.
Improves leaf node accuracy for known and novel categories.
Effective in complex biological taxonomies.
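To make the two quantities named in the findings concrete, here is a minimal sketch of how hierarchical consistency and leaf-node accuracy could be measured over predicted label paths. The toy taxonomy, the function names, and the exact metric definitions are assumptions for illustration; the paper's own evaluation protocol may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child label -> parent label (None at the top level).
PARENT: Dict[str, Optional[str]] = {
    "Passeriformes": None,
    "Corvidae": "Passeriformes",
    "Corvus": "Corvidae",
    "Corvus corax": "Corvus",
    "Paridae": "Passeriformes",
    "Parus": "Paridae",
    "Parus major": "Parus",
}


def is_consistent(path: List[str], parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """True if every predicted label is a child of the label predicted one level above."""
    return all(parent.get(child) == par for par, child in zip(path, path[1:]))


def hierarchical_consistency(paths: List[List[str]]) -> float:
    """Fraction of predicted coarse-to-fine paths that form a valid chain in the taxonomy."""
    if not paths:
        return 0.0
    return sum(is_consistent(p) for p in paths) / len(paths)


def leaf_accuracy(paths: List[List[str]], gold_leaves: List[str]) -> float:
    """Fraction of paths whose finest-level prediction matches the gold leaf label."""
    if not paths:
        return 0.0
    return sum(p[-1] == g for p, g in zip(paths, gold_leaves)) / len(paths)


# Two predictions for raven images: order -> family -> genus -> species.
predicted = [
    ["Passeriformes", "Corvidae", "Corvus", "Corvus corax"],  # valid chain
    ["Passeriformes", "Paridae", "Corvus", "Corvus corax"],   # wrong family for this genus
]
gold = ["Corvus corax", "Corvus corax"]

consistency = hierarchical_consistency(predicted)
accuracy = leaf_accuracy(predicted, gold)
```

In this toy case both leaf predictions are correct, yet only one path is a valid chain, which is why consistency is tracked separately from leaf accuracy.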
Abstract
A high-performing, general-purpose visual understanding model should map visual inputs to a taxonomic tree of labels and identify novel categories beyond the training set for which few or no publicly available images exist. Large Multimodal Models (LMMs) have achieved remarkable progress in fine-grained visual recognition (FGVR) for known categories. However, they remain limited in hierarchical visual recognition (HVR), which aims to predict consistent label paths from coarse to fine categories, especially for novel categories. To tackle these challenges, we propose Taxonomy-Aware Representation Alignment (TARA), a simple yet effective strategy to inject taxonomic knowledge into LMMs. TARA leverages representations from biology foundation models (BFMs) that encode rich biological relationships through hierarchical contrastive learning. By aligning the intermediate representations of…
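As a rough illustration of what representation alignment can look like, the sketch below computes a cosine-based alignment loss between linearly projected LMM features and frozen BFM features. The abstract is truncated before it states the actual objective, so the cosine loss, the linear projection, and all names here (`project`, `alignment_loss`, `W`) are assumptions for illustration only, not the paper's formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def project(W: Sequence[Vector], x: Vector) -> List[float]:
    """Hypothetical linear projection mapping an LMM feature into the BFM feature space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def alignment_loss(lmm_feats: Sequence[Vector],
                   bfm_feats: Sequence[Vector],
                   W: Sequence[Vector]) -> float:
    """Mean (1 - cosine similarity) between projected LMM features and BFM targets.

    Assumption: BFM features are treated as fixed targets; only W (and the LMM)
    would receive gradients in a real training setup.
    """
    losses = [1.0 - cosine(project(W, h), z) for h, z in zip(lmm_feats, bfm_feats)]
    return sum(losses) / len(losses)


identity = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss([[1.0, 0.0]], [[2.0, 0.0]], identity)      # same direction
orthogonal = alignment_loss([[1.0, 0.0]], [[0.0, 3.0]], identity)   # perpendicular
```

The loss is near 0 when the projected feature points in the same direction as the BFM feature and near 1 when the two are orthogonal, regardless of feature magnitude.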
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Smart Agriculture and AI · Multimodal Machine Learning Applications
