From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification
Vasiliy Kudryavtsev, Kirill Borodin, German Berezin, Kirill Bubenchikov, Grach Mkrtchian, Alexander Ryzhkov

TL;DR
This paper presents a multimodal animal identification system that combines visual features with semantic textual priors, significantly improving accuracy over unimodal methods through systematic ablation of encoders and fusion strategies.
Contribution
It introduces a large-scale dataset and a novel gated fusion approach that effectively integrates visual and semantic modalities for animal re-identification.
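To make the idea of gated fusion concrete, here is a minimal sketch of one common formulation: a sigmoid gate, computed from both embeddings, decides per dimension how much of the visual versus the textual feature to keep. This is not the authors' architecture, which this summary does not specify. The function name `gated_fusion`, the gate parameters `weights` and `bias`, and the assumption that both embeddings have already been projected to the same dimension are all illustrative.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(visual, text, weights, bias):
    """Blend two same-length embeddings with a per-dimension gate.

    Assumed (hypothetical) formulation, not taken from the paper:
        gate  = sigmoid(W @ [visual; text] + b)
        fused = gate * visual + (1 - gate) * text
    `weights` has shape (d, 2d) and `bias` has length d.
    """
    if len(visual) != len(text):
        raise ValueError("embeddings must share a dimension")
    joint = list(visual) + list(text)
    fused = []
    for i, (v, t) in enumerate(zip(visual, text)):
        logit = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(logit)
        fused.append(g * v + (1.0 - g) * t)
    return fused


# Toy usage with random stand-ins for projected image and text embeddings.
rng = random.Random(0)
d = 4
visual = [rng.uniform(-1, 1) for _ in range(d)]
text = [rng.uniform(-1, 1) for _ in range(d)]
weights = [[rng.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
bias = [0.0] * d
print(gated_fusion(visual, text, weights, bias))
```

With all-zero gate parameters the gate sits at 0.5 and the output is the plain average of the two embeddings. A strongly positive bias pushes the gate toward 1, so the output falls back to the visual feature alone, which is the kind of adaptive behaviour simple concatenation cannot express.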
Findings
Achieved 84.28% Top-1 accuracy, an 11% improvement over unimodal baselines.
Identified optimal vision and text backbones: SigLIP2-Giant and E5-Small-v2.
Demonstrated the effectiveness of semantic priors in refining decision boundaries.
Abstract
Automated animal identification is a practical task for reuniting lost pets with their owners, yet current systems often struggle due to limited dataset scale and reliance on unimodal visual cues. This study introduces a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. We constructed a massive training corpus of 1.9 million photographs covering 695,091 unique animals to support this investigation. Through systematic ablation studies, we identified SigLIP2-Giant and E5-Small-v2 as the optimal vision and text backbones. We further evaluated fusion strategies ranging from simple concatenation to adaptive gating to determine the best method for integrating these modalities. Our proposed approach utilizes a gated fusion mechanism and achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422…
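For readers unfamiliar with the Equal Error Rate quoted above, the following sketch shows one simple way to estimate it from verification scores: sweep a decision threshold and report the point where the false accept rate and false reject rate are closest. The function name `equal_error_rate` and the toy scores are hypothetical, and the paper's actual evaluation protocol may differ (for example, by interpolating the ROC curve).

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from similarity scores (higher means more similar).

    genuine:  scores for pairs showing the same animal
    impostor: scores for pairs showing different animals
    A pair is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = float("inf"), 1.0
    for th in thresholds:
        # False accept rate: impostor pairs wrongly accepted.
        far = sum(s >= th for s in impostor) / len(impostor)
        # False reject rate: genuine pairs wrongly rejected.
        frr = sum(s < th for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

Read this way, an EER of 0.0422 means there is a threshold at which roughly 4.2% of same-animal pairs are rejected and roughly 4.2% of different-animal pairs are accepted.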
Taxonomy
Topics: Human-Animal Interaction Studies · Food Supply Chain Traceability · Advanced Neural Network Applications
