# From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification

**Authors:** Vasiliy Kudryavtsev, Kirill Borodin, German Berezin, Kirill Bubenchikov, Grach Mkrtchian, Alexander Ryzhkov

PMC · DOI: 10.3390/jimaging12010030 · Journal of Imaging · 2026-01-07

## TL;DR

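This paper introduces a multimodal verification framework for pet identification that augments visual features with synthetic textual descriptions. Using gated fusion, it reaches 84.28% Top-1 accuracy and an Equal Error Rate of 0.0422, an 11% improvement over leading unimodal baselines.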

## Contribution

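The main contribution is a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. It is supported by a training corpus of 1.9 million photographs covering 695,091 unique animals and by systematic ablations over vision encoders, text encoders, and fusion strategies.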

## Key findings

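- A gated fusion mechanism achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422 on the paper's test protocol.
- These results are an 11% improvement over leading unimodal baselines.
- Ablation studies identified SigLIP2-Giant and E5-Small-v2 as the best-performing vision and text backbones.
- Synthesized semantic descriptions significantly refined decision boundaries in large-scale pet re-identification.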
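
For readers unfamiliar with the metrics behind these findings, the following is a minimal sketch of how Top-1 accuracy and the Equal Error Rate (which the paper reports alongside it) are commonly computed. It assumes cosine-similarity retrieval and a simple threshold sweep. The paper's exact test protocol is not described in this summary, and the function names `top1_accuracy` and `equal_error_rate` are hypothetical.

```python
import numpy as np

def top1_accuracy(query_emb, gallery_emb, query_ids, gallery_ids):
    """Fraction of queries whose most similar gallery item has the same identity."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    nearest = np.argmax(q @ g.T, axis=1)
    return float(np.mean(np.asarray(gallery_ids)[nearest] == np.asarray(query_ids)))

def equal_error_rate(scores, labels):
    """Approximate EER: error rate at the threshold where the false accept
    rate (FAR) and false reject rate (FRR) are closest to each other."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = same animal (genuine pair)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[~labels])   # impostor pairs wrongly accepted
        frr = np.mean(~accept[labels])   # genuine pairs wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

A lower Equal Error Rate means the similarity scores separate same-animal pairs from different-animal pairs more cleanly, which is why it complements Top-1 accuracy in a verification setting.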

## Abstract

Automated animal identification is a practical task for reuniting lost pets with their owners, yet current systems often struggle due to limited dataset scale and reliance on unimodal visual cues. This study introduces a multimodal verification framework that enhances visual features with semantic identity priors derived from synthetic textual descriptions. We constructed a massive training corpus of 1.9 million photographs covering 695,091 unique animals to support this investigation. Through systematic ablation studies, we identified SigLIP2-Giant and E5-Small-v2 as the optimal vision and text backbones. We further evaluated fusion strategies ranging from simple concatenation to adaptive gating to determine the best method for integrating these modalities. Our proposed approach utilizes a gated fusion mechanism and achieved a Top-1 accuracy of 84.28% and an Equal Error Rate of 0.0422 on a comprehensive test protocol. These results represent an 11% improvement over leading unimodal baselines and demonstrate that integrating synthesized semantic descriptions significantly refines decision boundaries in large-scale pet re-identification.
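
To make the idea of adaptive gating concrete, here is a minimal sketch of one common form of gated fusion: a sigmoid gate, computed from both modalities, decides per dimension how much of the projected visual embedding versus the projected text embedding to keep. This is an illustrative assumption, not the authors' architecture, which this summary does not describe. All names (`gated_fusion`, `W_v`, `W_t`, `W_g`, `b_g`) and the toy dimensions are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(visual, text, W_v, W_t, W_g, b_g):
    """Blend visual and text embeddings with a per-dimension gate.

    visual: (batch, d_v)   text: (batch, d_t)
    W_v: (d_v, d)   W_t: (d_t, d)   W_g: (d_v + d_t, d)   b_g: (d,)
    """
    v = visual @ W_v                      # project visual features to shared size d
    t = text @ W_t                        # project text features to shared size d
    gate = sigmoid(np.concatenate([visual, text], axis=1) @ W_g + b_g)
    fused = gate * v + (1.0 - gate) * t   # gate near 1 favors vision, near 0 favors text
    norm = np.linalg.norm(fused, axis=1, keepdims=True)
    return fused / np.maximum(norm, 1e-12)  # unit length for cosine comparison

# Toy usage with random inputs (dimensions chosen only for illustration).
rng = np.random.default_rng(0)
visual = rng.normal(size=(3, 16))
text = rng.normal(size=(3, 8))
W_v = rng.normal(size=(16, 4))
W_t = rng.normal(size=(8, 4))
W_g = rng.normal(size=(24, 4))
b_g = np.zeros(4)
fused = gated_fusion(visual, text, W_v, W_t, W_g, b_g)
```

Simple concatenation, the other end of the range the abstract mentions, would instead pass `np.concatenate([visual, text], axis=1)` directly to a downstream layer with no gate.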

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12843040/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12843040/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/PMC12843040/full.md

---
Source: https://tomesphere.com/paper/PMC12843040