FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition

Jie Zhu; Xiao Guo; Yiyang Su; Anil Jain; Xiaoming Liu

arXiv:2603.26908·cs.CV·March 31, 2026


TL;DR

FusionAgent introduces a dynamic, sample-specific model selection framework using a multimodal large language model and reinforcement fine-tuning to improve human recognition accuracy and efficiency.

Contribution

The paper presents an agentic framework that adaptively selects which expert models to invoke for each sample, addressing the limitation of static fusion, which invokes every model on every input regardless of sample quality.

Findings

1. Outperforms state-of-the-art methods on biometric benchmarks.

2. Achieves higher efficiency with fewer model invocations.

3. Demonstrates robustness and explainability in model fusion.

Abstract

Model fusion is a key strategy for robust recognition in unconstrained scenarios, as different models provide complementary strengths. This is especially important for whole-body human recognition, where biometric cues such as face, gait, and body shape vary across samples and are typically integrated via score-fusion. However, existing score-fusion strategies are usually static, invoking all models for every test sample regardless of sample quality or modality reliability. To overcome these limitations, we propose FusionAgent, a novel agentic framework that leverages a Multimodal Large Language Model (MLLM) to perform dynamic, sample-specific model selection. Each expert model is treated as a tool, and through Reinforcement Fine-Tuning (RFT) with a metric-based reward, the agent learns to adaptively determine the optimal model combination for each test input. To address the…
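
The contrast the abstract draws between static score fusion and sample-specific model selection can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the expert names, the sample layout, mean-score fusion, and especially `toy_selector`, a hand-written quality threshold that merely stands in for the MLLM agent. It is not the paper's method and involves no reinforcement fine-tuning.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expert "tool": maps a sample to a match score in [0, 1].
Expert = Callable[[dict], float]
# Hypothetical selector: given a sample and the available expert names,
# returns the names of the experts to invoke for that sample.
Selector = Callable[[dict, List[str]], List[str]]


def static_fusion(sample: dict, experts: Dict[str, Expert]) -> float:
    """Static score fusion: invoke every expert and average the scores."""
    scores = [expert(sample) for expert in experts.values()]
    return sum(scores) / len(scores)


def dynamic_fusion(
    sample: dict, experts: Dict[str, Expert], select: Selector
) -> Tuple[float, List[str]]:
    """Sample-specific fusion: invoke only the experts chosen by `select`."""
    chosen = select(sample, list(experts))
    if not chosen:
        raise ValueError("selector must choose at least one expert")
    scores = [experts[name](sample) for name in chosen]
    return sum(scores) / len(scores), chosen


def toy_selector(sample: dict, names: List[str]) -> List[str]:
    """Stand-in for the agent (assumption): keep experts whose modality
    quality is at least 0.5, falling back to all experts if none qualify."""
    kept = [n for n in names if sample["quality"].get(n, 0.0) >= 0.5]
    return kept or names


# Toy experts that read a precomputed score from the sample (assumption).
experts: Dict[str, Expert] = {
    "face": lambda s: s["scores"]["face"],
    "gait": lambda s: s["scores"]["gait"],
    "body": lambda s: s["scores"]["body"],
}

# One made-up sample in which the gait cue is unreliable.
sample = {
    "scores": {"face": 0.9, "gait": 0.4, "body": 0.6},
    "quality": {"face": 0.8, "gait": 0.2, "body": 0.7},
}

static_score = static_fusion(sample, experts)
dynamic_score, used = dynamic_fusion(sample, experts, toy_selector)
print(f"static: {static_score:.3f} using {len(experts)} experts")
print(f"dynamic: {dynamic_score:.3f} using {used}")
```

In this toy sample the selector skips the low-quality gait expert, so the fused score comes from two invocations instead of three. In the framework the abstract describes, that choice is made by the MLLM agent rather than by a fixed threshold.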

Code & Models

Repositories

https://fusionagent.github.io
