Adaptive Multimodal Person Recognition: A Robust Framework for Handling Missing Modalities

Aref Farhadipour; Teodora Vukovic; Volker Dellwo; Petr Motlicek; Srikanth Madikeri

arXiv:2512.14961 · cs.CV · January 28, 2026

TL;DR

This paper introduces a robust multimodal person recognition framework that effectively handles missing modalities by combining feature-level and score-level fusion, achieving high accuracy on multiple datasets including a new interview-based dataset.

Contribution

The work presents a novel hybrid fusion strategy with dynamic adaptation mechanisms, and introduces a new dataset for benchmarking multimodal person recognition in interview scenarios.
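
The listing does not spell out how the dynamic adaptation works. As a rough illustration of the score-level half of such a scheme, the sketch below renormalizes fixed per-modality weights over whichever modalities are present. The function name `fuse_scores`, the weight values, and the renormalization rule are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Optional


def fuse_scores(
    scores: Dict[str, Optional[List[float]]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-identity scores over available modalities.

    A modality whose score list is None is treated as missing, and the
    remaining weights are renormalized so they still sum to 1.
    (Hypothetical helper; not taken from the paper.)
    """
    available = {m: s for m, s in scores.items() if s is not None}
    if not available:
        raise ValueError("at least one modality must be available")

    total_weight = sum(weights[m] for m in available)
    num_identities = len(next(iter(available.values())))

    fused = [0.0] * num_identities
    for modality, modality_scores in available.items():
        w = weights[modality] / total_weight  # renormalized weight
        for i, value in enumerate(modality_scores):
            fused[i] += w * value
    return fused


# Illustrative weights and scores (made up): voice is missing here.
weights = {"face": 0.4, "voice": 0.4, "motion": 0.2}
scores = {"face": [0.1, 0.9], "voice": None, "motion": [0.3, 0.7]}
fused = fuse_scores(scores, weights)
predicted_identity = max(range(len(fused)), key=fused.__getitem__)
```

With voice missing, face and motion keep their 2:1 weight ratio, so the fused scores remain a convex combination of the available modality scores.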

Findings

01. Achieves 99.51% accuracy on the CANDOR dataset

02. Reaches 99.92% accuracy on VoxCeleb1 in bimodal mode

03. Maintains high accuracy with missing modalities

Abstract

Person identification systems often rely on audio, visual, or behavioral cues, but real-world conditions frequently leave some modalities missing or degraded. To address this challenge, we propose a multimodal person identification framework incorporating upper-body motion, face, and voice. Experimental results demonstrate that body motion outperforms traditional modalities such as face and voice in within-session evaluations, while serving as a complementary cue that enhances performance in multi-session scenarios. Our model employs a unified hybrid fusion strategy, fusing both feature-level and score-level information to maximize representational richness and decision accuracy. Specifically, it leverages multi-task learning to process modalities independently, followed by cross-attention and gated fusion mechanisms to exploit both unimodal information and cross-modal interactions.…
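
The abstract names gated fusion without giving its form. One common way to realize the idea, sketched here under stated assumptions, is a softmax gate computed only over the modalities that are present, followed by a weighted sum of their embeddings. The function `gated_fusion`, the fixed `gate_logits`, and the toy embeddings are hypothetical; in a trained model the gate values would typically be produced by learned layers, and this sketch omits the cross-attention step entirely.

```python
import math
from typing import List, Optional, Sequence


def gated_fusion(
    embeddings: Sequence[Optional[List[float]]],
    gate_logits: Sequence[float],
) -> List[float]:
    """Fuse modality embeddings with a softmax gate over present modalities.

    embeddings[i] is None when modality i is missing; its gate is then
    excluded from the softmax, so the remaining gates still sum to 1.
    (Illustrative sketch; not the architecture described in the paper.)
    """
    present = [i for i, e in enumerate(embeddings) if e is not None]
    if not present:
        raise ValueError("at least one modality must be present")

    # Numerically stable softmax restricted to the present modalities.
    max_logit = max(gate_logits[i] for i in present)
    exp_logits = {i: math.exp(gate_logits[i] - max_logit) for i in present}
    normalizer = sum(exp_logits.values())

    dim = len(embeddings[present[0]])
    fused = [0.0] * dim
    for i in present:
        gate = exp_logits[i] / normalizer
        for d in range(dim):
            fused[d] += gate * embeddings[i][d]
    return fused


# Toy 2-D embeddings for (motion, face, voice); face is missing.
toy_embeddings = [[1.0, 0.0], None, [0.0, 1.0]]
toy_logits = [0.0, 5.0, 0.0]  # the logit of the missing modality is ignored
fused_embedding = gated_fusion(toy_embeddings, toy_logits)
```

Restricting the softmax to the present modalities means a missing input never contributes to the fused vector, regardless of how large its gate logit is.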

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Gait Recognition and Analysis