A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data
Madina Abdrakhmanova, Saniya Abushakimova, Yerbolat Khassanov, and Huseyin Atakan Varol

TL;DR
This study explores the effectiveness of combining audio, visual, and thermal data for robust person verification, demonstrating that a trimodal approach significantly outperforms unimodal and bimodal systems under various conditions.
Contribution
The paper introduces a trimodal person verification system built on deep learning architectures, compares two fusion methods (simple score averaging and soft attention), and reports extensive experimental results on the SpeakingFaces dataset.
Findings
On the easy test set, the trimodal system outperforms the best unimodal and bimodal systems by over 50% and 18%, respectively, in error reduction.
Adding the thermal modality improves robustness under noisy conditions.
Open-source code and models facilitate reproducibility and further research.
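The percentages in the first finding can be read as relative reductions in an error metric. A minimal sketch of that arithmetic follows, assuming the metric is a rate such as the equal error rate; the function name and the numbers are hypothetical and are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), for illustration only.
unimodal_error = 4.0
trimodal_error = 2.0
reduction = relative_reduction(unimodal_error, trimodal_error)
print(f"relative reduction: {reduction:.0%}")
```

Under this reading, a relative reduction of over 50% means the trimodal error is less than half of the best unimodal error, whatever the absolute values are.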
Abstract
In this paper, we study an approach to multimodal person verification using audio, visual, and thermal modalities. The combination of audio and visual modalities has already been shown to be effective for robust person verification. From this perspective, we investigate the impact of further increasing the number of modalities by adding thermal images. In particular, we implemented unimodal, bimodal, and trimodal verification systems using state-of-the-art deep learning architectures and compared their performance under clean and noisy conditions. We also compared two popular fusion approaches based on simple score averaging and the soft attention mechanism. The experiment conducted on the SpeakingFaces dataset demonstrates the superior performance of the trimodal verification system. Specifically, on the easy test set, the trimodal system outperforms the best unimodal and bimodal…
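The two fusion approaches named in the abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes each modality yields a fixed-length embedding, that verification scores are cosine similarities, and that the attention logits are supplied directly (in a trained system they would come from a learned layer). All function names and values are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def score_average_fusion(enroll: Dict[str, List[float]],
                         test: Dict[str, List[float]]) -> float:
    """Score each modality separately, then average the scores."""
    scores = [cosine_similarity(enroll[m], test[m]) for m in enroll]
    return sum(scores) / len(scores)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_fusion(embeddings: List[List[float]],
                          logits: List[float]) -> List[float]:
    """Weighted sum of modality embeddings, weights = softmax(logits).

    Assumption: logits are given; a real system would predict them.
    """
    weights = softmax(logits)
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings))
            for i in range(dim)]


# Hypothetical 2-dimensional embeddings, for illustration only.
enroll = {"audio": [1.0, 0.0], "visual": [0.0, 1.0], "thermal": [1.0, 1.0]}
test = {"audio": [1.0, 0.0], "visual": [1.0, 0.0], "thermal": [1.0, 1.0]}
fused_score = score_average_fusion(enroll, test)
fused_embedding = soft_attention_fusion(
    [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
```

Score averaging treats every modality equally, whereas the attention variant can down-weight a modality whose input is unreliable, which is one plausible reason to compare the two under noisy conditions.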
Taxonomy
Topics: Video Surveillance and Tracking Methods · Speech and Audio Processing · Hand Gesture Recognition Systems
Methods: Test
