Flexible-modal Deception Detection with Audio-Visual Adapter
Zhaoxu Li, Zitong Yu, Nithish Muthuchamy Selvaraj, Xiaobao Guo, Bingquan Shen, Adams Wai-Kin Kong, Alex Kot

TL;DR
This paper introduces a Transformer-based framework with an Audio-Visual Adapter to improve deception detection accuracy in multi-modal settings, especially when some modalities are missing, by effectively fusing audio and visual features.
Contribution
The paper proposes a novel Audio-Visual Adapter (AVA) module within a Transformer framework to handle flexible-modal data, including inputs with missing modalities, and thereby improve deception detection performance.
Findings
Outperforms existing multi-modal fusion methods on two benchmark datasets.
Effectively handles partial modality availability in real-world scenarios.
Achieves superior accuracy in deception detection tasks.
Abstract
Detecting deception from human behavior is vital in many fields, such as customs security and multimedia anti-fraud. Recently, audio-visual deception detection has attracted more attention because it performs better than detection using only a single modality. However, in real-world multi-modal settings, the integrity of data can be an issue (e.g., sometimes only partial modalities are available). A missing modality might lead to a decrease in performance, but the model still learns the features of the missing modality. In this paper, to further improve the performance and overcome the missing modality problem, we propose a novel Transformer-based framework with an Audio-Visual Adapter (AVA) to fuse temporal features across two modalities efficiently. Extensive experiments conducted on two benchmark datasets demonstrate that the proposed method can achieve superior performance compared with other…
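The abstract does not spell out how the AVA is built, so the following is only an illustrative sketch of the general idea: a bottleneck adapter passes information from one modality's temporal features into the other's, and an absent modality is replaced with zeros so the same code path still runs. Every name here (`make_adapter`, `adapt`, `fuse`), the ReLU bottleneck, the zero-filling strategy, and the mean pooling are assumptions made for illustration, not the authors' design.

```python
import numpy as np

def make_adapter(dim, bottleneck, rng):
    """Hypothetical bottleneck adapter: a down-projection and an up-projection."""
    return {
        "down": rng.standard_normal((dim, bottleneck)) * 0.02,
        "up": rng.standard_normal((bottleneck, dim)) * 0.02,
    }

def adapt(target, source, adapter):
    """Add a residual computed from the other modality (`source`) to `target`."""
    hidden = np.maximum(source @ adapter["down"], 0.0)  # ReLU in the bottleneck
    return target + hidden @ adapter["up"]

def fuse(audio, visual, a2v, v2a, steps, dim):
    """Fuse two (steps, dim) feature sequences; either input may be None."""
    if audio is None and visual is None:
        raise ValueError("at least one modality is required")
    zeros = np.zeros((steps, dim))
    # Assumption: a missing modality is represented by all-zero features.
    audio_in = zeros if audio is None else audio
    visual_in = zeros if visual is None else visual
    visual_out = adapt(visual_in, audio_in, a2v)  # audio informs visual
    audio_out = adapt(audio_in, visual_in, v2a)   # visual informs audio
    # Average over time, then concatenate the two modality summaries.
    return np.concatenate([audio_out.mean(axis=0), visual_out.mean(axis=0)])

rng = np.random.default_rng(0)
T, D = 8, 16
a2v = make_adapter(D, 4, rng)
v2a = make_adapter(D, 4, rng)
audio = rng.standard_normal((T, D))
visual = rng.standard_normal((T, D))

full = fuse(audio, visual, a2v, v2a, T, D)
visual_only = fuse(None, visual, a2v, v2a, T, D)
print(full.shape, visual_only.shape)  # (32,) (32,)
```

Because the sketch uses no bias terms, a zero-filled modality contributes a zero residual, so the visual half of `visual_only` is just the time-averaged visual features. A learned placeholder token or modality dropout during training would be alternative ways to represent the missing input.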
Taxonomy
Topics: Digital Media Forensic Detection · Anomaly Detection Techniques and Applications · Speech and Audio Processing
Methods: Adapter
