Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Onno Kampman; Elham J. Barezi; Dario Bertero; Pascale Fung

arXiv:1805.00705·cs.AI·May 17, 2018·25 cites

Open Access

TL;DR

This paper introduces a tri-modal neural network architecture that combines audio, visual, and text data to improve automatic personality prediction from videos, outperforming single-modality models.

Contribution

The paper fuses audio, text, and video channels in two ways, at the decision level and by concatenating the channels' fully connected layers, and shows that the fused model outperforms each individual modality.

Findings

01

Multimodal fusion improves performance by 9.4% over the best single modality (video).

02

Training the fused model with full backpropagation outperforms a linear combination of modalities, indicating that interactions between modalities can be exploited.

03

Each modality's relevance varies across different personality traits.

Abstract

We propose a tri-modal architecture to predict Big Five personality trait scores from video clips with different channels for audio, text, and video data. For each channel, stacked Convolutional Neural Networks are employed. The channels are fused both on decision-level and by concatenating their respective fully connected layers. It is shown that a multimodal fusion approach outperforms each single modality channel, with an improvement of 9.4% over the best individual modality (video). Full backpropagation is also shown to be better than a linear combination of modalities, meaning complex interactions between modalities can be leveraged to build better models. Furthermore, we can see the prediction relevance of each modality for each trait. The described model can be used to increase the emotional intelligence of virtual agents.
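The two fusion strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the trait ordering, and the plain-list vectors are all assumptions made for illustration, and the per-channel CNNs are omitted entirely. Each modality is assumed to have already produced either five trait scores (for decision-level fusion) or a feature vector standing in for its fully connected layer (for concatenation).

```python
# Illustrative sketch only; names and shapes are hypothetical, not from the paper.
TRAITS = ["extraversion", "agreeableness", "conscientiousness",
          "neuroticism", "openness"]
MODALITIES = ("audio", "text", "video")


def decision_level_fusion(scores, weights):
    """Late fusion: per-trait weighted sum of each modality's prediction.

    scores[m][t]  -- score predicted by modality m for trait index t
    weights[m][t] -- weight given to modality m for trait index t
    """
    return [
        sum(weights[m][t] * scores[m][t] for m in MODALITIES)
        for t in range(len(TRAITS))
    ]


def concat_fusion(features, layer_weights, layer_bias):
    """Feature-level fusion: concatenate the modality feature vectors,
    then apply one dense layer that outputs a score per trait."""
    joint = []
    for m in MODALITIES:
        joint.extend(features[m])
    return [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(layer_weights, layer_bias)
    ]


if __name__ == "__main__":
    # Made-up inputs, purely to show the call shapes.
    scores = {"audio": [0.3] * 5, "text": [0.6] * 5, "video": [0.9] * 5}
    weights = {m: [1.0 / 3.0] * 5 for m in MODALITIES}
    print(dict(zip(TRAITS, decision_level_fusion(scores, weights))))

    features = {"audio": [1.0, 2.0], "text": [3.0], "video": [4.0, 5.0]}
    dense_w = [[0.1] * 5 for _ in TRAITS]  # one row of weights per trait
    dense_b = [0.0] * 5
    print(dict(zip(TRAITS, concat_fusion(features, dense_w, dense_b))))
```

In the decision-level variant the per-trait weights give a direct reading of how much each modality contributes to each trait. In the concatenation variant the dense layer sees all modalities at once, which is what would let gradients flow back into every channel if the whole network were trained jointly.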

Taxonomy

Topics: Computational and Text Analysis Methods