TL;DR
This paper introduces an audiovisual deep residual network that predicts apparent Big Five personality traits directly from videos without feature engineering. It reached a test accuracy of 0.9109 and placed third in the ChaLearn First Impressions Challenge.
Contribution
An end-to-end audiovisual deep residual network that performs multimodal apparent personality trait recognition without visual preprocessing such as face detection, face landmark alignment, or facial expression recognition.
Findings
Achieved a test accuracy of 0.9109 in predicting the Big Five personality traits
Won third place in the ChaLearn First Impressions Challenge
Operates without feature engineering or visual analysis such as face detection or facial expression recognition
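To make the accuracy figure concrete, here is a minimal sketch of how such a score could be computed. It assumes that accuracy means one minus the mean absolute error between predicted and annotated trait scores in [0, 1], averaged over all videos and traits; the summary above does not define the metric, so treat this definition, the function name `mean_accuracy`, and the example values as illustrative assumptions rather than the challenge's official scoring code.

```python
# Hypothetical trait names; the source only says "Big Five".
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


def mean_accuracy(predictions, targets):
    """Return 1 - mean absolute error over all videos and traits.

    ASSUMPTION: this is the metric behind the reported 0.9109; the
    source text does not spell it out. Scores are expected in [0, 1].
    """
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    total_error = 0.0
    count = 0
    for pred, true in zip(predictions, targets):
        for trait in TRAITS:
            total_error += abs(pred[trait] - true[trait])
            count += 1
    return 1.0 - total_error / count


# Made-up example: every prediction is off by 0.1 on every trait.
example_targets = [{t: 0.5 for t in TRAITS}, {t: 0.7 for t in TRAITS}]
example_predictions = [{t: 0.6 for t in TRAITS}, {t: 0.6 for t in TRAITS}]
example_score = mean_accuracy(example_predictions, example_targets)
```

Under this assumed definition, a score of 0.9109 would correspond to an average absolute error of roughly 0.089 per trait.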
Abstract
Here, we develop an audiovisual deep residual network for multimodal apparent personality trait recognition. The network is trained end-to-end for predicting the Big Five personality traits of people from their videos. That is, the network does not require any feature engineering or visual analysis such as face detection, face landmark alignment or facial expression recognition. Recently, the network won third place in the ChaLearn First Impressions Challenge with a test accuracy of 0.9109.
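The general shape of a two-stream residual model can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' architecture: the abstract does not give layer counts, layer types, or the fusion method, so the single linear layer inside each residual block, the fusion by concatenation, the sigmoid output, and all names (`residual_block`, `predict_traits`, the `params` keys) are hypothetical.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, weights, bias):
    """Compute relu(x + F(x)), the skip connection behind residual nets.

    ASSUMPTION: F is a single dense layer plus ReLU here; a real model
    would typically use stacked convolutions instead.
    """
    fx = relu(linear(x, weights, bias))
    return relu([a + b for a, b in zip(x, fx)])


def predict_traits(audio_features, visual_features, params):
    """Run one residual block per modality, fuse, and emit five scores.

    ASSUMPTION: fusion by concatenation and a sigmoid output in (0, 1).
    """
    audio = residual_block(audio_features, params["audio_w"], params["audio_b"])
    visual = residual_block(visual_features, params["visual_w"], params["visual_b"])
    fused = audio + visual  # list concatenation, not element-wise addition
    logits = linear(fused, params["out_w"], params["out_b"])
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

The point of the sketch is the data flow: each modality is processed by its own stream, the skip connection adds the block input back to its output, and a shared output layer maps the fused representation to five trait scores, so the whole pipeline can be trained end-to-end on raw inputs.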
