Listening to the Unspoken: Exploring "365" Aspects of Multimodal Interview Performance Assessment

Jia Li; Yang Wang; Wenhao Qian; Jialong Hu; Zhenzhen Hu; Richang Hong; Meng Wang

arXiv:2507.22676 · cs.CL · August 6, 2025

TL;DR

This paper introduces a comprehensive multimodal framework for interview performance assessment that integrates video, audio, and text data across multiple responses and evaluation dimensions, achieving state-of-the-art results in the AVI Challenge 2025.

Contribution

It presents a novel multimodal assessment framework with modality-specific feature extraction, shared compression, and ensemble learning, advancing automated interview evaluation methods.

Findings

01

Achieved a multi-dimensional average MSE of 0.1824.

02

Secured first place in the AVI Challenge 2025.

03

Demonstrated robustness and effectiveness in multimodal assessment.
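
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a multi-dimensional average MSE could be computed. It assumes the score is the unweighted mean of per-dimension MSEs; the challenge's exact evaluation protocol is not stated here, and the function name `multi_dim_average_mse` and the toy numbers are hypothetical.

```python
import numpy as np

def multi_dim_average_mse(y_true, y_pred):
    """Return (average MSE, per-dimension MSE).

    Both inputs have shape (n_candidates, n_dimensions).
    Assumption: every dimension is weighted equally.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # MSE of each evaluation dimension, averaged over candidates
    per_dim = ((y_true - y_pred) ** 2).mean(axis=0)
    # Unweighted mean across dimensions
    return per_dim.mean(), per_dim

# Toy data: 2 candidates, 2 dimensions (illustrative values only)
avg_mse, per_dim_mse = multi_dim_average_mse(
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.0], [2.0, 4.0]],
)
```

With equal weighting, a single badly predicted dimension raises the average as much as any other, which is one reason a per-dimension breakdown is worth reporting alongside the headline number.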

Abstract

Interview performance assessment is essential for determining candidates' suitability for professional positions. To ensure holistic and fair evaluations, we propose a novel and comprehensive framework that explores "365" aspects of interview performance by integrating three modalities (video, audio, and text), six responses per candidate, and five key evaluation dimensions. The framework employs modality-specific feature extractors to encode heterogeneous data streams, which are subsequently fused via a Shared Compression Multilayer Perceptron. This module compresses multimodal embeddings into a unified latent space, facilitating efficient feature interaction. To enhance prediction robustness, we incorporate a two-level ensemble learning strategy: (1) independent regression heads predict scores for each response, and (2) predictions are aggregated across responses…
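
The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative toy with random weights, not the authors' implementation. The counts of three modalities, six responses, and five dimensions come from the abstract; everything else is an assumption: the feature sizes, the latent width, fusing by concatenation before the shared MLP, and a plain mean as the second-level aggregation (the abstract is truncated before it specifies the aggregation).

```python
import numpy as np

rng = np.random.default_rng(0)

N_RESPONSES, N_DIMS = 6, 5                           # from the abstract
FEAT_DIMS = {"video": 32, "audio": 24, "text": 48}   # hypothetical sizes
HIDDEN, LATENT = 64, 16                              # hypothetical widths

def relu(x):
    return np.maximum(x, 0.0)

def init_linear(n_in, n_out):
    # Random weights stand in for trained parameters
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# One compression MLP whose weights are shared by all six responses
W1, b1 = init_linear(sum(FEAT_DIMS.values()), HIDDEN)
W2, b2 = init_linear(HIDDEN, LATENT)

def shared_compress(x):
    return relu(relu(x @ W1 + b1) @ W2 + b2)

# Level 1: an independent regression head for each response
heads = [init_linear(LATENT, N_DIMS) for _ in range(N_RESPONSES)]

def predict(features):
    """features maps modality name -> array of shape (N_RESPONSES, feat_dim)."""
    # Assumed fusion: concatenate modality embeddings per response
    fused = np.concatenate([features[m] for m in FEAT_DIMS], axis=1)
    latent = shared_compress(fused)                  # (6, LATENT)
    per_response = np.stack(
        [latent[i] @ W + b for i, (W, b) in enumerate(heads)]
    )                                                # (6, 5)
    # Level 2 (assumed): average the six per-response predictions
    return per_response.mean(axis=0), per_response

# Random stand-ins for the outputs of the modality-specific extractors
features = {m: rng.normal(size=(N_RESPONSES, d)) for m, d in FEAT_DIMS.items()}
final_scores, per_response = predict(features)
```

Here `final_scores` holds one value per evaluation dimension for a single candidate, and `per_response` keeps the six intermediate predictions so the effect of the aggregation step can be inspected.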
