Learning Spatio-temporal Features with Partial Expression Sequences for on-the-Fly Prediction
Wissam J. Baddar, Yong Man Ro

TL;DR
This paper introduces a spatio-temporal feature learning approach that predicts facial expressions on-the-fly from partial video sequences, reducing prediction delay in interactive systems.
Contribution
The proposed method allows on-the-fly facial expression prediction using partial sequences by leveraging estimated expression intensity and a new training objective.
Findings
Achieved higher recognition rates than state-of-the-art methods.
Improved prediction accuracy with partial expression sequences.
Validated effectiveness on multiple datasets.
Abstract
Spatio-temporal feature encoding is essential for encoding facial expression dynamics in video sequences. At test time, most spatio-temporal encoding methods assume that a temporally segmented sequence is fed to a learned model, which can force the prediction to wait until the full sequence is available to an auxiliary task that performs the temporal segmentation. This delays the expression prediction. In an interactive setting, such as affective interactive agents, such a delay cannot be tolerated. Therefore, it is essential to train a model that can accurately predict the facial expression "on-the-fly" (as frames are fed to the system). In this paper, we propose a new spatio-temporal feature learning method that allows prediction with partial sequences, so that prediction can be performed on-the-fly. The proposed method utilizes an estimated…
Taxonomy
Topics: Emotion and Mood Recognition
