TL;DR
This paper introduces LOMo (Latent Ordinal Model), a weakly supervised learning method that models a video event as an ordered sequence of sub-events for facial analysis, improving prediction of expression, clinical pain, and intent.
Contribution
LOMo extends Multiple Instance Learning (MIL) and latent SVM/HCRF frameworks to approximately model the ordinal, or temporal, structure of videos for facial analysis.
Findings
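Achieved state-of-the-art results on four publicly available video-based facial analysis datasets when combined with complementary features.
Demonstrated consistent improvements over relevant competitive baselines for prediction of expression, clinical pain, and intent in dyadic conversations.
Automatically mined discriminative sub-events, such as the onset and offset phases of a smile, or brow lowering and cheek raising for pain.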
Abstract
We study the problem of facial analysis in videos. We propose a novel weakly supervised learning method that models the video event (expression, pain, etc.) as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phase for smile, brow lower and cheek raise for pain). The proposed model is inspired by recent works on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal or temporal aspect of the videos. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations. In combination with complementary features, we report state-of-the-art results on these datasets.
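The idea of scoring a video as an ordered sequence of sub-events can be illustrated with a small sketch. This is not the authors' implementation: it assumes linear sub-event templates, one feature vector per frame, and a strict ordering constraint (the abstract only says the ordering is modeled approximately). The function name `ordered_subevent_score` and the toy data are hypothetical.

```python
import numpy as np


def ordered_subevent_score(frames, templates):
    """Score a video by placing K sub-event templates on K distinct frames
    in temporal order, maximizing the summed template responses.

    Assumption (not from the paper): templates are linear and the ordering
    is enforced strictly, i.e. template k must fire after template k-1.
    """
    scores = frames @ templates.T  # (T, K) response of each template on each frame
    T, K = scores.shape
    if K > T:
        raise ValueError("need at least as many frames as sub-event templates")

    # dp[k, t]: best score with templates 0..k placed in order, template k at frame t
    dp = np.full((K, T), -np.inf)
    back = np.zeros((K, T), dtype=int)
    dp[0] = scores[:, 0]
    for k in range(1, K):
        best_prev, best_idx = -np.inf, -1
        for t in range(k, T):
            # running maximum of dp[k-1, t'] over all earlier frames t' < t
            if dp[k - 1, t - 1] > best_prev:
                best_prev, best_idx = dp[k - 1, t - 1], t - 1
            dp[k, t] = scores[t, k] + best_prev
            back[k, t] = best_idx

    # backtrack from the best final placement to recover the latent frame indices
    t = int(np.argmax(dp[K - 1]))
    total = float(dp[K - 1, t])
    placement = [t]
    for k in range(K - 1, 0, -1):
        t = int(back[k, t])
        placement.append(t)
    return total, placement[::-1]


# Toy example: 4 frames with 2-D features and 2 templates (identity, so the
# response of template k on a frame is simply feature k of that frame).
frames = np.array([[0.0, 9.0], [0.0, 0.0], [5.0, 0.0], [1.0, 2.0]])
templates = np.eye(2)
total, placement = ordered_subevent_score(frames, templates)
print(total, placement)  # 7.0 [2, 3]
```

In the toy data, the second template responds most strongly on the first frame, but the ordering constraint forbids placing it before the first template, so the best ordered placement is frames 2 and 3. A max-pooling MIL baseline without ordering would instead pick each template's best frame independently.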
Videos
LOMo: Latent Ordinal Model for Facial Analysis in Videos (YouTube)
