Discriminatively Trained Latent Ordinal Model for Video Classification
Karan Sikka, Gaurav Sharma

TL;DR
This paper introduces a weakly supervised latent ordinal model for video classification that automatically mines discriminative sub-events, improving performance on facial analysis and human action recognition datasets.
Contribution
The model extends Multiple Instance Learning and latent SVM/HCRF frameworks to capture, approximately, the ordinal structure of sub-events in a video, giving a weakly supervised approach to video classification.
Findings
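Consistent improvements over competitive baselines on four video-based facial analysis datasets (expression, clinical pain, and intent in dyadic conversations) and on three human action datasets.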
Effective modeling of sub-events like onset and offset phases.
Qualitative results support the method's intuitions.
Abstract
We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.
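To make the idea of ordered sub-events concrete, the following is a minimal sketch and not the authors' formulation. It assumes each frame is a feature vector, each sub-event is a linear template, and the order is enforced strictly; the abstract only says the ordinal aspect is modeled approximately, so this is a simplification. The names `ordered_subevent_score`, `frames`, and `templates` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ordered_subevent_score(
    frames: Sequence[Sequence[float]],
    templates: Sequence[Sequence[float]],
) -> Tuple[float, List[int]]:
    """Best total score when template k must fire at a later frame than template k-1.

    Assumption (not from the paper): strict ordering and linear template scores.
    Returns the best score and the chosen frame index for each template.
    """
    T, K = len(frames), len(templates)
    if K == 0 or T < K:
        raise ValueError("need at least one template and at least K frames")

    neg_inf = float("-inf")
    # prefix[k][t]: best score for templates 0..k with template k at a frame <= t.
    prefix = [[neg_inf] * T for _ in range(K)]
    # arg[k][t]: frame chosen for template k in the solution behind prefix[k][t].
    arg = [[-1] * T for _ in range(K)]

    for k in range(K):
        for t in range(k, T):
            prev = 0.0 if k == 0 else prefix[k - 1][t - 1]
            exact = prev + dot(templates[k], frames[t])
            if t > k and prefix[k][t - 1] >= exact:
                # Placing template k earlier is at least as good; carry it forward.
                prefix[k][t] = prefix[k][t - 1]
                arg[k][t] = arg[k][t - 1]
            else:
                prefix[k][t] = exact
                arg[k][t] = t

    # Backtrack from the last template to recover the latent frame positions.
    positions: List[int] = []
    t = T - 1
    for k in range(K - 1, -1, -1):
        t = arg[k][t]
        positions.append(t)
        t -= 1
    positions.reverse()
    return prefix[K - 1][T - 1], positions


# Toy 1-D "video": the second template rewards a low value after a high one.
toy_frames = [[0.0], [1.0], [0.2], [3.0], [0.5]]
score, where = ordered_subevent_score(toy_frames, [[1.0], [-1.0]])
```

In a weakly supervised setting, the frame positions returned here would play the role of latent variables: only the video-level label is known, and the positions are inferred while the templates are learned. The training loop itself is outside the scope of this sketch.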
