Spatio-Temporal Covariance Descriptors for Action and Gesture Recognition
Andres Sanin, Conrad Sanderson, Mehrtash T. Harandi, Brian C. Lovell

TL;DR
This paper introduces an action and gesture recognition approach based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection. It outperforms several recent state-of-the-art techniques without requiring foreground detection, interest-point detection or tracking.
Contribution
It combines spatio-temporal covariance descriptors with a weighted Riemannian locality preserving projection. Boosting then exploits this projection to build a multiclass classifier from the most useful spatio-temporal regions, and no additional video processing is needed.
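The contribution rests on describing each spatio-temporal region of a video by the covariance matrix of per-pixel features. The sketch below assumes a grayscale video stored as a (T, H, W) array. The seven features (position, time, intensity and absolute first derivatives) are an illustrative choice and may differ from the paper's feature set. The function names are hypothetical.

```python
import numpy as np

def pixel_features(video):
    """Hypothetical per-pixel features for a grayscale video of shape (T, H, W).

    Illustrative feature set (an assumption, not the paper's):
    x, y, t, intensity, |dI/dx|, |dI/dy|, |dI/dt|.
    Returns an array of shape (T, H, W, 7).
    """
    video = video.astype(np.float64)
    T, H, W = video.shape
    # Coordinate grids; with indexing="ij" each grid has shape (T, H, W).
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
    # np.gradient returns derivatives in axis order: d/dt, d/dy, d/dx.
    gt, gy, gx = np.gradient(video)
    return np.stack([x, y, t, video, np.abs(gx), np.abs(gy), np.abs(gt)], axis=-1)

def region_covariance(features, t0, t1, y0, y1, x0, x1):
    """Covariance descriptor of the half-open box [t0,t1) x [y0,y1) x [x0,x1)."""
    f = features[t0:t1, y0:y1, x0:x1].reshape(-1, features.shape[-1])
    # Rows are pixels, columns are features; unbiased (N - 1) normalization.
    return np.cov(f, rowvar=False)

# Usage: a 7 x 7 descriptor for one region of a random clip.
rng = np.random.default_rng(0)
F = pixel_features(rng.random((6, 10, 12)))
C = region_covariance(F, 0, 3, 2, 8, 1, 9)
```

Every region yields a d × d symmetric positive (semi-)definite matrix regardless of its size, so descriptors from regions of different extents are directly comparable. Such matrices lie on a curved manifold rather than in a vector space.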
Findings
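Outperforms several recent state-of-the-art techniques on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets
Robust without additional video processing such as foreground detection, interest-point detection or tracking
Fast descriptor computation via integral video representations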
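Fast computation through integral video representations can be sketched with the standard construction that extends integral images to three dimensions. First precompute cumulative sums of each feature and of every pairwise feature product. The covariance of any box can then be recovered with eight lookups per table, independent of the box's volume. The sketch below works under that assumption, and the function names are hypothetical.

```python
import numpy as np

def integral_video(features):
    """Zero-padded integral videos of the features and their pairwise products.

    features: array of shape (T, H, W, d).
    Returns P with shape (T+1, H+1, W+1, d) and Q with shape (T+1, H+1, W+1, d, d),
    where P[t, y, x] sums the features over frames < t, rows < y and columns < x.
    """
    T, H, W, d = features.shape
    products = features[..., :, None] * features[..., None, :]  # (T, H, W, d, d)
    P = np.zeros((T + 1, H + 1, W + 1, d))
    Q = np.zeros((T + 1, H + 1, W + 1, d, d))
    P[1:, 1:, 1:] = features.cumsum(0).cumsum(1).cumsum(2)
    Q[1:, 1:, 1:] = products.cumsum(0).cumsum(1).cumsum(2)
    return P, Q

def box_sum(I, t0, t1, y0, y1, x0, x1):
    """Sum over the half-open box [t0,t1) x [y0,y1) x [x0,x1) by 3D inclusion-exclusion."""
    return (I[t1, y1, x1] - I[t0, y1, x1] - I[t1, y0, x1] - I[t1, y1, x0]
            + I[t0, y0, x1] + I[t0, y1, x0] + I[t1, y0, x0] - I[t0, y0, x0])

def fast_region_covariance(P, Q, t0, t1, y0, y1, x0, x1):
    """Unbiased covariance of a box from the integral tables in O(d^2) time."""
    n = (t1 - t0) * (y1 - y0) * (x1 - x0)
    s = box_sum(P, t0, t1, y0, y1, x0, x1)  # sum of feature vectors
    q = box_sum(Q, t0, t1, y0, y1, x0, x1)  # sum of outer products
    # sum (f - mean)(f - mean)^T = q - s s^T / n
    return (q - np.outer(s, s) / n) / (n - 1)

# Usage: compare against a direct computation on random features.
rng = np.random.default_rng(1)
feats = rng.random((5, 6, 7, 3))
P, Q = integral_video(feats)
C_fast = fast_region_covariance(P, Q, 1, 4, 2, 6, 0, 5)
C_ref = np.cov(feats[1:4, 2:6, 0:5].reshape(-1, 3), rowvar=False)
```

The tables are built once per video, after which each region costs O(d²) regardless of its size. This matters when many candidate regions must be evaluated.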
Abstract
We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal regions. We also show how the descriptors can be computed quickly through the use of integral video representations. Experiments on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets indicate superior performance of the proposed method compared to several recent state-of-the-art techniques. The proposed method is robust and does not require additional processing of the videos, such as foreground detection, interest-point detection or tracking.
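The abstract stresses that the projection takes into account the curved space formed by the descriptors. Locality preserving projections start from an affinity graph over the training samples. The sketch below builds such a graph with a Riemannian rather than a Euclidean distance, assuming the affine-invariant metric and a heat kernel. It does not reproduce the paper's exact distance, kernel, sample weighting or projection step, and all names are hypothetical.

```python
import numpy as np

def spd_inv_sqrt(X):
    """X^{-1/2} for a symmetric positive-definite matrix, via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V / np.sqrt(w)) @ V.T  # V diag(w^{-1/2}) V^T

def airm_distance(X, Y):
    """Affine-invariant Riemannian distance ||log(X^{-1/2} Y X^{-1/2})||_F."""
    S = spd_inv_sqrt(X)
    M = S @ Y @ S
    M = (M + M.T) / 2  # remove round-off asymmetry before eigvalsh
    lam = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def heat_kernel_affinity(descriptors, sigma=1.0):
    """Hypothetical affinity A_ij = exp(-d(X_i, X_j)^2 / sigma) over SPD descriptors."""
    n = len(descriptors)
    A = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = airm_distance(descriptors[i], descriptors[j])
            A[i, j] = A[j, i] = np.exp(-d ** 2 / sigma)
    return A

def graph_laplacian(A):
    """L = D - A, the Laplacian used in locality preserving projections."""
    return np.diag(A.sum(axis=1)) - A

# Usage on random SPD matrices.
rng = np.random.default_rng(2)

def rand_spd(d):
    B = rng.standard_normal((d, d))
    return B @ B.T + d * np.eye(d)

X, Y, Z = rand_spd(4), rand_spd(4), rand_spd(4)
A = heat_kernel_affinity([X, Y, Z])
L = graph_laplacian(A)
```

Unlike a Euclidean distance between matrix entries, this metric satisfies d(A X Aᵀ, A Y Aᵀ) = d(X, Y) for any invertible A. It is therefore insensitive to invertible linear rescaling and mixing of the underlying features.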
