Combined Static and Motion Features for Deep-Networks Based Activity Recognition in Videos
Sameera Ramasinghe, Jathushan Rajasegaran, Vinoj Jayasundara, Kanchana Ranasinghe, Ranga Rodrigo, Ajith A. Pasqual

TL;DR
This paper introduces three methods for combining static and motion features in deep learning-based video activity recognition, enabling better control and understanding of feature contributions, and achieving competitive results on popular datasets.
Contribution
It proposes three novel schemas for combining static and motion features, including a Cholesky decomposition approach that allows control over their contributions.
Findings
The Cholesky-based method effectively controls the contribution of each feature type.
The optimal static-to-motion ratio found experimentally aligns with the ratio given by variance analysis.
The system achieves state-of-the-art or comparable performance.
Abstract
Activity recognition in videos, in a deep-learning setting or otherwise, uses both static and pre-computed motion components. How to combine the two components while keeping the burden on the deep network low remains uninvestigated. Moreover, it is not clear how much each component contributes, or how to control that contribution. In this work, we use a combination of CNN-generated static features and motion features in the form of motion tubes. We propose three schemas for combining static and motion components: based on a variance ratio, principal components, and Cholesky decomposition. The Cholesky decomposition based method allows the contributions to be controlled. The ratio given by variance analysis of static and motion features matches well with the experimentally optimal ratio used in the Cholesky decomposition based method. The resulting…
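The abstract does not spell out how the Cholesky decomposition controls the contributions, so the following is only a generic sketch of one standard way a Cholesky factor can set the mix between two vectors, not the authors' implementation. It assumes the static and motion descriptors are 1-D vectors of equal length; the function name `cholesky_mix` and the parameter `rho` are hypothetical. The idea: the Cholesky factor of the 2x2 correlation matrix [[1, rho], [rho, 1]] has second row [rho, sqrt(1 - rho^2)], and using that row as mixing weights on two standardized, decorrelated vectors yields a combined vector whose correlation with the static vector is rho.

```python
import numpy as np


def cholesky_mix(static, motion, rho):
    """Blend two equal-length feature vectors (hypothetical helper).

    The result has correlation `rho` with the standardized static vector.
    `rho` must lie strictly between -1 and 1 so the target correlation
    matrix is positive definite.
    """
    static = np.asarray(static, dtype=float)
    motion = np.asarray(motion, dtype=float)

    # Standardize both vectors to zero mean and unit variance.
    s = (static - static.mean()) / static.std()
    m = (motion - motion.mean()) / motion.std()

    # Remove the part of the motion vector that is correlated with the
    # static vector, then rescale, so the two inputs are uncorrelated.
    m = m - (s @ m) / (s @ s) * s
    m = m / m.std()

    # Lower-triangular Cholesky factor of the target correlation matrix.
    target = np.array([[1.0, rho], [rho, 1.0]])
    L = np.linalg.cholesky(target)

    # Second row of L is [rho, sqrt(1 - rho**2)]: the mixing weights.
    return L[1, 0] * s + L[1, 1] * m


# Illustrative usage on random stand-in data (not real video features).
rng = np.random.default_rng(0)
static_vec = rng.normal(size=512)
motion_vec = rng.normal(size=512)
combined = cholesky_mix(static_vec, motion_vec, rho=0.6)
achieved = np.corrcoef(combined, static_vec)[0, 1]
```

In this sketch, raising `rho` toward 1 makes the combined vector follow the static features more closely, while lowering it toward 0 shifts the weight to the motion features. How the paper maps its variance-analysis ratio onto such a parameter is not stated in the text above.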
