Learning Grammar of Complex Activities via Deep Neural Networks
Becky Mashaido

TL;DR
This paper explores how deep neural networks can learn the grammar of complex activities in videos, providing theoretical insights and proposing mechanisms to enhance model performance in continuous video analysis.
Contribution
It offers a theoretical framework for deep neural networks in video learning under label constraints and suggests improvements based on observed model behaviors.
Findings
Insights into model performance on video data
Proposed mechanisms to improve learning of activity sequences
Theoretical understanding of neural networks in video analysis
Abstract
Motivated by the growing amount of publicly available video data on online streaming services and an increased interest in applications that analyze continuous video streams such as autonomous driving, this technical report provides a theoretical insight into deep neural networks for video learning, under label constraints. I build upon previous work in video learning for computer vision, make observations on model performance and propose further mechanisms to help improve our observations.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Multimodal Machine Learning Applications · Machine Learning and Data Classification
