Learning Grammar of Complex Activities via Deep Neural Networks

Becky Mashaido

arXiv:2101.02774·cs.CV·January 11, 2021

Open Access

TL;DR

This paper explores how deep neural networks can learn the grammar of complex activities in videos, providing theoretical insights and proposing mechanisms to enhance model performance in continuous video analysis.

Contribution

It offers a theoretical framework for deep neural networks in video learning under label constraints and suggests improvements based on observed model behaviors.

Findings

01 Insights into model performance on video data

02 Proposed mechanisms to improve learning of activity sequences

03 Theoretical understanding of neural networks in video analysis

Abstract

Motivated by the growing amount of publicly available video data on online streaming services and by increased interest in applications that analyze continuous video streams, such as autonomous driving, this technical report provides theoretical insight into deep neural networks for video learning under label constraints. I build upon previous work in video learning for computer vision, make observations on model performance, and propose further mechanisms to help improve the observed performance.

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Machine Learning and Data Classification