An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos

Rui Hou; Chen Chen; Mubarak Shah

arXiv:1712.01111 · cs.CV · December 5, 2017 · 41 cites

Open Access

TL;DR

This paper introduces an end-to-end 3D CNN architecture for action detection and segmentation in videos, combining top-down tube proposal linking with bottom-up segmentation to improve accuracy and reduce reliance on extensive annotations.

Contribution

The paper presents a unified 3D CNN model that integrates action detection and segmentation, enhancing performance and reducing annotation dependency compared to previous methods.

Findings

01 Outperforms state-of-the-art on multiple video datasets

02 Effectively combines top-down and bottom-up approaches

03 Reduces need for large annotated datasets

Abstract

In this paper, we propose an end-to-end 3D CNN for action detection and segmentation in videos. The proposed architecture is a unified deep network that is able to recognize and localize action based on 3D convolution features. A video is first divided into equal length clips and next for each clip a set of tube proposals are generated based on 3D CNN features. Finally, the tube proposals of different clips are linked together and spatio-temporal action detection is performed using these linked video proposals. This top-down action detection approach explicitly relies on a set of good tube proposals to perform well and training the bounding box regression usually requires a large number of annotated samples. To remedy this, we further extend the 3D CNN to an encoder-decoder structure and formulate the localization problem as action segmentation. The foreground regions (i.e. action…
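The top-down stage described in the abstract (split the video into equal-length clips, generate tube proposals per clip, link proposals across clips) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clip length of 8 frames, the `(score, first_box, last_box)` proposal format, the function names `split_into_clips` and `link_tubes`, and the linking objective (sum of proposal scores plus the IoU between the last box of one tube and the first box of the next) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Proposal = Tuple[float, Box, Box]         # (score, first_box, last_box); hypothetical format


def split_into_clips(num_frames: int, clip_len: int = 8) -> List[range]:
    """Divide frame indices into equal-length clips; an incomplete tail is dropped.

    clip_len=8 is an assumed value, not taken from the abstract.
    """
    return [range(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, clip_len)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def link_tubes(clips: Sequence[Sequence[Proposal]]) -> Tuple[float, List[int]]:
    """Pick one proposal per clip, maximizing scores plus overlap between neighbors.

    Assumed objective: sum of proposal scores + IoU(last box of the previous
    tube, first box of the next tube), solved by dynamic programming.
    """
    best = [p[0] for p in clips[0]]        # best total ending at each proposal
    back: List[List[int]] = []             # back-pointers, one list per transition
    for prev, cur in zip(clips, clips[1:]):
        new_best, ptr = [], []
        for score, first_box, _ in cur:
            cands = [best[i] + iou(prev[i][2], first_box)
                     for i in range(len(prev))]
            j = max(range(len(prev)), key=cands.__getitem__)
            new_best.append(cands[j] + score)
            ptr.append(j)
        best = new_best
        back.append(ptr)
    k = max(range(len(best)), key=best.__getitem__)
    total, path = best[k], [k]
    for ptr in reversed(back):             # walk back-pointers to recover the path
        k = ptr[k]
        path.append(k)
    return total, path[::-1]
```

As a usage example, `split_into_clips(20)` yields two clips covering frames 0 to 15, and `link_tubes` returns the total score together with the index of the chosen proposal in each clip. Dynamic programming is used here so that the selected sequence is optimal under the assumed objective rather than greedy per clip.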

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Anomaly Detection Techniques and Applications

Methods: 3D Convolution · Convolution