Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes

Li Yuan; Yichen Zhou; Shuning Chang; Ziyuan Huang; Yunpeng Chen; Xuecheng Nie; Tao Wang; Jiashi Feng; Shuicheng Yan

arXiv:2010.08365·cs.CV·October 19, 2020

TL;DR

This paper advances person-level action recognition in crowded videos by integrating scene information and new diverse data, significantly improving accuracy and generalization in complex environments.

Contribution

It introduces a top-down approach combining strong human detection, semantic scene segmentation, and new data collection to enhance recognition in crowded scenes.

Findings

01

Achieved 26.05 wf_mAP on the HIE dataset.

02

Ranked 1st in ACM MM 2020 Human in Events challenge.

03

Enhanced model generalization with diverse internet data.

Abstract

Detecting and recognizing human actions in videos of crowded scenes is challenging because of the complex environments and the diversity of events. Prior work falls short in two respects: (1) it makes little use of scene information; (2) it lacks training data for crowded and complex scenes. In this paper, we focus on improving spatio-temporal action recognition by fully utilizing scene information and collecting new data. A top-down strategy is used to overcome these limitations. Specifically, we adopt a strong human detector to localize people in each frame. We then apply action recognition models to learn spatio-temporal information from video frames on both the HIE dataset and new data with diverse scenes collected from the internet, which improves the generalization ability of our model. In addition, scene information is extracted by the…
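
The top-down strategy described in the abstract (detect people first, then classify each person's action from a short clip) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: `detect_people`, `classify_action`, `clip_radius`, and the `PersonAction` record are hypothetical names introduced here, and the scene-information step is left out because the abstract is truncated before it is described.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Bounding box as (x1, y1, x2, y2) in pixels; an assumed convention.
Box = Tuple[float, float, float, float]


@dataclass
class PersonAction:
    frame_index: int
    box: Box
    action: str


def recognize_actions(
    frames: Sequence[Any],
    detect_people: Callable[[Any], List[Box]],           # hypothetical detector
    classify_action: Callable[[Sequence[Any], Box], str],  # hypothetical recognizer
    clip_radius: int = 2,                                 # assumed temporal window
) -> List[PersonAction]:
    """Top-down sketch: detect people per frame, then label each detection."""
    results: List[PersonAction] = []
    for t in range(len(frames)):
        # Temporal context around the key frame, clipped at the video bounds.
        lo = max(0, t - clip_radius)
        hi = min(len(frames), t + clip_radius + 1)
        clip = frames[lo:hi]
        # Stage 1: spatial localization of each person in the key frame.
        for box in detect_people(frames[t]):
            # Stage 2: action label for this person from the surrounding clip.
            results.append(PersonAction(t, box, classify_action(clip, box)))
    return results
```

Any detector and clip-level classifier with these call shapes could be plugged in; the two-stage split is what makes the approach top-down.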
