Three Branches: Detecting Actions With Richer Features

Jin Xia; Jiajun Tang; Cewu Lu

arXiv:1908.04519 · cs.CV · August 14, 2019 · 6 cites

TL;DR

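This paper introduces a three-branch model for action recognition and localization that fuses global video clip, short-term human attention, and long-term human activity information. It reports a 21.59% error rate on the Kinetics challenge and 32.49% mAP on the AVA challenge at the CVPR 2019 International Challenge on Activity Recognition.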

Contribution

A three-branch architecture that combines global video clip, short-term human attention, and long-term human activity features in a single unified model to improve action recognition.
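This page does not describe how the three branches are combined. As a minimal sketch, assuming each branch has already produced a fixed-length feature vector and that fusion is plain concatenation followed by a linear classifier and softmax (both assumptions, not the authors' confirmed design), the unified model's scoring step could look like the following. All function names, vector sizes, and weights are hypothetical.

```python
import math
from typing import List, Sequence


def concat_fuse(global_feat: Sequence[float],
                attention_feat: Sequence[float],
                activity_feat: Sequence[float]) -> List[float]:
    """Concatenate the three (hypothetical) branch features into one vector."""
    return list(global_feat) + list(attention_feat) + list(activity_feat)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One logit per class: dot(row, x) + bias; each row must match len(x)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def three_branch_scores(global_feat, attention_feat, activity_feat,
                        weights, bias) -> List[float]:
    """Assumed fusion: concatenate branch features, then a linear classifier."""
    fused = concat_fuse(global_feat, attention_feat, activity_feat)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    return softmax(linear(fused, weights, bias))


# Toy usage: a 2-d global feature, 1-d attention feature, 2-d activity feature.
probs = three_branch_scores([1.0, 0.0], [0.5], [0.0, 2.0],
                            [[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]], [0.0, 0.0])
print(probs)
```

Concatenation is only one common choice; weighted averaging of per-branch class scores (late fusion) would be an equally plausible reading of "unified model".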

Findings

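1. Achieved a 21.59% error rate on the Kinetics challenge (Task A).
2. Obtained 32.49% mAP on the AVA test set (Task B), outperforming all CVPR 2018 AVA challenge submissions by more than 10% mAP.
3. Demonstrated the effectiveness of fusing multi-level features (global clip, short-term human attention, long-term human activity) for action recognition.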
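The two headline numbers in the Findings use different metrics. The page does not define the Kinetics "error rate", so top-k classification error is shown here purely as an assumption. For AVA, the standard benchmark metric is frame-level mean average precision at an IoU threshold of 0.5; the sketch below assumes detections have already been matched to ground-truth boxes (the IoU matching step is omitted) and computes all-point average precision per class, then averages over classes. Function names are hypothetical.

```python
from typing import List, Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


def average_precision(scores: Sequence[float], is_tp: Sequence[bool],
                      num_positives: int) -> float:
    """All-point AP for one class from detections already matched to ground truth."""
    if num_positives <= 0:
        raise ValueError("num_positives must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:  # walk detections from highest to lowest confidence
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_positives)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    # Sum the area under the envelope over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Toy usage: three detections for one class, two ground-truth instances.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], 2))
print(top_k_error([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [2, 0], k=1))
```

The first call returns 5/6: half the recall is reached at precision 1.0, the rest at precision 2/3.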

Abstract

We present our three-branch solution for the International Challenge on Activity Recognition at CVPR 2019. The model seeks to fuse richer information from the global video clip, short-term human attention, and long-term human activity into a unified model. We participated in two tasks: Task A, the Kinetics challenge, and Task B, the spatio-temporal action localization (AVA) challenge. For Kinetics, we achieve a 21.59% error rate. For the AVA challenge, our final model obtains 32.49% mAP on the test set, outperforming all submissions to the AVA challenge at CVPR 2018 by more than 10% mAP. As future work, we will introduce human activity knowledge, a new dataset containing key information about human activity.

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications