Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework

Xiaodong Chen; Xinchen Liu; Wu Liu; Kun Liu; Dong Wu; Yongdong Zhang; Tao Mei

arXiv:2203.04476·cs.CV·September 5, 2022

TL;DR

This paper introduces a pose-guided, coarse-to-fine framework for Part-level Action Parsing that predicts both video-level actions and frame-level body part actions, achieving state-of-the-art results on Kinetics-TPS.

Contribution

The paper proposes a novel pose-guided positional embedding and segment-level feature recognition for accurate, explainable part-level action parsing in videos.

Findings

01. Achieves 31.10% ROC score on Kinetics-TPS

02. Outperforms existing methods significantly

03. Balances accuracy and computation effectively

Abstract

Action recognition from videos, i.e., classifying a video into one of the pre-defined action types, has been a popular topic in the communities of artificial intelligence, multimedia, and signal processing. However, existing methods usually consider an input video as a whole and learn models, e.g., Convolutional Neural Networks (CNNs), with coarse video-level class labels. These methods can only output an action class for the video, but cannot provide fine-grained and explainable cues to answer why the video shows a specific action. Therefore, researchers have started to focus on a new task, Part-level Action Parsing (PAP), which aims not only to predict the video-level action but also to recognize the frame-level fine-grained actions or interactions of body parts for each person in the video. To this end, we propose a coarse-to-fine framework for this challenging task. In particular, our…
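To make the task definition concrete, the following is a minimal sketch of what a PAP output could look like as a data structure: one video-level action plus, for each frame and each person, an action label per body part. The class names, field names, and the example labels (`playing_basketball`, `hold`, `jump`) are illustrative assumptions and are not taken from the paper or from Kinetics-TPS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersonParts:
    """Part-level actions for one person in one frame (hypothetical layout)."""
    person_id: int
    # Maps a body part name to its part-level action label.
    part_actions: Dict[str, str] = field(default_factory=dict)


@dataclass
class FramePrediction:
    """All per-person part predictions for a single frame."""
    frame_index: int
    persons: List[PersonParts] = field(default_factory=list)


@dataclass
class VideoParse:
    """A video-level action together with frame-level part actions."""
    video_action: str
    frames: List[FramePrediction] = field(default_factory=list)

    def part_actions_for(self, frame_index: int, person_id: int) -> Dict[str, str]:
        """Return the part actions of one person in one frame, or {} if absent."""
        for frame in self.frames:
            if frame.frame_index != frame_index:
                continue
            for person in frame.persons:
                if person.person_id == person_id:
                    return person.part_actions
        return {}


# Illustrative labels only; not drawn from any dataset.
example = VideoParse(
    video_action="playing_basketball",
    frames=[
        FramePrediction(
            frame_index=0,
            persons=[PersonParts(person_id=0, part_actions={"hand": "hold", "leg": "jump"})],
        )
    ],
)
```

Under this layout, the part-level labels act as the fine-grained cues that accompany the single video-level class, which is the explainability motivation the abstract describes.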

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems