SBF: An Effective Representation to Augment Skeleton for Video-based Human Action Recognition
Zhuoxuan Peng, Yiyi Ding, Yang Lin, S.-H. Gary Chan

TL;DR
This paper introduces SBF, a new representation combining scale, body outline, and flow information to improve video-based human action recognition beyond skeleton-only methods.
Contribution
The authors propose SBF, a novel augmented skeleton representation, and SFSNet, a segmentation network, to enhance HAR accuracy without extra annotation overhead.
Findings
SBF significantly improves HAR accuracy across multiple datasets.
SFSNet effectively predicts SBF components using existing annotations.
The pipeline maintains efficiency comparable to skeleton-only approaches.
Abstract
Many modern video-based human action recognition (HAR) approaches use a 2D skeleton as the intermediate representation in their prediction pipelines. Despite encouraging overall results, these approaches still struggle in many common scenes, mainly because the skeleton does not capture critical action-related information pertaining to the depth of the joints, the contour of the human body, and the interaction between the human and objects. To address this, we propose an effective approach to augment the skeleton with a representation capturing action-related information in the HAR pipeline. The representation, termed Scale-Body-Flow (SBF), consists of three distinct components, namely a scale map volume given by the scale (and hence depth information) of each joint, a body map outlining the human subject, and a flow map indicating human-object interaction given by pixel-wise optical flow values.…
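To make the three SBF components concrete, the following is a minimal sketch of how one frame of such a representation could be held in memory. It is not the authors' implementation: the names `SBFFrame`, `make_scale_map`, and `zeros`, the one-map-per-joint layout, the two-channel (dx, dy) flow layout, and the choice to write a joint's scale value at its pixel location are all assumptions made for illustration, since the abstract does not specify the encoding.

```python
from dataclasses import dataclass
from typing import List

Map2D = List[List[float]]  # an H x W grid of floats


def zeros(h: int, w: int) -> Map2D:
    """Return an H x W grid filled with 0.0."""
    return [[0.0] * w for _ in range(h)]


def make_scale_map(h: int, w: int, x: int, y: int, scale: float) -> Map2D:
    """Hypothetical encoding: store a joint's scale at its (x, y) pixel.

    The real SBF scale map may be encoded differently; this only shows
    that each joint contributes one 2D map carrying its scale value.
    """
    grid = zeros(h, w)
    grid[y][x] = scale
    return grid


@dataclass
class SBFFrame:
    """Assumed per-frame container for the three SBF components."""
    scale_maps: List[Map2D]  # "scale map volume": one map per joint (assumed)
    body_map: Map2D          # outline of the human subject
    flow_map: List[Map2D]    # pixel-wise optical flow, assumed (dx, dy) channels

    def channels(self) -> List[Map2D]:
        """Concatenate all components into one list of 2D channels."""
        return [*self.scale_maps, self.body_map, *self.flow_map]
```

Under these assumptions, a frame with J joints yields J + 3 channels (J scale maps, one body map, two flow channels), which could then be stacked alongside skeleton features by a downstream recognizer.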
