Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
Mengyuan Liu, Hong Liu, Chen Chen

TL;DR
This paper introduces a robust 3D action recognition method that combines local appearance and global distribution cues using a two-layer BoVW model, effectively handling noise and similar actions.
Contribution
A novel two-layer BoVW model that jointly encodes motion and shape cues with new descriptors, improving robustness and accuracy in 3D action recognition.
Findings
Outperforms common STIP detection methods.
Effective in distinguishing similar actions.
Robust to background clutter and noise.
Abstract
3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging since previous literature fails to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and jointly encodes both motion and shape cues. First, background clutter is removed by a background modeling method that is designed for depth data. Then, motion and shape cues are jointly used to generate robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe local appearances of cuboids around motion-based STIPs. In the second layer, a…
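The two-layer BoVW idea in the abstract can be illustrated with a minimal sketch. It assumes descriptors have already been extracted, and it does not reproduce the paper's STIP detection or its M3DLSK descriptor. All function names here (`build_codebook`, `bovw_histogram`, `two_layer_feature`) are hypothetical. The visible text does not say how the two layers are fused, so concatenating the two histograms is an assumption made for illustration only.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Learn k visual words with plain k-means (Lloyd's algorithm)."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers from k distinct descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest word; return an L1-normalised histogram."""
    descriptors = np.asarray(descriptors, dtype=float)
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def two_layer_feature(layer1_desc, layer2_desc, codebook1, codebook2):
    """Concatenate one histogram per layer (the fusion rule is an assumption)."""
    return np.concatenate([
        bovw_histogram(layer1_desc, codebook1),
        bovw_histogram(layer2_desc, codebook2),
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Random stand-ins for per-video descriptors; real ones would come from STIPs.
    layer1 = rng.normal(size=(50, 8))
    layer2 = rng.normal(size=(30, 4))
    feature = two_layer_feature(
        layer1, layer2, build_codebook(layer1, 5), build_codebook(layer2, 3)
    )
    print(feature.shape)
```

In practice the codebooks would be learned once from training videos and then reused to encode each test video. The resulting fixed-length vector can be passed to any standard classifier.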
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis
