Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection
Mohammad Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Javen Shi

TL;DR
This paper presents a multi-stream neural network framework that combines partial human pose and object motion, guided by a spatiotemporal attention mechanism, to improve fine-grained activity detection in retail environments.
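As a rough illustration of what attention-guided fusion of two streams can look like, the sketch below pools per-frame pose and object-motion features using softmax-normalised temporal attention weights. This is a generic, minimal example and not the authors' architecture: the function names `softmax` and `fuse_streams`, the choice of concatenation as the fusion step, and the externally supplied `attn_scores` are all assumptions made here for illustration (in a trained model the scores would be produced by a learned attention module).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(pose_feats, motion_feats, attn_scores):
    """Attention-weighted temporal pooling of two concatenated streams.

    pose_feats, motion_feats: one feature vector (list of floats) per frame.
    attn_scores: one unnormalised score per frame (hypothetical input; a
    real model would learn these rather than receive them).
    Returns the per-frame attention weights and the pooled clip feature.
    """
    if not (len(pose_feats) == len(motion_feats) == len(attn_scores)):
        raise ValueError("all inputs must cover the same number of frames")

    weights = softmax(attn_scores)
    fused_dim = len(pose_feats[0]) + len(motion_feats[0])
    pooled = [0.0] * fused_dim

    for w, pose, motion in zip(weights, pose_feats, motion_feats):
        frame = list(pose) + list(motion)  # per-frame fusion by concatenation
        for i, value in enumerate(frame):
            pooled[i] += w * value  # weight each frame by its attention
    return weights, pooled


# Two frames, 2-D pose features, 1-D object-motion features.
weights, clip_feature = fuse_streams(
    pose_feats=[[1.0, 0.0], [3.0, 2.0]],
    motion_feats=[[0.5], [1.5]],
    attn_scores=[0.0, 0.0],
)
```

With equal scores the weights are uniform, so the pooled feature reduces to the plain temporal average; raising one frame's score shifts the clip feature toward that frame.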
Contribution
It introduces a novel integration of human pose and object motion guided by attention mechanisms, uses GANs to estimate pose without supervised pose annotations, and demonstrates state-of-the-art results on shopping datasets.
Findings
Achieves state-of-the-art results on the MERL Shopping dataset.
Incorporating object motion improves activity recognition accuracy.
GAN-based pose estimation effectively replaces supervised pose annotations.
Abstract
Retailers have long sought ways to understand their customers' behaviour in order to provide a smooth and pleasant shopping experience that attracts more customers every day and, consequently, maximises their revenue. Humans can flawlessly understand others' behaviour by combining different visual cues, from activity to gestures and facial expressions. Enabling computer vision systems to do so, however, is still an open problem, owing both to intrinsic challenges and to extrinsic difficulties such as the lack of publicly available data and unique in-the-wild environment conditions. In this work, we focus on detecting the first and by far the most crucial cue in behaviour analysis: human activity. To do so, we introduce a framework for integrating human pose and object motion to both temporally detect and classify the…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
Methods: Model-Agnostic Meta-Learning · Meta Reward Learning
