Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection

Mohammad Mahdi Kazemi Moghaddam; Ehsan Abbasnejad; Javen Shi

arXiv:1905.04430 · cs.CV · June 27, 2019 · 1 citation

TL;DR

This paper presents a multi-stream neural network framework that combines partial human pose and object motion, guided by a spatiotemporal attention mechanism, to improve fine-grained activity detection in retail environments.

Contribution

The paper integrates partial human pose and object motion through an attention mechanism, uses GANs to estimate pose without pose supervision, and reports state-of-the-art results on shopping datasets.

Findings

01 Achieved state-of-the-art results on the MERL shopping dataset.

02 Incorporating object motion improves activity recognition accuracy.

03 GAN-based pose estimation effectively replaces supervised pose annotations.

Abstract

Retailers have long sought ways to understand their customers' behaviour in order to provide a smooth and pleasant shopping experience that attracts more customers every day and, consequently, maximises their revenue. Humans can flawlessly understand others' behaviour by combining different visual cues, from activity to gestures and facial expressions. Enabling computer vision systems to do the same, however, is still an open problem, owing both to intrinsic challenges and to extrinsic difficulties such as the lack of publicly available data and unique (in-the-wild) environment conditions. In this work, we focus on detecting the first and by far the most crucial cue in behaviour analysis: human activity. To do so, we introduce a framework for integrating human pose and object motion to both temporally detect and classify the…

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods

Methods: Model-Agnostic Meta-Learning · Meta Reward Learning