Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling

Khoi-Nguyen C. Mac; Minh N. Do; Minh P. Vo

arXiv:2207.05249 · cs.CV · July 18, 2022


TL;DR

This paper introduces a human vision-inspired adaptive spatiotemporal sampling method for efficient action recognition on wearable devices, significantly improving speed with minimal accuracy loss.

Contribution

The paper proposes a novel context-aware sampling scheme inspired by human visual perception, improving efficiency over fixed sampling strategies.

Findings

1. Speeds up inference significantly
2. Maintains accuracy with minimal loss
3. Validated on the EPIC-KITCHENS and UCF-101 datasets

Abstract

Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, and thus adversely impacts both computation efficiency and accuracy. Inspired by the concepts of foveal vision and pre-attentive processing from the human visual perception mechanism, we introduce a novel adaptive spatiotemporal sampling scheme for efficient action recognition. Our system pre-scans the global scene context at low-resolution and decides to skip or request high-resolution features at salient regions for further processing. We validate the system on EPIC-KITCHENS and UCF-101 datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable…
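To make the "pre-scan at low resolution, then skip or request high resolution" idea concrete, here is a minimal pure-Python sketch. It is not the authors' method: the saliency proxy (absolute frame differencing on the low-resolution grid), the block size `factor`, the `skip_threshold`, the crop `radius`, and the helper names `downsample` and `plan_frame` are all hypothetical choices made for illustration. The official implementation lives in the knmac/adaptive_spatiotemporal repository.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]           # grayscale frame, row-major
Crop = Tuple[int, int, int, int]    # (y0, x0, y1, x1) in full-resolution pixels


def downsample(frame: Frame, factor: int) -> Frame:
    """Cheap low-resolution pre-scan: average non-overlapping factor x factor blocks."""
    h, w = len(frame), len(frame[0])
    out: Frame = []
    for by in range(0, h - h % factor, factor):
        row = []
        for bx in range(0, w - w % factor, factor):
            block = [frame[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def plan_frame(prev_low: Optional[Frame], frame: Frame, factor: int = 4,
               skip_threshold: float = 0.1,
               radius: int = 0) -> Tuple[Frame, Optional[Crop]]:
    """Hypothetical skip-or-crop policy.

    Returns the low-res frame (to reuse as context for the next call) and
    either a high-res crop to process or None, meaning "skip this frame".
    """
    low = downsample(frame, factor)
    h, w = len(frame), len(frame[0])
    if prev_low is None:
        # No temporal context yet: request the whole frame at high resolution.
        return low, (0, 0, h, w)
    # Saliency proxy (an assumption): temporal change of each low-res cell.
    best, best_cell = 0.0, (0, 0)
    for i, (r_now, r_prev) in enumerate(zip(low, prev_low)):
        for j, (a, b) in enumerate(zip(r_now, r_prev)):
            s = abs(a - b)
            if s > best:
                best, best_cell = s, (i, j)
    if best < skip_threshold:
        return low, None  # nothing salient changed: skip high-res processing
    i, j = best_cell
    # Map the salient cell (plus `radius` neighbouring cells) back to pixels.
    y0 = max(0, (i - radius) * factor)
    x0 = max(0, (j - radius) * factor)
    y1 = min(h, (i + radius + 1) * factor)
    x1 = min(w, (j + radius + 1) * factor)
    return low, (y0, x0, y1, x1)


if __name__ == "__main__":
    f0 = [[0.0] * 8 for _ in range(8)]
    f1 = [row[:] for row in f0]
    for y in range(4, 8):
        for x in range(4, 8):
            f1[y][x] = 1.0  # an "object" appears in the bottom-right quadrant
    low0, crop0 = plan_frame(None, f0)
    low1, crop1 = plan_frame(low0, f1)
    _, crop2 = plan_frame(low1, f1)
    print(crop0, crop1, crop2)
```

In this toy run, the first frame is processed in full because there is no context yet. The second frame requests only the bottom-right 4×4 region where the content changed, and the third, unchanged frame is skipped. The point of the design is that the decision costs only a pass over the low-resolution grid, so high-resolution work is spent only where and when the pre-scan flags something.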


Code & Models

Repositories

knmac/adaptive_spatiotemporal
PyTorch · Official


Taxonomy

Topics: Human Pose and Action Recognition · Stroke Rehabilitation and Recovery · Advanced Technologies in Various Fields

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings