Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition
Mahmoud Al-Faris, John P. Chiverton, Yanyan Yang, David L. Ndzi

TL;DR
This paper introduces a deep learning framework for human action recognition that combines multi-view, multi-temporal-resolution depth motion maps with RGB appearance data, targeting robustness to viewpoint changes and small object interactions.
Contribution
It proposes a novel multi-view Depth Motion Map formulation with multiple time resolutions and region adaptive weights, combined with RGB appearance information and multi-stream 3D CNNs for improved action recognition.
Findings
Outperforms state-of-the-art algorithms on public datasets.
Demonstrates robustness to view variations and small object interactions.
Effectively recognizes human actions and human-object interactions.
Abstract
Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system. It uses a novel Multiple View Region Adaptive Multi-resolution in time Depth Motion Map (MV-RAMDMM) formulation combined with appearance information. Multiple stream 3D Convolutional Neural Networks (CNNs) are trained on the different views and time resolutions of the region adaptive Depth Motion Maps. Multiple views are synthesised to enhance the view invariance. The region adaptive weights, based on localised motion, accentuate and differentiate parts of actions possessing faster motion. Dedicated 3D CNN streams for multi-time resolution appearance information (RGB) are also included. These help to identify and differentiate between small object interactions. A pre-trained 3D-CNN is used here with fine-tuning for each stream along with multiple class Support Vector…
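To make the depth motion map idea in the abstract more concrete, the following is a minimal sketch, not the paper's MV-RAMDMM formulation. It assumes a generic depth motion map built by summing absolute differences between depth frames, uses a temporal stride as a stand-in for "multi-resolution in time", and uses a hypothetical weighting (each difference scaled by its motion relative to the peak motion) as a stand-in for the region adaptive weights. The function names `depth_motion_map` and `multi_temporal_dmms`, the stride values, and the weighting rule are all illustrative assumptions; view synthesis and the 3D CNN streams are not covered.

```python
import numpy as np


def depth_motion_map(depth, stride=1, weighted=False):
    """Accumulate absolute differences between depth frames.

    depth:    array of shape (T, H, W) holding a depth video.
    stride:   temporal step; larger values give a coarser time resolution.
    weighted: if True, scale each difference by its motion relative to the
              peak motion (a hypothetical stand-in for region adaptive
              weights, not the paper's definition).
    """
    frames = np.asarray(depth, dtype=np.float64)[::stride]
    diffs = np.abs(np.diff(frames, axis=0))  # shape (T' - 1, H, W)
    if diffs.size == 0:
        # Fewer than two frames after striding: no motion to accumulate.
        return np.zeros(frames.shape[1:], dtype=np.float64)
    if weighted:
        peak = diffs.max()
        if peak > 0:
            # Faster motion gets a weight closer to 1, slower motion closer to 0.
            diffs = diffs * (diffs / peak)
    return diffs.sum(axis=0)


def multi_temporal_dmms(depth, strides=(1, 2, 4)):
    """Return one weighted motion map per temporal stride (assumed values)."""
    return {s: depth_motion_map(depth, stride=s, weighted=True) for s in strides}


# Toy depth video: a single pixel moves slowly, then quickly, then stops.
video = np.zeros((5, 4, 4))
video[:, 1, 1] = [0, 1, 3, 3, 3]

plain = depth_motion_map(video)
adaptive = depth_motion_map(video, weighted=True)
maps = multi_temporal_dmms(video)
```

In this toy input the weighted map gives the faster step (1 to 3) more influence than the slower step (0 to 1), which is the qualitative behaviour the abstract attributes to region adaptive weights. In a pipeline like the one described, each map in `maps` would feed its own network stream.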
