View-invariant Deep Architecture for Human Action Recognition using late fusion
Chhavi Dhiman, Dinesh Kumar Vishwakarma

TL;DR
This paper introduces a view-invariant deep learning framework for human action recognition that combines a motion stream and a shape temporal dynamics stream through late fusion of their scores, and reports superior performance on multiple benchmarks.
Contribution
A novel view-invariant deep architecture integrating motion and shape dynamics with late fusion for improved action recognition accuracy.
Findings
Outperforms existing state-of-the-art methods significantly.
Effective in cross-view and cross-subject validation schemes.
Achieves higher accuracy, along with better ROC curves and AUC values.
Abstract
Human action recognition for unknown views is a challenging task. We propose a view-invariant deep human action recognition framework, which is a novel integration of two important action cues: motion and shape temporal dynamics (STD). The motion stream encapsulates the motion content of an action as RGB Dynamic Images (RGB-DIs), which are processed by a fine-tuned InceptionV3 model. The STD stream learns long-term view-invariant shape dynamics of an action using human pose model (HPM) based view-invariant features mined from structural similarity index matrix (SSIM) based key depth human pose frames. To predict the score of a test sample, three late fusion techniques (maximum, average, and product) are applied to the individual stream scores. To validate the performance of the proposed framework, experiments are performed using both cross-subject and cross-view validation…
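The three late fusion rules named in the abstract (maximum, average, and product) can be illustrated with a minimal sketch. It assumes each stream outputs one score per action class for a test sample; the function name `late_fusion`, the score values, and the three-class setup are hypothetical and are not taken from the paper.

```python
from math import prod


def late_fusion(stream_scores, rule="average"):
    """Fuse per-class scores from several streams.

    stream_scores: list of score lists, one list per stream
                   (for example, motion stream and STD stream).
    rule: "maximum", "average", or "product".
    Returns (fused_scores, predicted_class_index).
    """
    # Group the scores class by class: one tuple per class,
    # holding that class's score from every stream.
    per_class = list(zip(*stream_scores))

    if rule == "maximum":
        fused = [max(c) for c in per_class]
    elif rule == "average":
        fused = [sum(c) / len(c) for c in per_class]
    elif rule == "product":
        fused = [prod(c) for c in per_class]
    else:
        raise ValueError(f"unknown fusion rule: {rule}")

    # The predicted class is the one with the highest fused score.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical softmax outputs for one test sample over three classes.
motion_scores = [0.5, 0.4, 0.1]  # assumed motion-stream output
std_scores = [0.2, 0.7, 0.1]     # assumed STD-stream output

for rule in ("maximum", "average", "product"):
    fused, label = late_fusion([motion_scores, std_scores], rule)
    print(rule, fused, label)
```

Because fusion operates only on the final scores, each stream can be trained and evaluated separately, and the fusion rule can be changed without retraining either stream.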
