SMV-EAR: Bring Spatiotemporal Multi-View Representation Learning into Efficient Event-Based Action Recognition

Rui Fan; Weidong Hao

arXiv:2601.17391 · cs.CV · January 27, 2026

TL;DR

This paper introduces SMV-EAR, a novel spatiotemporal multi-view representation learning framework for event-based action recognition that improves accuracy and efficiency by addressing limitations of previous methods.

Contribution

It proposes a translation-invariant dense event conversion, a dual-branch fusion architecture, and a bio-inspired temporal augmentation, advancing the state-of-the-art in event-based action recognition.
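One way to picture a temporal warping augmentation is as a monotonic remapping of event timestamps, so that the same action plays out at a non-uniform speed. The sketch below is only an illustration under stated assumptions: the power-law warp, the `gamma` parameter, and the function names `warp_timestamps` and `random_warp` are hypothetical and are not taken from the paper, which may define its augmentation differently.

```python
import random
from typing import List, Optional


def warp_timestamps(ts: List[float], gamma: float) -> List[float]:
    """Remap timestamps with u -> u**gamma on normalized time u in [0, 1].

    Assumed warp (not the paper's): gamma > 1 compresses the early part of
    the clip and stretches the later part; gamma < 1 does the opposite.
    Event order and the first/last timestamps are preserved.
    """
    if not ts:
        return []
    t0, t1 = min(ts), max(ts)
    span = t1 - t0
    if span == 0:
        return list(ts)  # all events share one timestamp; nothing to warp
    return [t0 + span * ((t - t0) / span) ** gamma for t in ts]


def random_warp(ts: List[float], low: float = 0.5, high: float = 2.0,
                rng: Optional[random.Random] = None) -> List[float]:
    """Draw a random exponent per sample (hypothetical range) and warp."""
    rng = rng or random.Random()
    return warp_timestamps(ts, rng.uniform(low, high))


warped = warp_timestamps([0.0, 0.5, 1.0], gamma=2.0)
```

Only the timestamps change in this sketch; pixel coordinates and polarities would be left untouched, so the spatial content of the action stays the same while its pacing varies.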

Findings

01

Improved accuracy by roughly 7-10% on three datasets.

02

Reduced model parameters by 30.1% and computations by 35.7%.

03

Established a new effective paradigm for event-based action recognition.

Abstract

Event camera-based action recognition (EAR) offers compelling privacy-protecting and efficiency advantages, and temporal motion dynamics are of great importance to it. Existing spatiotemporal multi-view representation learning (SMVRL) methods for event-based object recognition (EOR) offer promising solutions by projecting H-W-T events along the spatial axes H and W, yet they are limited by their translation-variant spatial binning representation and naive early-concatenation fusion architecture. This paper reexamines the key SMVRL design stages for EAR and proposes: (i) a principled spatiotemporal multi-view representation through translation-invariant dense conversion of sparse events, (ii) a dual-branch, dynamic fusion architecture that models sample-wise complementarity between motion features from different views, and (iii) a bio-inspired temporal warping augmentation that mimics speed variability of…
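The idea of projecting H-W-T events along the spatial axes can be sketched as collapsing one spatial axis at a time, which yields an H-T view and a W-T view. The example below is a minimal illustration of that general projection only, assuming events are `(x, y, t)` tuples and that time is split into uniform bins; the function name `project_views` and the count-based accumulation are assumptions, and this is not the paper's translation-invariant dense conversion, whose details are not given here.

```python
from typing import List, Tuple

Event = Tuple[int, int, float]  # (x, y, t): column, row, timestamp


def project_views(events: List[Event], height: int, width: int, t_bins: int):
    """Accumulate events into two 2D count maps.

    ht[y][k]: events in row y and time bin k (W axis collapsed).
    wt[x][k]: events in column x and time bin k (H axis collapsed).
    """
    ht = [[0] * t_bins for _ in range(height)]
    wt = [[0] * t_bins for _ in range(width)]
    if not events:
        return ht, wt
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero for equal timestamps
    for x, y, t in events:
        # Map the timestamp to a bin index in [0, t_bins - 1].
        k = min(int((t - t0) / span * t_bins), t_bins - 1)
        ht[y][k] += 1
        wt[x][k] += 1
    return ht, wt


events = [(0, 1, 0.0), (2, 1, 0.5), (2, 0, 1.0)]
ht, wt = project_views(events, height=2, width=3, t_bins=2)
# ht == [[0, 1], [1, 1]]
# wt == [[1, 0], [0, 0], [0, 2]]
```

Each view keeps the full time axis, which is why such projections are attractive when motion dynamics matter more than fine spatial appearance.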

Taxonomy

Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Reinforcement Learning in Robotics