Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition
Naga VS Raviteja Chappa, Evangelos Sariyanidi, Lisa Yankowitz, Gokul Nair, Casey J. Zampella, Robert T. Schultz, Birkan Tunç

TL;DR
Micro-DualNet is a dual-path spatio-temporal network with adaptive routing for recognizing subtle, localized micro-actions in video, some of which are defined mainly by spatial configuration and others by temporal dynamics.
Contribution
The paper proposes a dual-path architecture with entity-level adaptive routing and a MAC loss to better model diverse micro-actions, advancing fine-grained video understanding.
Findings
Achieves competitive results on the MA-52 dataset.
Sets a new state of the art on the iMiGUE dataset.
Demonstrates the importance of architectural adaptation for micro-action recognition.
Abstract
Micro-actions are subtle, localized movements lasting 1-3 seconds, such as scratching one's head or tapping fingers. Such subtle actions are essential for social communication, ubiquitously used in natural interactions, and thus critical for fine-grained video understanding, yet they remain poorly understood by current computer vision systems. We identify a fundamental challenge: micro-actions exhibit diverse spatio-temporal characteristics, where some are defined by spatial configurations while others manifest through temporal dynamics. Existing methods that commit to a single spatio-temporal decomposition cannot accommodate this diversity. We propose a dual-path network that processes anatomically grounded spatial entities through parallel Spatial-Temporal (ST) and Temporal-Spatial (TS) pathways. The ST path captures spatial configurations before modeling temporal dynamics, while the TS path…
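The abstract is cut off before it describes the TS path or the routing mechanism, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes per-frame features for E anatomical entities stored as an array of shape (T, E, D); a hypothetical row-normalized entity-affinity matrix adj for spatial mixing; a simple moving average for temporal mixing; and a hypothetical per-entity sigmoid gate (weights w_gate) standing in for entity-level adaptive routing. The TS ordering (temporal first, then spatial) is inferred from the pathway's name.

```python
import numpy as np

def spatial_mix(x, adj):
    """Mix features across entities within each frame, then apply tanh.
    x: (T, E, D); adj: (E, E) row-normalized affinity (hypothetical)."""
    return np.tanh(np.einsum("ij,tjd->tid", adj, x))

def temporal_mix(x, k=3):
    """Zero-padded moving average over time per entity (same length T), then tanh."""
    T = x.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    out = np.stack([xp[t:t + k].mean(axis=0) for t in range(T)])
    return np.tanh(out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_path(x, adj, w_gate, k=3):
    st = temporal_mix(spatial_mix(x, adj), k)  # ST: spatial first, then temporal
    ts = spatial_mix(temporal_mix(x, k), adj)  # TS: temporal first, then spatial (assumed)
    # Hypothetical entity-level router: one gate per entity from its time-averaged features.
    g = sigmoid(x.mean(axis=0) @ w_gate)       # shape (E,)
    g = g[None, :, None]                        # broadcast over time and feature axes
    return g * st + (1.0 - g) * ts, st, ts, g

# Toy usage with random data (no real micro-action features).
rng = np.random.default_rng(0)
T, E, D = 16, 5, 8
x = rng.standard_normal((T, E, D))
adj = rng.random((E, E))
adj /= adj.sum(axis=1, keepdims=True)
w_gate = rng.standard_normal(D)
out, st, ts, g = dual_path(x, adj, w_gate)
print(out.shape)  # (16, 5, 8)
```

The tanh after each stage is what makes the two orderings differ: both mixing steps are linear maps acting on different axes, so without a nonlinearity between them the ST and TS paths would produce identical outputs.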
