Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Tianjiao Li, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Anran Wang, Jinghua Wang, Jun Liu

TL;DR
This paper introduces a novel Dynamic Spatio-Temporal Specialization (DSTS) module inspired by the human visual system, which enhances fine-grained action recognition by learning specialized neuron activations for subtle differences in video data.
Contribution
The paper proposes a new DSTS module with a spatio-temporal specialization method and an Upstream-Downstream Learning algorithm to improve fine-grained action recognition performance.
Findings
Achieved state-of-the-art results on two fine-grained action datasets.
Demonstrated improved discrimination of subtle action differences.
Enhanced model adaptability through dynamic specialization mechanisms.
Abstract
The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal…
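The routing idea in the abstract, where specialized neurons are activated only for a subset of highly similar samples, can be illustrated with a small sketch. This is not the authors' implementation. The class name `SpecializedLayer`, the per-neuron key vectors, the cosine-similarity scoring, and the `top_k` parameter are all assumptions introduced here for illustration, and the weights are random rather than learned.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-12)


class SpecializedLayer:
    """Hypothetical layer: each neuron has a key used only for routing
    and a weight vector used only for computing its output."""

    def __init__(self, num_neurons, dim, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Assumption: random values stand in for learned parameters.
        self.keys = [[rng.gauss(0, 1) for _ in range(dim)]
                     for _ in range(num_neurons)]
        self.weights = [[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(num_neurons)]

    def route(self, feature):
        """Return the indices of the top_k neurons whose keys best match."""
        scores = [cosine(feature, key) for key in self.keys]
        ranked = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)
        return sorted(ranked[:self.top_k])

    def forward(self, feature):
        """Compute ReLU outputs for active neurons; inactive ones stay 0."""
        active = self.route(feature)
        outputs = [0.0] * len(self.keys)
        for i in active:
            pre = sum(w * x for w, x in zip(self.weights[i], feature))
            outputs[i] = max(0.0, pre)
        return outputs, active


layer = SpecializedLayer(num_neurons=8, dim=4, top_k=2)
sample_a = [1.0, 0.5, -0.2, 0.3]
sample_b = [2.0 * x for x in sample_a]  # same direction as sample_a
outputs_a, active_a = layer.forward(sample_a)
outputs_b, active_b = layer.forward(sample_b)
```

Because `sample_a` and `sample_b` point in the same direction, they are routed to the same neurons. In a trained model, only those neurons would receive gradient from such look-alike samples, which is the pressure toward learning fine-grained differences that the abstract describes.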
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
