Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Tianjiao Li; Lin Geng Foo; Qiuhong Ke; Hossein Rahmani; Anran Wang; Jinghua Wang; Jun Liu

arXiv:2209.01425·cs.CV·September 7, 2022·1 cites

Open Access

TL;DR

This paper introduces a novel Dynamic Spatio-Temporal Specialization (DSTS) module inspired by the human visual system, which enhances fine-grained action recognition by learning specialized neuron activations for subtle differences in video data.

Contribution

The paper proposes a new DSTS module with a spatio-temporal specialization method and an Upstream-Downstream Learning algorithm to improve fine-grained action recognition performance.

Findings

01 Achieved state-of-the-art results on two fine-grained action datasets.

02 Demonstrated improved discrimination of subtle action differences.

03 Enhanced model adaptability through dynamic specialization mechanisms.

Abstract

The goal of fine-grained action recognition is to discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system, which contains specialized regions in the brain that are dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for a subset of highly similar samples. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences that distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or more temporal fine-grained information, to better tackle the large range of spatio-temporal…
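
The routing idea in the abstract (specialized neurons that fire only for a subset of similar samples) can be illustrated with a toy sketch. This is not the authors' DSTS implementation: the key vectors, the dot-product scoring, the top-k selection, and the names `select_specialized` and `dsts_like_layer` are all assumptions made for illustration, and the spatio-temporal specialization and Upstream-Downstream Learning steps are not modeled.

```python
# Toy illustration only: a hypothetical top-k routing scheme, not the paper's method.

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def select_specialized(feature, keys, k=1):
    """Return the indices of the k neurons whose (assumed) key vectors
    score highest against the input feature."""
    scores = [dot(feature, key) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def dsts_like_layer(feature, keys, weights, k=1):
    """Apply only the selected neurons; here each neuron is a simple
    element-wise scaling of the feature (an assumption for brevity)."""
    active = select_specialized(feature, keys, k)
    out = [0.0] * len(feature)
    for i in active:
        for j, w in enumerate(weights[i]):
            out[j] += w * feature[j]
    return active, out

# Three hypothetical specialized neurons over 2-D features.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = [[2.0, 2.0], [0.5, 0.5], [1.0, 1.0]]

sample_a = [0.9, 0.1]  # similar to sample_b
sample_b = [0.8, 0.3]
sample_c = [0.1, 0.9]  # dissimilar to the other two

active_a, out_a = dsts_like_layer(sample_a, keys, weights)
active_b, out_b = dsts_like_layer(sample_b, keys, weights)
active_c, out_c = dsts_like_layer(sample_c, keys, weights)
```

In this sketch the two similar samples are routed to the same neuron while the dissimilar one goes elsewhere, so any training signal reaching a given neuron would come only from samples that resemble each other, which is the intuition the abstract describes.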

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning