SITAR: Semi-supervised Image Transformer for Action Recognition

Owais Iqbal; Omprakash Chakraborty; Aftab Hussain; Rameswar Panda; Abir Das

arXiv:2409.02910 · cs.CV · September 5, 2024

TL;DR

This paper introduces SITAR, a semi-supervised image transformer approach for action recognition that efficiently leverages limited labeled videos and unlabeled data using contrastive learning on super images, achieving high accuracy with reduced computation.

Contribution

SITAR is a novel semi-supervised method that rearranges video frames into super images and employs contrastive learning with a 2D transformer, reducing computational costs while improving performance.
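To make the "super image" idea concrete, here is a minimal NumPy sketch that tiles T = rows × cols frames, in row-major order, into a single image that an ordinary 2D transformer could consume. The function name `make_super_image`, the (T, H, W, C) frame layout, and the row-major placement are assumptions for illustration, not the authors' code.

```python
import numpy as np


def make_super_image(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile T = rows * cols frames of shape (H, W, C) into one (rows*H, cols*W, C) image.

    Hypothetical helper, not the SITAR implementation. Frames are placed in
    row-major order: frame 0 at the top-left, frame cols-1 at the top-right,
    frame cols at the start of the second row, and so on.
    """
    t, h, w, c = frames.shape
    if t != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {t}")
    grid = frames.reshape(rows, cols, h, w, c)  # (rows, cols, H, W, C)
    grid = grid.transpose(0, 2, 1, 3, 4)        # (rows, H, cols, W, C)
    return grid.reshape(rows * h, cols * w, c)  # stitch rows and columns together


# Example: nine 224 x 224 RGB frames in a 3 x 3 grid.
clip = np.zeros((9, 224, 224, 3), dtype=np.uint8)
print(make_super_image(clip, rows=3, cols=3).shape)
```

For example, nine 224 × 224 RGB frames arranged in a 3 × 3 grid yield a 672 × 672 × 3 super image, so temporal order is encoded purely as spatial position within the grid.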

Findings

1. Outperforms state-of-the-art semi-supervised methods on benchmark datasets.
2. Reduces computational complexity compared to 3D transformer models.
3. Effectively leverages unlabeled data for improved action recognition.

Abstract

Recognizing actions from a limited set of labeled videos remains a challenge, as annotating visual data is not only tedious but can also be expensive due to its classified nature. Moreover, handling spatio-temporal data with deep 3D transformers for this task can introduce significant computational complexity. In this paper, our objective is to address video action recognition in a semi-supervised setting by leveraging only a handful of labeled videos along with a collection of unlabeled videos in a compute-efficient manner. Specifically, we rearrange multiple frames from the input videos in row-column form to construct super images. Subsequently, we capitalize on the vast pool of unlabeled samples and employ contrastive learning on the encoded super images. Our proposed approach employs two pathways to generate representations for temporally augmented super images originating from the same…
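The abstract above is truncated and does not spell out SITAR's loss, so the sketch below assumes a standard InfoNCE objective: the embeddings of two temporally augmented super images of the same video form a positive pair, and the other videos in the batch act as negatives. The name `info_nce_loss`, the temperature value, and the use of in-batch negatives are assumptions, not details taken from the paper.

```python
import numpy as np


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE loss for two batches of embeddings of shape (N, D).

    Hypothetical helper, not the SITAR loss. Row i of z1 and row i of z2 are
    assumed to encode two augmented super images of the same video (positive
    pair); every other row of z2 is treated as a negative for row i of z1.
    """
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) scaled similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; minimise their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(info_nce_loss(z, z), info_nce_loss(z, z[::-1]))
```

Matched pairs produce a much lower loss than mismatched ones. This is the one-directional form; many implementations average it with the loss computed with `z1` and `z2` swapped.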

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training · Contrastive Learning