SITAR: Semi-supervised Image Transformer for Action Recognition
Owais Iqbal, Omprakash Chakraborty, Aftab Hussain, Rameswar Panda, Abir Das

TL;DR
This paper introduces SITAR, a semi-supervised image transformer approach for action recognition that efficiently leverages limited labeled videos and unlabeled data using contrastive learning on super images, achieving high accuracy with reduced computation.
Contribution
SITAR is a novel semi-supervised method that rearranges video frames into super images and employs contrastive learning with a 2D transformer, reducing computational costs while improving performance.
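To make the super-image idea concrete, here is a minimal NumPy sketch of tiling sampled frames into a single 2D image that a standard image transformer could consume. The function name `make_super_image`, the square grid size `ceil(sqrt(T))`, the row-major fill order, and the zero padding of unused cells are all illustrative assumptions, not details confirmed by this summary.

```python
import math
import numpy as np

def make_super_image(frames: np.ndarray) -> np.ndarray:
    """Tile T frames of shape (T, H, W, C) into one (g*H, g*W, C) image.

    Assumptions (hypothetical, for illustration only): a square g x g grid
    with g = ceil(sqrt(T)), filled in row-major order, with any unused
    grid cells left as zeros.
    """
    t, h, w, c = frames.shape
    g = math.ceil(math.sqrt(t))
    # Pad the frame stack with blank frames so it fills the whole grid.
    padded = np.zeros((g * g, h, w, c), dtype=frames.dtype)
    padded[:t] = frames
    # (g, g, H, W, C) -> (g, H, g, W, C) -> (g*H, g*W, C)
    grid = padded.reshape(g, g, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(g * h, g * w, c)

# Toy clip: 8 frames of 4x4 RGB, where frame i is filled with the value i + 1.
clip = np.stack([np.full((4, 4, 3), i + 1, dtype=np.uint8) for i in range(8)])
super_image = make_super_image(clip)
print(super_image.shape)  # (12, 12, 3)
```

With 8 frames the sketch builds a 3 x 3 grid, so the last cell stays blank. Because the result is an ordinary image, temporal order is encoded purely by position within the grid.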
Findings
Outperforms state-of-the-art semi-supervised methods on benchmark datasets.
Reduces computational complexity compared to 3D transformer models.
Effectively leverages unlabeled data for improved action recognition.
Abstract
Recognizing actions from a limited set of labeled videos remains a challenge, as annotating visual data is not only tedious but can also be expensive due to its classified nature. Moreover, handling spatio-temporal data using deep 3D transformers for this can introduce significant computational complexity. In this paper, our objective is to address video action recognition in a semi-supervised setting by leveraging only a handful of labeled videos along with a collection of unlabeled videos in a compute-efficient manner. Specifically, we rearrange multiple frames from the input videos in row-column form to construct super images. Subsequently, we capitalize on the vast pool of unlabeled samples and employ contrastive learning on the encoded super images. Our proposed approach employs two pathways to generate representations for temporally augmented super images originating from the same…
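The abstract is cut off before it specifies the contrastive objective, so the following is only a generic InfoNCE-style sketch of how representations from two pathways could be contrasted. It should not be read as SITAR's actual loss. The function name `info_nce`, the temperature value, and the use of in-batch negatives are assumptions made for illustration.

```python
import numpy as np

def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """Generic InfoNCE loss between two batches of embeddings of shape (N, D).

    Assumption (not from the paper): row i of z_a and row i of z_b come from
    two augmented views of the same video and form the positive pair; every
    other row of z_b acts as an in-batch negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
aligned = info_nce(embeddings, embeddings)                       # matching pairs
mismatched = info_nce(embeddings, np.roll(embeddings, 1, axis=0))  # wrong pairs
print(aligned < mismatched)
```

The loss is low when each embedding is most similar to its own counterpart and high when the pairing is wrong, which is the signal that lets unlabeled videos contribute to training.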
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Contrastive Learning
