Supervised Contrastive Frame Aggregation for Video Representation Learning
Shaif Chowdhury, Mushfika Rahman, Greg Hamerly

TL;DR
This paper introduces a supervised contrastive learning framework for video representation that uses a novel frame aggregation strategy to leverage global context, improve accuracy, and reduce computational costs.
Contribution
It presents a new video-to-image aggregation method combined with contrastive learning, enabling effective video representations with pre-trained CNNs and less computational overhead.
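The video-to-image aggregation idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the grid size, the uniform random sampling of sorted frame indices, and the helper names `sample_frame_indices` and `aggregate_frames` are all assumptions made for illustration, since the summary does not specify them.

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples, rng):
    """Draw sorted frame indices (assumed sampling scheme, not from the paper)."""
    # Sample without replacement when the video is long enough.
    replace = num_frames < num_samples
    idx = rng.choice(num_frames, size=num_samples, replace=replace)
    return np.sort(idx)

def aggregate_frames(video, indices, grid_rows, grid_cols):
    """Tile the selected frames into one image of shape (rows*H, cols*W, C).

    `video` is an array of shape (T, H, W, C); frames are placed in
    row-major order, so temporal order reads left to right, top to bottom.
    """
    if len(indices) != grid_rows * grid_cols:
        raise ValueError("number of indices must equal grid_rows * grid_cols")
    frames = video[indices]                      # (N, H, W, C)
    _, h, w, c = frames.shape
    grid = frames.reshape(grid_rows, grid_cols, h, w, c)
    grid = grid.transpose(0, 2, 1, 3, 4)         # (rows, H, cols, W, C)
    return grid.reshape(grid_rows * h, grid_cols * w, c)

# Usage: a toy 30-frame video turned into a single 3x3 grid image.
rng = np.random.default_rng(0)
toy_video = rng.random((30, 8, 8, 3))
indices = sample_frame_indices(len(toy_video), 9, rng)
image = aggregate_frames(toy_video, indices, grid_rows=3, grid_cols=3)
```

Because the result is an ordinary image array, it can be passed to an image backbone such as a pre-trained ResNet50 after the usual resizing and normalization.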
Findings
Outperforms existing methods in classification accuracy
Requires fewer computational resources
Achieves 76% accuracy on Penn Action and 48% on HMDB51
Abstract
We propose a supervised contrastive learning framework for video representation learning that leverages temporally global context. We introduce a video-to-image aggregation strategy that spatially arranges multiple frames from each video into a single input image. This design enables the use of pre-trained convolutional neural network backbones such as ResNet50 and avoids the computational overhead of complex video transformer models. We then design a contrastive learning objective that directly compares pairwise projections generated by the model. Positive pairs are defined as projections from videos sharing the same label, while all other projections are treated as negatives. Multiple natural views of the same video are created using different temporal frame samplings from the same underlying video. Rather than relying on data augmentation, these frame-level variations produce diverse…
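The pairing rule described in the abstract (same-label projections as positives, all other projections as negatives) can be sketched with a standard supervised contrastive loss. This is a generic SupCon-style formulation written in NumPy, not necessarily the exact objective used in the paper. The temperature value, the L2 normalization, and the function name `supervised_contrastive_loss` are assumptions.

```python
import numpy as np

def supervised_contrastive_loss(projections, labels, temperature=0.1):
    """Generic SupCon-style loss (assumed form, not confirmed by the paper).

    projections: (N, D) array of projection vectors, one per view.
    labels:      (N,) array of class labels; equal labels form positive pairs.
    """
    projections = np.asarray(projections, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(labels)

    # Cosine similarity between every pair of projections, scaled by temperature.
    z = projections / np.linalg.norm(projections, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each projection's similarity with itself.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other projections.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same label, different index.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

# Usage: two views per video, labels repeated once per view.
rng = np.random.default_rng(0)
toy_projections = rng.normal(size=(8, 16))
toy_labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = supervised_contrastive_loss(toy_projections, toy_labels)
```

In this sketch, different temporal samplings of the same video would simply appear as extra rows carrying the same label, so they count as positives without any separate augmentation step.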
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
