Cross-Architectural Positive Pairs improve the effectiveness of Self-Supervised Learning
Pranav Singh, Jacopo Cirrone

TL;DR
This paper introduces CASS, a novel self-supervised learning method that trains a Transformer and a CNN simultaneously. Across four datasets it improves performance, robustness to batch size and pretraining epochs, and training efficiency compared to existing methods.
Contribution
CASS is a new self-supervised learning approach that leverages cross-architectural positive pairs, enhancing robustness and reducing computational costs.
Findings
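CASS-trained CNNs and Transformers gain an average of 3.8% with 1% labeled data, 5.9% with 10% labeled data, and 10.13% with 100% labeled data across four datasets.
CASS takes 69% less time than existing state-of-the-art self-supervised methods.
CASS is more robust than existing state-of-the-art methods to changes in batch size and training epochs.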
Abstract
Existing self-supervised techniques have extreme computational requirements and suffer a substantial drop in performance with a reduction in batch size or pretraining epochs. This paper presents Cross Architectural - Self Supervision (CASS), a novel self-supervised learning approach that leverages Transformer and CNN simultaneously. Compared to the existing state-of-the-art self-supervised learning approaches, we empirically show that CASS-trained CNNs and Transformers across four diverse datasets gained an average of 3.8% with 1% labeled data, 5.9% with 10% labeled data, and 10.13% with 100% labeled data while taking 69% less time. We also show that CASS is much more robust to changes in batch size and training epochs than existing state-of-the-art self-supervised learning approaches. We have open-sourced our code at https://github.com/pranavsinghps1/CASS.
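The abstract does not spell out the training objective, so the following is only a minimal sketch of what a cross-architectural positive pair could look like. It assumes that the same image is embedded by a CNN and by a Transformer, and that the two embeddings are pulled together with a cosine-similarity loss. The loss form, the function names, and the toy embeddings are all illustrative assumptions, not the authors' implementation; see the linked repository for the actual method.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (epsilon guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + 1e-12) for x in v]


def cross_arch_loss(cnn_embeddings, transformer_embeddings):
    """Mean of 2 - 2*cos(z_cnn, z_transformer) over a batch.

    Assumed loss form: it is 0 when both architectures agree on the
    direction of every embedding and 2 when the embeddings are orthogonal.
    """
    assert len(cnn_embeddings) == len(transformer_embeddings)
    total = 0.0
    for z_c, z_t in zip(cnn_embeddings, transformer_embeddings):
        z_c, z_t = l2_normalize(z_c), l2_normalize(z_t)
        cos = sum(a * b for a, b in zip(z_c, z_t))
        total += 2.0 - 2.0 * cos
    return total / len(cnn_embeddings)


# Hypothetical embeddings standing in for real encoder outputs:
# row i of each list is the embedding of the same image i.
cnn_out = [[1.0, 0.0], [0.0, 2.0]]
transformer_out_aligned = [[3.0, 0.0], [0.0, 1.0]]
transformer_out_orthogonal = [[0.0, 1.0], [1.0, 0.0]]

aligned = cross_arch_loss(cnn_out, transformer_out_aligned)
orthogonal = cross_arch_loss(cnn_out, transformer_out_orthogonal)
print(aligned, orthogonal)
```

In this sketch the positive pair comes from two architectures seeing the same input, rather than from one architecture seeing two augmented views, which is the distinction the title refers to.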
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Machine Learning and Data Classification
Methods: Attention Is All You Need · Linear Layer · Softmax · Absolute Position Encodings · Byte Pair Encoding · Adam · Layer Normalization · Label Smoothing · Multi-Head Attention · Dense Connections
