Self-Supervised Visual Representation Learning Using Lightweight Architectures
Prathamesh Sonawane, Sparsh Drolia, Saqib Shamsi, Bhargav Jain

TL;DR
This paper evaluates self-supervised learning methods on lightweight architectures, analyzing how model type, size, and pre-training influence feature quality and establishing benchmarks for resource-constrained networks.
Contribution
It provides a comprehensive comparison of self-supervised techniques on lightweight models and sets standards for future research in resource-efficient visual representation learning.
Findings
Performance varies with model size and architecture.
Pre-training duration impacts feature quality.
The experiments establish benchmarks for lightweight models.
Abstract
In self-supervised learning, a model is trained to solve a pretext task on a dataset whose annotations are generated automatically by a machine. The objective is to transfer the trained weights to a downstream task in the target domain. We critically examine the most notable pretext tasks for extracting features from image data and conduct experiments on resource-constrained networks, which allow faster experimentation and deployment. We compare the performance of various self-supervised techniques while keeping all other parameters uniform. We study the patterns that emerge when varying the model type, the model size, and the amount of pre-training done for the backbone, and we establish a standard for future research to compare against. We also conduct comprehensive studies to understand the quality of the representations learned by different architectures.
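To make the idea of machine-created annotations concrete, here is a minimal sketch of one well-known pretext task, rotation prediction. It is an illustrative choice only: the abstract does not say which pretext tasks the paper evaluates, and the helper names `rotate90` and `make_pretext_dataset` are hypothetical. Images are modeled as plain nested lists so the example needs nothing beyond the standard library.

```python
import random


def rotate90(image, k):
    """Rotate a 2-D grid (list of rows) counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        image = [list(row) for row in zip(*image)][::-1]
    return [list(row) for row in image]


def make_pretext_dataset(images, seed=0):
    """Build (rotated_image, label) pairs without any human annotation.

    The label is the number of quarter turns applied (0 to 3), so the
    "annotation" is produced by the program itself. This is an assumed,
    simplified setup, not the paper's pipeline.
    """
    rng = random.Random(seed)
    samples = []
    for image in images:
        k = rng.randrange(4)  # machine-generated label
        samples.append((rotate90(image, k), k))
    return samples
```

A backbone trained to predict the label from the rotated image has to pick up on object orientation and shape, and those learned weights are what would then be transferred to the downstream task.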
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
