Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

Tianyu Hua; Yonglong Tian; Sucheng Ren; Michalis Raptis; Hang Zhao; Leonid Sigal

arXiv:2203.12054·cs.CV·October 27, 2022·5 cites

TL;DR

This paper introduces RandSAC, a novel self-supervised learning method for visual features that combines parallel and sequential predictions of image segments, inspired by NLP models, improving performance on multiple datasets.

Contribution

The paper proposes RandSAC, a new self-supervised training strategy for vision transformers that uses hierarchical segment grouping and combined autoregressive and parallel prediction mechanisms.

Findings

01. RandSAC improves feature learning performance on CIFAR and ImageNet datasets.

02. Randomized segment serialization enhances training effectiveness.

03. Skip-connections in the decoder further boost accuracy.

Abstract

Inspired by the success of self-supervised autoregressive representation learning in natural language (GPT and its variants), and advances in recent visual architecture design with Vision Transformers (ViTs), in this paper, we explore the effect that various design choices have on the success of applying such training strategies for visual feature learning. Specifically, we introduce a novel strategy that we call Random Segments with Autoregressive Coding (RandSAC). In RandSAC, we group patch representations (image tokens) into hierarchically arranged segments; within each segment, tokens are predicted in parallel, similar to BERT, while across-segment predictions are sequential, similar to GPT. We illustrate that randomized serialization of the segments significantly improves the performance and results in distribution over spatially-long (across-segments) and -short (within-segment)…
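The segment-wise prediction scheme described in the abstract can be illustrated with a small attention-mask sketch. This is only one reading of the abstract, not the authors' implementation: it assumes a flat grid of square segments rather than the hierarchical arrangement the paper mentions, and the function names `random_segment_order` and `segment_causal_mask` are hypothetical.

```python
import numpy as np


def random_segment_order(grid_size, seg_size, rng):
    """Map each patch of a grid_size x grid_size grid to the step at which
    its (square, seg_size x seg_size) segment is predicted.

    Assumption: flat square segments; the paper describes a hierarchy.
    """
    assert grid_size % seg_size == 0
    segs_per_side = grid_size // seg_size
    rows, cols = np.divmod(np.arange(grid_size * grid_size), grid_size)
    seg_id = (rows // seg_size) * segs_per_side + (cols // seg_size)
    # Randomized serialization: draw a random order over the segments.
    perm = rng.permutation(segs_per_side * segs_per_side)
    rank = np.empty_like(perm)
    rank[perm] = np.arange(perm.size)  # rank[s] = prediction step of segment s
    return rank[seg_id]


def segment_causal_mask(step):
    """mask[i, j] is True when the prediction for token i may use token j,
    i.e. token j lies in a strictly earlier segment (GPT-like across segments).
    Tokens sharing a segment never see each other, so they can be predicted
    in parallel (BERT-like within a segment)."""
    return step[None, :] < step[:, None]


rng = np.random.default_rng(0)
step = random_segment_order(grid_size=4, seg_size=2, rng=rng)
mask = segment_causal_mask(step)
print(step.reshape(4, 4))      # prediction step of each patch
print(mask.sum(axis=1))        # number of context tokens visible to each patch
```

Tokens in the first segment of the order have an all-False row, so they have no context in this sketch; how that case is handled is left open here.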

Videos

Self-supervision through Random Segments with Autoregressive Coding (RandSAC) · SlidesLive

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Discriminative Fine-Tuning · Dropout