OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning
Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez

TL;DR
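This paper introduces OBoW, an online self-supervised learning method that trains a convolutional network to reconstruct a bag-of-visual-words (BoW) representation of an image from a perturbed version of that image. The authors report that it outperforms previous self-supervised methods, including contrastive ones, and in several benchmarks supervised pre-training, on image understanding tasks such as classification and object detection.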
Contribution
It proposes a novel online teacher-student framework for reconstructing BoW representations, enabling effective self-supervised learning of visual features.
Findings
Outperforms previous state-of-the-art unsupervised methods in object detection and classification.
Achieves results better than supervised pre-training in several benchmarks.
Demonstrates the effectiveness of BoW-based self-supervised learning in diverse applications.
Abstract
Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches might be better suited. With this in mind, we propose a teacher-student scheme to learn representations by training a convolutional net to reconstruct a bag-of-visual-words (BoW) representation of an image, given as input a perturbed version of that same image. Our strategy performs an online training of both the teacher network (whose role is to generate…
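The BoW reconstruction objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes hard nearest-word assignment and a fixed vocabulary, neither of which is specified in the text above, and it leaves out the online teacher update entirely. The function names `bow_target` and `bow_reconstruction_loss` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def bow_target(teacher_feats, vocabulary):
    """Build a BoW target from teacher features (illustrative only).

    teacher_feats: (N, D) array of N local feature vectors from the teacher.
    vocabulary:    (K, D) array of K visual words.
    Assumption: hard assignment of each feature to its nearest word.
    """
    # Squared Euclidean distance between every feature and every word: (N, K)
    diffs = teacher_feats[:, None, :] - vocabulary[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    # Index of the nearest visual word for each local feature
    assignments = sq_dists.argmin(axis=1)
    # Word counts, normalized into a distribution over the vocabulary
    counts = np.bincount(assignments, minlength=vocabulary.shape[0])
    return counts.astype(np.float64) / counts.sum()


def bow_reconstruction_loss(student_logits, target):
    """Cross-entropy between the BoW target and the student's prediction.

    student_logits: (K,) unnormalized scores predicted by the student
                    from the perturbed image.
    target:         (K,) BoW distribution produced by bow_target.
    """
    # Numerically stable log-softmax
    shifted = student_logits - student_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


# Toy usage: two visual words, four local features
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
teacher_feats = np.array([[0.1, 0.0], [9.0, 10.0], [10.0, 9.5], [0.0, 0.2]])
target = bow_target(teacher_feats, vocabulary)
loss = bow_reconstruction_loss(np.zeros(2), target)
```

In this toy setup the student sees only the perturbed image and is trained to lower the loss, that is, to predict which visual words the teacher found in the original image.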
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Average Pooling · Global Average Pooling · 1x1 Convolution · Bottleneck Residual Block · Batch Normalization · Residual Block · Residual Connection · Convolution
