Self-supervised Learning of Contextualized Local Visual Embeddings

Thalles Santos Silva; Helio Pedrini; Adín Ramírez Rivera

arXiv:2310.00527 · cs.CV · October 5, 2023

TL;DR

This paper introduces CLoVE, a self-supervised method that learns contextualized local visual embeddings with a normalized multi-head self-attention layer and achieves state-of-the-art results among CNN-based methods on dense prediction tasks.

Contribution

The paper proposes a self-supervised learning approach that uses a normalized multi-head self-attention layer to learn dense visual representations, outperforming existing CNN-based methods on four dense prediction tasks.

Findings

1. State-of-the-art performance in object detection
2. Superior results in instance segmentation
3. Effective in keypoint detection and dense pose estimation

Abstract

We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised, convolution-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from the output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multi-head self-attention layer that combines local features from different parts of an image based on their similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in four dense prediction downstream tasks: object detection, instance segmentation, keypoint detection, and dense pose estimation.
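The abstract describes a normalized multi-head self-attention layer that mixes local features from a CNN feature map according to their similarity. The sketch below illustrates that idea under stated assumptions; it is not the layer from the official sthalles/clove repository. Assumptions: "normalized" is taken to mean L2-normalizing queries and keys per head, so attention logits are cosine similarities divided by a temperature `tau`; the helper names (`feature_map_to_tokens`, `normalized_mhsa`), the random projection weights, and `tau=0.1` are hypothetical. NumPy is used instead of PyTorch to keep the example self-contained.

```python
import numpy as np


def feature_map_to_tokens(fmap):
    """Flatten a CNN feature map (C, H, W) into H*W local embeddings of size C."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T  # (H*W, C); token i = location (i // W, i % W)


def l2_normalize(x, axis=-1, eps=1e-6):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def normalized_mhsa(tokens, w_q, w_k, w_v, w_o, num_heads, tau=0.1):
    """Hypothetical normalized multi-head self-attention (assumed cosine attention).

    tokens: (N, D) local embeddings; w_q, w_k, w_v, w_o: (D, D) projections.
    """
    n, d = tokens.shape
    assert d % num_heads == 0, "embedding size must split evenly across heads"
    hd = d // num_heads
    # Project, then split into heads: (num_heads, N, hd)
    q = (tokens @ w_q).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (tokens @ w_k).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (tokens @ w_v).reshape(n, num_heads, hd).transpose(1, 0, 2)
    # Assumption: unit-normalize queries and keys so logits are cosine similarities
    q, k = l2_normalize(q), l2_normalize(k)
    attn = softmax(q @ k.transpose(0, 2, 1) / tau, axis=-1)  # (heads, N, N)
    # Each location becomes a similarity-weighted mix of values from all locations
    ctx = attn @ v  # (heads, N, hd)
    ctx = ctx.transpose(1, 0, 2).reshape(n, d)  # concatenate heads back to (N, D)
    return ctx @ w_o, attn


# Usage on a random stand-in for a CNN encoder output (C=64, H=W=7)
rng = np.random.default_rng(0)
fmap = rng.standard_normal((64, 7, 7))
tokens = feature_map_to_tokens(fmap)  # (49, 64)
w_q, w_k, w_v, w_o = (rng.standard_normal((64, 64)) / 8.0 for _ in range(4))
out, attn = normalized_mhsa(tokens, w_q, w_k, w_v, w_o, num_heads=4)
print(out.shape, attn.shape)  # (49, 64) (4, 49, 49)
```

Because queries and keys are unit-normalized in this sketch, every logit lies in [-1/tau, 1/tau], so the temperature rather than the raw feature magnitude controls how sharply each location attends to similar locations.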

Code & Models

Repositories

sthalles/clove
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Digital Imaging for Blood Diseases

Methods: Convolution