No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations

Walter Simoncini; Spyros Gidaris; Andrei Bursuc; Yuki M. Asano

arXiv:2407.10964 · cs.CV · November 7, 2024

TL;DR

This paper presents FUNGI, a simple self-supervised gradient method that enhances pretrained transformer features across modalities, improving various downstream tasks without additional training.

Contribution

The paper introduces FUNGI, a novel approach that leverages self-supervised gradients to improve frozen transformer representations across vision, language, and audio tasks.

Findings

01. Consistent performance improvements across multiple datasets and modalities.

02. Enhanced capabilities in classification, clustering, and retrieval tasks.

03. Significant boost in in-context scene understanding, e.g., +17% for semantic segmentation.

Abstract

This paper introduces FUNGI, Features from UNsupervised GradIents, a method to enhance the features of transformer encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also show that using FUNGI features can benefit linear classification, clustering and image retrieval, and that they significantly improve the retrieval-based in-context scene…
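The pipeline the abstract describes (per-input gradient, projection to a lower dimension, concatenation with the embedding, k-nearest neighbor evaluation) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a toy frozen linear encoder `W` in place of a pretrained transformer, a stand-in view-consistency loss in place of the paper's self-supervised objectives, a fixed Gaussian random projection `P`, and L2 normalization of each part before concatenation. All function names and dimensions here are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def l2_normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def consistency_gradient(W, x, x_aug):
    """Flattened gradient of 0.5 * ||W x - W x_aug||^2 with respect to W."""
    diff = [a - b for a, b in zip(x, x_aug)]
    residual = matvec(W, diff)
    # The gradient is the outer product residual * diff^T, flattened row by row.
    return [r * d for r in residual for d in diff]


def gradient_augmented_feature(W, P, x, rng, noise=0.1):
    """Concatenate the normalized embedding with the projected, normalized gradient."""
    embedding = matvec(W, x)
    x_aug = [v + rng.gauss(0.0, noise) for v in x]  # toy "augmented view" (assumption)
    grad = consistency_gradient(W, x, x_aug)
    projected = matvec(P, grad)  # fixed random projection to a lower dimension
    return l2_normalize(embedding) + l2_normalize(projected)


def knn_predict(query, bank, labels, k=3):
    """Majority vote among the k most similar bank features (dot-product similarity)."""
    scored = [(sum(a * b for a, b in zip(query, f)), y) for f, y in zip(bank, labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = {}
    for _, y in scored[:k]:
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)


rng = random.Random(0)
IN_DIM, EMB_DIM, PROJ_DIM = 4, 3, 2  # hypothetical toy sizes
# Frozen encoder weights: never updated anywhere in this sketch.
W = [[rng.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(EMB_DIM)]
P = [[rng.gauss(0, 1) for _ in range(EMB_DIM * IN_DIM)] for _ in range(PROJ_DIM)]

# Two toy classes centred at +1 and -1 in every input dimension.
inputs = [[c + rng.gauss(0, 0.3) for _ in range(IN_DIM)] for c in (1.0, -1.0) for _ in range(5)]
labels = [0] * 5 + [1] * 5
bank = [gradient_augmented_feature(W, P, x, rng) for x in inputs]

query = gradient_augmented_feature(W, P, [0.9, 1.1, 1.0, 0.8], rng)
prediction = knn_predict(query, bank, labels)
print(len(query), prediction)
```

The encoder weights are only read, never updated, which matches the "no training" framing: the gradient serves purely as an extra per-input descriptor. A real setup would presumably obtain the gradients through automatic differentiation on a pretrained backbone rather than the closed-form outer product used here.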

Code & Models

Repositories

waltersimoncini/fungivision (PyTorch, official)

Videos

No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations · SlidesLive

Taxonomy

Topics: Underwater Acoustics Research · Image Enhancement Techniques · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels