No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
Walter Simoncini, Spyros Gidaris, Andrei Bursuc, Yuki M. Asano

TL;DR
This paper presents FUNGI, a simple self-supervised gradient method that enhances pretrained transformer features across modalities, improving various downstream tasks without additional training.
Contribution
The paper introduces FUNGI, a novel approach that leverages self-supervised gradients to improve frozen transformer representations across vision, language, and audio tasks.
Findings
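Consistent k-nearest neighbor classification improvements over the plain embeddings on 11 vision, 5 natural language processing, and 2 audio datasets, across backbones of various sizes and pretraining strategies.
Further benefits for linear classification, clustering, and image retrieval.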
Significant boost in in-context scene understanding, e.g., +17% for semantic segmentation.
Abstract
This paper introduces FUNGI, Features from UNsupervised GradIents, a method to enhance the features of transformer encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also show that using FUNGI features can benefit linear classification, clustering and image retrieval, and that they significantly improve the retrieval-based in-context scene…
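The pipeline described in the abstract (per-input gradient, projection to a lower dimension, concatenation with the embedding, nearest-neighbor lookup) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's code: the "encoder" is a single linear layer `W`, the self-supervised objective is a stand-in (KL divergence between a uniform target and the softmax of the embedding), the projection `P` is a random sign matrix, both parts are L2-normalized before concatenation, and the names `fungi_like_features` and `nearest_neighbor` are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def l2_normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def matvec(M, x):
    return [sum(m * a for m, a in zip(row, x)) for row in M]


def fungi_like_features(W, P, x):
    """Concatenate an embedding with a projected per-input gradient.

    Assumption: toy linear encoder z = W x and a stand-in loss
    KL(uniform || softmax(z)); neither is the paper's actual setup.
    """
    z = matvec(W, x)                      # frozen model's embedding
    p = softmax(z)
    k = len(z)
    dz = [pi - 1.0 / k for pi in p]       # dLoss/dz for the stand-in loss
    # dLoss/dW is the outer product dz x^T, flattened row by row.
    grad = [dzi * xj for dzi in dz for xj in x]
    g = matvec(P, grad)                   # project to a lower dimension
    return l2_normalize(z) + l2_normalize(g)


def nearest_neighbor(bank, query):
    # 1-nearest neighbor by dot product over the concatenated features.
    scores = [sum(a * b for a, b in zip(f, query)) for f in bank]
    return max(range(len(bank)), key=scores.__getitem__)


rng = random.Random(0)
d_in, d_out, d_proj = 6, 4, 3
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Random sign projection from the flattened gradient (d_in * d_out) to d_proj.
P = [[rng.choice((-1.0, 1.0)) for _ in range(d_in * d_out)]
     for _ in range(d_proj)]

inputs = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(5)]
bank = [fungi_like_features(W, P, x) for x in inputs]
match = nearest_neighbor(bank, bank[2])
print(len(bank[0]), match)
```

In this toy setting the weights `W` are never updated: the gradient is only read off as an extra descriptor of each input, which mirrors the "no additional training" framing above. A real backbone would need an autograd framework to obtain per-input gradients, and the choice of layer, objective, and projection would follow the paper rather than this sketch.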
Taxonomy
Topics: Underwater Acoustics Research · Image Enhancement Techniques · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels
