Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition
Mariia Drozdova, Vitaliy Kinakh, Yury Belousov, Erica Lastufka, Slava Voloshynovskiy

TL;DR
This paper introduces a semi-supervised fine-tuning method for vision foundation models that uses content-style decomposition to improve performance on tasks with limited labeled data, addressing distribution shift issues.
Contribution
It proposes a novel semi-supervised fine-tuning approach leveraging content-style decomposition within an information-theoretic framework for vision models.
Findings
Improves over a supervised fine-tuning baseline, particularly when few labeled examples are available
Enhances the latent representations of pre-trained models
Gains hold for the majority of tested datasets, with both frozen and trainable backbones
Abstract
In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of pre-trained foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over a supervised fine-tuning baseline of pre-trained models, particularly in low-labeled data regimes, across both frozen and trainable backbones for the majority of the tested datasets.
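The abstract does not spell out the training objective, so the following is only a generic illustration of how a semi-supervised loss over a content-style split might be combined, not the authors' formulation. It assumes (hypothetically) that a content head emits class logits, that a decoder reconstructs the backbone features from content and style, and that unlabeled samples contribute only through the reconstruction term. All names (`semi_supervised_loss`, `recon_weight`, the dictionary keys) are invented for this sketch.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-likelihood of the true class.
    return -math.log(softmax(logits)[label])

def mse(a, b):
    # Mean squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def semi_supervised_loss(batch, recon_weight=1.0):
    """Assumed objective: supervised term on labeled samples only,
    reconstruction term on every sample (labeled or not).

    Each sample is a dict with hypothetical keys:
      'features'        backbone latent vector
      'content_logits'  class logits from the content head
      'reconstruction'  features rebuilt from (content, style)
      'label'           int class index, or None if unlabeled
    """
    sup_terms = [cross_entropy(s["content_logits"], s["label"])
                 for s in batch if s["label"] is not None]
    rec_terms = [mse(s["reconstruction"], s["features"]) for s in batch]
    sup = sum(sup_terms) / len(sup_terms) if sup_terms else 0.0
    rec = sum(rec_terms) / len(rec_terms)
    return sup + recon_weight * rec

# Toy batch: one labeled sample, one unlabeled sample.
batch = [
    {"features": [1.0, 2.0], "content_logits": [0.0, 0.0],
     "reconstruction": [1.0, 2.0], "label": 0},
    {"features": [0.0, 0.0], "content_logits": [0.3, -0.3],
     "reconstruction": [1.0, 1.0], "label": None},
]
loss = semi_supervised_loss(batch)
```

In this toy setup, shrinking the labeled fraction leaves the reconstruction term unchanged, which is one plausible reason such objectives are attractive in low-label regimes; the paper's information-theoretic framework may weight or define the terms differently.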
Taxonomy
Topics: Advanced Computational Techniques and Applications · Advanced Image and Video Retrieval Techniques · Neural Networks and Applications
