Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition

Yufei Peng; Yonggang Zhang; Yiu-ming Cheung

arXiv:2507.12807 · cs.CV · July 18, 2025

TL;DR

This paper introduces Sage, a semantic-guided fine-tuning method for foundation models that improves long-tailed visual recognition by aligning visual and textual modalities and addressing distribution mismatch bias.

Contribution

The paper proposes a novel SG-Adapter and a distribution mismatch-aware compensation factor to enhance semantic alignment and rectify bias in long-tailed visual recognition.

Findings

01 Sage significantly improves performance on long-tailed datasets.

02 Semantic guidance enhances visual-textual alignment.

03 The compensation factor effectively reduces prediction bias.

Abstract

The variance in class-wise sample sizes within long-tailed scenarios often results in degraded performance in less frequent classes. Fortunately, foundation models, pre-trained on vast open-world datasets, demonstrate strong potential for this task due to their generalizable representation, which promotes the development of adaptive strategies on pre-trained models in long-tailed learning. Advanced fine-tuning methods typically adjust visual encoders while neglecting the semantics derived from the frozen text encoder, overlooking the visual and textual alignment. To strengthen this alignment, we propose a novel approach, Semantic-guided fine-tuning of foundation model for long-tailed visual recognition (Sage), which incorporates semantic guidance derived from textual modality into the visual fine-tuning process. Specifically, we introduce an SG-Adapter that integrates class descriptions…
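The setting the abstract describes, classifying images against semantics from a frozen text encoder while correcting for class imbalance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature arrays stand in for encoder outputs, the function names and the `scale` and `tau` parameters are hypothetical, and the prior-based logit adjustment is a generic long-tailed technique used here as a stand-in. It is not the paper's SG-Adapter or its compensation factor, whose details are not given in this excerpt.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def similarity_logits(image_feats, text_feats, scale=100.0):
    """Cosine-similarity logits between image features and per-class text features.

    image_feats: (batch, dim) array, assumed output of a visual encoder.
    text_feats:  (num_classes, dim) array, assumed output of a frozen text encoder.
    scale:       hypothetical temperature-like multiplier.
    """
    return scale * l2_normalize(image_feats) @ l2_normalize(text_feats).T


def prior_adjusted_logits(logits, class_counts, tau=1.0):
    """Generic logit adjustment: subtract tau * log(class prior).

    Stand-in for bias correction under imbalance; NOT the paper's compensation factor.
    """
    prior = class_counts / class_counts.sum()
    return logits - tau * np.log(prior)


# Toy data (assumed): 4 classes, 8-dim features, one image close to class 3.
num_classes, dim = 4, 8
text_feats = np.eye(num_classes, dim)
image_feats = text_feats[[3]] + 0.05

# Long-tailed training counts: class 0 is the head, class 3 is the tail.
class_counts = np.array([1000.0, 100.0, 10.0, 1.0])

logits = similarity_logits(image_feats, text_feats)
adjusted = prior_adjusted_logits(logits, class_counts)
boost = adjusted - logits  # per-class correction, largest for the rarest class
```

In this toy setup the correction grows as a class becomes rarer, which is the general intuition behind rebalancing predictions toward tail classes; how Sage computes its own factor may differ.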

Taxonomy

Topics: Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques