SpecPL: Disentangling Spectral Granularity for Prompt Learning

Jingtao Zhou; Xirui Kang; Feiyang Huang; Lai-Man Po

arXiv:2605.04504·cs.CV·May 7, 2026



TL;DR

SpecPL introduces a spectral perspective to prompt learning for vision-language models, disentangling visual signals into semantic and granular components with counterfactual supervision to improve fine-grained discrimination.

Contribution

It proposes a novel spectral approach using a frozen VAE and counterfactual granule training to enhance prompt learning in vision-language models.

Findings

01. Achieves state-of-the-art performance on 11 benchmarks.

02. Reaches a new harmonic-mean accuracy of 81.51%.

03. Effectively bridges the stability-generalization gap in prompt learning.
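
For readers unfamiliar with the metric in the second finding, here is a minimal sketch of a harmonic-mean accuracy. The page does not say which two accuracies are combined; prompt-learning evaluations commonly combine base-class and novel-class accuracy, and the sketch assumes that. The function name and the input numbers are hypothetical and are not taken from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of two accuracies given in percent.

    Assumption: the two inputs are base-class and novel-class accuracy.
    """
    if base_acc <= 0 or novel_acc <= 0:
        # The harmonic mean is zero as soon as one term is zero.
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical inputs for illustration only, not results from the paper.
h = harmonic_mean(80.0, 70.0)
print(f"{h:.2f}")
```

The harmonic mean is pulled toward the lower of the two values, so a method cannot score well by doing well on only one of the two accuracies.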

Abstract

Existing prompt learning for VLMs exhibits a modality asymmetry, predominantly optimizing text tokens while still relying on a frozen visual encoder as a holistic extractor and neglecting the spectral granularity essential for fine-grained discrimination. To bridge this gap, we introduce Disentangling Spectral Granularity for Prompt Learning (SpecPL), which approaches prompt learning from a novel spectral perspective via Counterfactual Granule Supervision. Specifically, we leverage a frozen VAE to decompose visual signals into semantic low-frequency bands and granular high-frequency details. A frozen Visual Semantic Bank anchors text representations to universal low-frequency invariants, mitigating overfitting. Crucially, fine-grained discrimination is driven by counterfactual granule training: by permuting high-frequency signals, we compel the model to explicitly distinguish visual granularity…
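
The idea of permuting high-frequency signals can be illustrated with a small sketch. This is not the authors' method: the abstract uses a frozen VAE for the decomposition, whereas the sketch below swaps in a plain FFT low-pass filter as a stand-in. The function names, the `cutoff` parameter, and the random input are all hypothetical, and how SpecPL turns such samples into a training signal is not described in the truncated abstract.

```python
import numpy as np


def split_frequencies(images: np.ndarray, cutoff: float = 0.1):
    """Split a batch of images (B, H, W) into low and high frequency parts.

    Stand-in for the paper's frozen-VAE decomposition: an ideal radial
    low-pass filter in the Fourier domain. low + high reproduces the input.
    """
    _, h, w = images.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, cycles/pixel
    mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff
    spectrum = np.fft.fft2(images, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real
    high = images - low
    return low, high


def permute_high_frequency(images: np.ndarray, rng, cutoff: float = 0.1):
    """Keep each image's low-frequency part and attach the high-frequency
    part of another image in the batch (a counterfactual sample)."""
    low, high = split_frequencies(images, cutoff)
    perm = rng.permutation(images.shape[0])
    return low + high[perm], perm


rng = np.random.default_rng(0)
images = rng.random((4, 32, 32))          # hypothetical grayscale batch
low, high = split_frequencies(images)
counterfactual, perm = permute_high_frequency(images, rng)
```

Each counterfactual sample keeps the coarse content of one image and carries the fine detail of another, which is the kind of input a model would have to tell apart from the original if it attends to visual granularity.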


Code & Models

Repositories

Mlrac1e/SpecPL-Prompt-Learning
github
