Noise-aware few-shot learning through bi-directional multi-view prompt alignment
Lu Niu, Cheng Xue

TL;DR
This paper introduces NA-MVP, a novel framework for noise-aware few-shot learning that employs bi-directional multi-view prompt alignment and optimal transport to improve robustness against noisy labels in vision-language models.
Contribution
NA-MVP advances few-shot learning by integrating region-aware alignment, bi-directional prompts, and selective refinement to effectively handle noisy supervision.
Findings
NA-MVP outperforms existing methods on synthetic and real-world noisy benchmarks.
The approach effectively distinguishes clean cues from noisy signals.
It improves the robustness of vision-language models in noisy few-shot scenarios.
Abstract
Vision-language models offer strong few-shot capability through prompt tuning but remain vulnerable to noisy labels, which can corrupt prompts and degrade cross-modal alignment. Existing approaches struggle because they often lack the ability to model fine-grained semantic cues and to adaptively separate clean from noisy signals. To address these challenges, we propose NA-MVP, a framework for Noise-Aware few-shot learning through bi-directional Multi-View Prompt alignment. NA-MVP is built upon a key conceptual shift: robust prompt learning requires moving from global matching to region-aware alignment that explicitly distinguishes clean cues from noisy ones. To realize this, NA-MVP employs (1) multi-view prompts combined with unbalanced optimal transport to achieve fine-grained patch-to-prompt correspondence while suppressing unreliable regions; (2) a bi-directional prompt design that…
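The patch-to-prompt correspondence in item (1) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a cosine-distance cost between patch and prompt embeddings, uniform marginals, and the standard entropic unbalanced Sinkhorn scaling with KL-relaxed marginals. The names `unbalanced_sinkhorn` and `patch_prompt_plan` and the default values of `eps` and `tau` are hypothetical choices made for illustration.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.1, tau=1.0, n_iters=200):
    """Entropic unbalanced OT with KL-relaxed marginals (illustrative sketch).

    cost: (n, m) cost matrix; a: (n,) and b: (m,) target marginals.
    eps is the entropic regularisation strength, tau the marginal penalty
    weight. Both defaults are assumptions, not values from the paper.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = tau / (tau + eps)          # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]  # transport plan, shape (n, m)

def patch_prompt_plan(patches, prompts, **kwargs):
    """Transport plan between image patches and multi-view prompts.

    patches: (n, d) patch embeddings; prompts: (m, d) prompt embeddings.
    Uses cosine distance as cost and uniform marginals (both assumptions).
    """
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    t = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    cost = 1.0 - p @ t.T               # cosine distance in [0, 2]
    a = np.full(p.shape[0], 1.0 / p.shape[0])
    b = np.full(t.shape[0], 1.0 / t.shape[0])
    return unbalanced_sinkhorn(cost, a, b, **kwargs)
```

In this sketch a smaller `tau` relaxes the marginal constraints further, so patches that are far from every prompt can receive less transport mass. That is one plausible reading of "suppressing unreliable regions", not a statement of how NA-MVP sets its parameters.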
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
The method is clear, with quality illustrations. The method shows consistent gains against the selected baselines. Leveraging prompt learning to tackle noisy labels is an interesting proposition that has not yet been extensively studied.
Several recent works tackle OT for prompt learning. FedOTP (CVPR24) proposes using unbalanced OT for global and local prompt cooperation, while PatchCT (ICCV23) introduces conditional transport for aligning visual and textual tokens at scale. Other recent work, such as GalLoP (ECCV24) or LoCoOp (NeurIPS23), also proposes robust local/global mechanisms to enhance CLIP's robustness. Due to the proximity of these works to the proposed approach, a better positioning and a broader comparison with the
- The paper presents a well-motivated framework that addresses three challenges of noisy few-shot learning.
- Comprehensive analytical experiments on multiple datasets support the proposed contributions.
- The proposed method does not yield consistent performance improvements over the baseline NLPrompt in Table 1, especially under asymmetric noise.
- In Table 2, as the number of shots increases, the performance gap between the proposed NA-MVP and the baseline NLPrompt becomes smaller. Does this suggest that the advantages of noise-aware alignment diminish when more samples are available? It is unclear whether the method scales as data volume grows.
1. The works listed in the "Related Work" section are quite recent.
2. The compared SOTAs are quite new.
1. The overall logic falls somewhat short. The three sub-issues proposed in this work are only weakly connected to the key problem of noisy labels that needs to be addressed.
2. Moreover, there is no solid evidence demonstrating that the authors' method effectively resolves these three sub-issues.
3. The writing is poor, which makes the paper hard to understand. The implementation details in the method section are not clearly explained. For the specific drawbacks, please refer to the "Questions" section bel
- **Strong empirical performance** The method demonstrates consistent improvements over relevant baselines across multiple noisy few-shot benchmarks, indicating practical effectiveness.
- **Well-motivated focus on noisy few-shot adaptation** Addressing label noise in few-shot prompt tuning is a meaningful and realistic challenge, and the paper proposes a systematic pipeline that combines local alignment signals with selective label refinement.
- **Integration of OT mechanisms into pr
- **Clarity of the writing** The clarity and organization of the manuscript could be improved. Some methodological components are difficult to follow, and several elements would benefit from more explicit definitions and explanations. For example, certain similarity terms are used before being formally introduced, and the Image-Text Bi-Directional Prompt loss is only described conceptually, without a precise mathematical formulation. In addition, the description of some baselines and ablatio
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
