Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning

Yating Wang; Yaqi Zhao; Yongshun Gong; Yilong Yin; Haoliang Sun

arXiv:2605.04425 · cs.CV · May 7, 2026

TL;DR

This paper introduces IPL, a hybrid framework for prompt learning that combines discrete semantic token selection with continuous prompt optimization, enhancing interpretability and accuracy in vision-language models.

Contribution

It proposes a novel alternating optimization approach for semantic token selection and prompt tuning, improving interpretability without sacrificing adaptability.

Findings

01. IPL improves interpretability and accuracy across multiple benchmarks.

02. The framework is plug-and-play and compatible with existing prompt learning methods.

03. Extensive experiments validate the effectiveness and scalability of IPL.

Abstract

Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and continuous prompt optimization. Specifically, IPL formulates semantic token selection as an approximate submodular optimization problem, encouraging tokens that are both human-understandable and semantically diverse. It further adopts an alternating optimization strategy to integrate discrete token selection with continuous prompt tuning, improving interpretability while preserving adaptability…
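
The selection step described in the abstract can be illustrated with a greedy routine, the standard heuristic for maximizing a monotone submodular function under a cardinality constraint. The sketch below is not the paper's objective, which the abstract does not spell out. It assumes a hypothetical score that adds each token's relevance to a target embedding to a facility-location coverage term over the candidate token embeddings, so that near-duplicate tokens earn little extra gain. The names greedy_token_selection, token_emb, target_emb, and lam are illustrative assumptions, and plain NumPy arrays stand in for CLIP text embeddings.

```python
import numpy as np


def normalize(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def greedy_token_selection(token_emb, target_emb, k, lam=1.0):
    """Greedily pick up to k candidate tokens.

    Assumed (hypothetical) objective, not taken from the paper:
        F(S) = sum_{i in S} relevance_i + lam * sum_j max_{i in S} sim(i, j)
    Both terms are monotone and submodular when similarities are non-negative,
    so cosine similarities are clipped at zero.
    """
    tokens = normalize(np.asarray(token_emb, dtype=float))
    target = normalize(np.asarray(target_emb, dtype=float))

    relevance = np.clip(tokens @ target, 0.0, None)   # shape (n,)
    sim = np.clip(tokens @ tokens.T, 0.0, None)       # shape (n, n)

    n = tokens.shape[0]
    available = np.ones(n, dtype=bool)
    coverage = np.zeros(n)   # best similarity of each candidate to the chosen set
    selected = []

    for _ in range(min(k, n)):
        # Row i holds the coverage vector that would result from adding token i.
        new_coverage = np.maximum(sim, coverage[None, :])
        coverage_gain = new_coverage.sum(axis=1) - coverage.sum()
        gains = relevance + lam * coverage_gain
        gains = np.where(available, gains, -np.inf)   # never re-pick a token

        best = int(np.argmax(gains))
        selected.append(best)
        available[best] = False
        coverage = np.maximum(coverage, sim[best])

    return selected


# Toy usage: candidates 0 and 1 are identical, candidate 2 is orthogonal to them.
toy_tokens = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_target = np.array([1.0, 1.0])
print(greedy_token_selection(toy_tokens, toy_target, k=2))  # [0, 2]
```

In the toy input all three candidates are equally relevant to the target, so the second pick is decided by the coverage term alone, which skips the duplicate. An alternating scheme in the spirit of the abstract would rerun such a selection step between rounds of continuous prompt tuning, but how IPL couples the two steps is not specified in the text above.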
