Eliciting Textual Descriptions from Representations of Continuous Prompts
Dana Ramati, Daniela Gottesman, Mor Geva

TL;DR
This paper introduces InSPEcT, a novel method for interpreting continuous prompts by eliciting textual descriptions from their representations. This makes it easier to understand what a continuous prompt encodes and to debug the biases it introduces in large language models.
Contribution
The work proposes a new interpretability approach for continuous prompts that generates textual descriptions of what a prompt encodes and reveals its biases. It addresses the limitations of prior methods that project individual prompt tokens onto the vocabulary.
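For contrast, the token-projection baseline can be sketched as a nearest-neighbor lookup: each soft-prompt vector is mapped to the vocabulary token whose embedding is closest to it. This minimal sketch assumes cosine similarity and a made-up 3-dimensional embedding table. Prior work may use a different distance or the model's real embedding matrix. Because each vector is projected on its own, the sketch also shows why this approach ignores how prompt tokens interact inside the model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project_to_vocab(soft_prompt, embeddings, vocab):
    """Map each soft-prompt vector to its nearest vocabulary token.

    Every prompt vector is handled independently; no context from the
    other prompt vectors or from the model's layers is used.
    """
    tokens = []
    for vec in soft_prompt:
        best = max(range(len(vocab)), key=lambda i: cosine(vec, embeddings[i]))
        tokens.append(vocab[best])
    return tokens

# Hypothetical toy vocabulary and embeddings (not taken from the paper).
vocab = ["positive", "negative", "movie", "the"]
embeddings = [
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
soft_prompt = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

print(project_to_vocab(soft_prompt, embeddings, vocab))  # ['positive', 'the']
```

A trained soft prompt can still perform well even when such projections read as arbitrary or contradictory text, which is the problem the abstract raises.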
Findings
InSPEcT often produces accurate task descriptions.
An elaborated version of the method uncovers biased features in continuous prompts that are linked to model predictions.
Descriptions become more faithful as task performance improves.
Abstract
Continuous prompts, or "soft prompts", are a widely adopted parameter-efficient tuning strategy for large language models, but they are often less favored because of their opaque nature. Prior attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space. However, this approach is problematic: performant prompts can yield arbitrary or contradictory text, and each prompt token is interpreted in isolation. In this work, we propose a new approach to interpret continuous prompts that elicits textual descriptions from their representations during model inference. Using a Patchscopes variant (Ghandeharioun et al., 2024) called InSPEcT over various tasks, we show our method often yields accurate task descriptions which become more faithful as task performance increases. Moreover, an elaborated version of InSPEcT reveals biased features in continuous…
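Patchscopes-style inspection rests on one mechanism. The model runs on the continuous prompt, a hidden representation is read out, and that representation is patched into a separate target prompt whose continuation is then decoded as text. The sketch below illustrates only that mechanism on a hypothetical toy model. The toy layer, the 2-dimensional embeddings, the placeholder token `X`, the target prompt, and the chosen layers are all assumptions, not InSPEcT's actual configuration. A real implementation would hook into a transformer's forward pass instead.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Patch = Tuple[int, int, Vector]  # (layer index, position, replacement state)

# Hypothetical toy vocabulary with tied 2-d input/output embeddings.
EMBED: Dict[str, Vector] = {
    "pos": [1.0, 0.2],
    "neg": [-1.0, 0.2],
    "task": [0.1, 1.0],
    "X": [0.0, 0.0],   # placeholder slot that receives the patched state
    ":": [0.3, -0.5],
}
NUM_LAYERS = 3

def layer(hs: List[Vector]) -> List[Vector]:
    """Toy causal layer: each position adds half the mean of its prefix."""
    out = []
    for i, h in enumerate(hs):
        prefix = hs[: i + 1]
        mean = [sum(col) / len(prefix) for col in zip(*prefix)]
        out.append([a + 0.5 * m for a, m in zip(h, mean)])
    return out

def forward(inputs: List[Vector], patch: Optional[Patch] = None) -> List[List[Vector]]:
    """Run the toy model and return the hidden states after every layer.

    If `patch` is given, the state at (layer, position) is overwritten
    right after that layer, so all later layers read the patched state.
    """
    hs = [list(v) for v in inputs]
    states = []
    for l in range(NUM_LAYERS):
        hs = layer(hs)
        if patch is not None and patch[0] == l:
            hs[patch[1]] = list(patch[2])
        states.append(hs)
    return states

def next_token(final_state: Vector) -> str:
    """Greedy decoding: highest dot product with a non-placeholder embedding."""
    candidates = [t for t in EMBED if t != "X"]
    return max(candidates, key=lambda t: sum(a * b for a, b in zip(final_state, EMBED[t])))

def describe(soft_prompt: List[Vector], src_pos: int, src_layer: int,
             target_tokens: List[str], tgt_layer: int, steps: int = 3) -> List[str]:
    """Patch one soft-prompt representation into a target prompt and decode."""
    # 1) Source pass: read a hidden state of the continuous prompt.
    source_state = forward(soft_prompt)[src_layer][src_pos]
    # 2) Target passes: overwrite the placeholder slot and generate greedily.
    slot = target_tokens.index("X")
    tokens = list(target_tokens)
    generated = []
    for _ in range(steps):
        states = forward([EMBED[t] for t in tokens], patch=(tgt_layer, slot, source_state))
        tok = next_token(states[-1][-1])
        generated.append(tok)
        tokens.append(tok)
    return generated

print(describe([[0.9, 0.1], [0.2, 0.4]], src_pos=-1, src_layer=1,
               target_tokens=["task", "X", ":"], tgt_layer=1))
```

The target prompt acts as an "interpreter" context. Because the patch is reapplied at the same slot on every decoding step, each generated token is conditioned on the soft-prompt representation rather than on the placeholder's own embedding.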
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
