Eliciting Textual Descriptions from Representations of Continuous Prompts

Dana Ramati; Daniela Gottesman; Mor Geva

arXiv:2410.11660·cs.CL·October 16, 2024

TL;DR

This paper introduces InSPEcT, a novel method for interpreting continuous prompts by eliciting textual descriptions from their representations, improving understanding and debugging of prompt biases in large language models.

Contribution

The work proposes a new interpretability approach for continuous prompts that generates textual descriptions and reveals biases, addressing limitations of earlier methods that project individual prompt tokens onto the vocabulary space.

Findings

01 InSPEcT often produces accurate task descriptions, and their accuracy correlates with task performance.

02 The method uncovers biased features linked to model predictions.

03 Descriptions become more faithful as task performance improves.

Abstract

Continuous prompts, or "soft prompts", are a widely-adopted parameter-efficient tuning strategy for large language models, but are often less favorable due to their opaque nature. Prior attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space. However, this approach is problematic as performant prompts can yield arbitrary or contradictory text, and it interprets prompt tokens individually. In this work, we propose a new approach to interpret continuous prompts that elicits textual descriptions from their representations during model inference. Using a Patchscopes variant (Ghandeharioun et al., 2024) called InSPEcT over various tasks, we show our method often yields accurate task descriptions which become more faithful as task performance increases. Moreover, an elaborated version of InSPEcT reveals biased features in continuous…
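
To make the baseline mentioned in the abstract concrete, the following is a minimal toy sketch of vocabulary projection: each continuous prompt token is mapped to its nearest vocabulary token. It illustrates the prior approach, not InSPEcT itself. The embedding table, the prompt vectors, the function names, and the choice of cosine similarity as the distance measure are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project_to_vocab(prompt_vectors, vocab_embeddings, top_k=1):
    """For each prompt vector, return the top_k most similar vocabulary tokens.

    Each prompt token is handled independently, which is the property the
    abstract criticizes in projection-based interpretation.
    """
    projections = []
    for vec in prompt_vectors:
        ranked = sorted(
            vocab_embeddings,
            key=lambda tok: cosine(vec, vocab_embeddings[tok]),
            reverse=True,
        )
        projections.append(ranked[:top_k])
    return projections


# Hypothetical 3-dimensional embedding table (illustrative values only).
toy_vocab = {
    "positive": [1.0, 0.0, 0.0],
    "negative": [-1.0, 0.0, 0.0],
    "movie": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}

# Two hypothetical learned prompt tokens (illustrative values only).
toy_prompt = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
]

nearest = project_to_vocab(toy_prompt, toy_vocab, top_k=1)
print(nearest)
```

In a real model the table would be the input embedding matrix and the prompt vectors would be the trained soft prompt, but the per-token nearest-neighbor lookup would have the same shape as this sketch.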

Code & Models

Repositories

danaramati1/InSPEcT (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques