Word Importance Explains How Prompts Affect Language Model Outputs
Stefan Hackmann, Haniyeh Mahmoudian, Mark Steadman, Michael Schmidt

TL;DR
This paper introduces a method to explain how individual words in prompts influence large language model outputs by measuring their statistical impact, enhancing transparency and interpretability of LLMs.
Contribution
The study proposes a novel word importance measure based on permutation importance, applicable even without attention weights, to analyze prompt effects on LLM outputs.
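The classical technique this measure builds on, permutation importance for tabular data, shuffles one feature column at a time and records how much a model's error grows. The following is a minimal pure-Python sketch of that tabular version. It assumes the model is exposed as a `predict` callable and that mean squared error is the metric; both are illustrative choices, not details taken from the paper.

```python
import random
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Importance of each column = mean increase in MSE after shuffling it.

    predict: callable mapping a list of rows to a list of predictions.
    X: list of rows (lists of floats); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            deltas.append(mse(y, predict(X_perm)) - baseline)
        importances.append(mean(deltas))
    return importances

# Toy model whose output depends only on the first feature.
def toy_predict(rows):
    return [2.0 * r[0] for r in rows]

X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * r[0] for r in X]
print(permutation_importance(toy_predict, X, y))
```

The second feature gets an importance of exactly zero because shuffling it never changes the predictions. The word-level variant replaces "shuffle a column" with "mask a prompt word", which is why no attention weights are needed.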
Findings
Word importance scores correlate with suffix importance across models.
The method works with various scoring functions.
It improves understanding of prompt influence on LLM behavior.
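Because the method works with various scoring functions, any function that maps a generated text to a number can serve as the score. The peer reviews on this page name three: Flesch reading ease, word count, and topic similarity via cosine similarity of embeddings. The sketch below implements simple versions of each. It is illustrative only: the syllable counter is a rough vowel-group heuristic, and the cosine similarity uses bag-of-words counts as a stand-in for embedding vectors, since no particular embedding model is assumed here.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")

def word_count(text):
    """Number of alphabetic tokens in the text."""
    return len(WORD_RE.findall(text))

def count_syllables(word):
    """Rough heuristic: vowel groups, minus a silent final 'e', at least one."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    """206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

def cosine_similarity(text_a, text_b):
    """Cosine of bag-of-words count vectors (a stand-in for embedding vectors)."""
    a = Counter(w.lower() for w in WORD_RE.findall(text_a))
    b = Counter(w.lower() for w in WORD_RE.findall(text_b))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

text = "The cat sat on the mat."
print(word_count(text), round(flesch_reading_ease(text), 3), cosine_similarity(text, "A cat on a mat."))
```

Each function takes a string and returns a number, so any of them could be plugged into a word-importance computation unchanged.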
Abstract
The emergence of large language models (LLMs) has revolutionized numerous applications across industries. However, their "black box" nature often hinders the understanding of how they make specific decisions, raising concerns about their transparency, reliability, and ethical use. This study presents a method to improve the explainability of LLMs by varying individual words in prompts to uncover their statistical impact on the model outputs. This approach, inspired by permutation importance for tabular data, masks each word in the system prompt and evaluates its effect on the outputs based on the available text scores aggregated over multiple user inputs. Unlike classical attention, word importance measures the impact of prompt words on arbitrarily defined text scores, which enables decomposing the importance of words into the specific measures of interest, including bias, reading…
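The masking procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fake_generate` is a hypothetical stand-in for an LLM call, "masking" is assumed to mean deleting the word from the system prompt, the score is a plain word count, and a word's importance is assumed to be the mean change in score across user inputs when that word is removed. The hypothetical `suffix_importance` applies the same idea to a whole instruction suffix, as one possible way to set up the word-versus-suffix comparison mentioned in the Findings.

```python
import re
from statistics import mean

def fake_generate(system_prompt, user_input):
    """Hypothetical stand-in for an LLM: answers briefly if the prompt contains 'brief'."""
    words = user_input.split()
    if "brief" in re.findall(r"[a-z]+", system_prompt.lower()):
        return " ".join(words[:3])   # short answer
    return " ".join(words * 2)       # long answer

def word_count(text):
    return len(text.split())

def mean_score(prompt, user_inputs, generate, score):
    """Average text score of the outputs over all user inputs."""
    return mean(score(generate(prompt, u)) for u in user_inputs)

def word_importance(system_prompt, user_inputs, generate, score):
    """For each prompt word, the mean score change when that word is deleted."""
    words = system_prompt.split()
    baseline = mean_score(system_prompt, user_inputs, generate, score)
    importances = []
    for i, word in enumerate(words):
        masked = " ".join(words[:i] + words[i + 1:])  # prompt with word i removed
        importances.append((word, mean_score(masked, user_inputs, generate, score) - baseline))
    return importances

def suffix_importance(base_prompt, suffix, user_inputs, generate, score):
    """Mean score change when the whole suffix is removed (assumed definition)."""
    with_suffix = mean_score(f"{base_prompt} {suffix}", user_inputs, generate, score)
    without_suffix = mean_score(base_prompt, user_inputs, generate, score)
    return without_suffix - with_suffix

user_inputs = ["tell me about the ocean and its tides", "what is a prime number"]
base, suffix = "You are a helpful assistant.", "Be brief."
print(word_importance(f"{base} {suffix}", user_inputs, fake_generate, word_count))
print(suffix_importance(base, suffix, user_inputs, fake_generate, word_count))
```

In this toy run only "brief." has a nonzero importance (outputs grow by 10 words on average when it is removed), and that value equals the importance of the whole suffix "Be brief.", because the fake model reacts to that single word. With a real LLM the outputs are stochastic, so the averages would typically be taken over many user inputs.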
Peer Reviews
Decision: Submitted to ICLR 2024
This paper presents a method that masks each word in the system prompt and evaluates its effect on the outputs based on the available text scores, aggregated over multiple user inputs.
1. The contribution of the paper is limited: similar topics have been investigated before, and this paper does not offer any further valuable conclusions.
2. The experiment section is terribly organized. No quantitative results are provided, and the experiment design is very confusing and too specific.
3. The presentation is really bad:
a. All the figures are poorly illustrated. There is even an untitled algorithm diagram before Section 4.
b. All the tables also look hasty and careless.
* The paper uses a common technique in NLP (word saliency) and applies the concept of word importance to a recent LLM. Doing so can lead to informative insights into model interpretability, as pointed out in the paper.
* The dataset used for the experiment was generated with an LLM. This is problematic because the dataset is biased towards generations from another LLM and does not necessarily reflect the distribution of human inputs. As such, the reported results do not necessarily hold for human inputs, so it would be important to also conduct experiments on a human-written dataset.
* The paper focuses substantially on an importance comparison between individual words and an instruction suffix
The paper explores an interesting concept, word importance, which I find essential for further understanding how large language models like ChatGPT work. The proposed method has some potential, provided that it carefully addresses some of the very obvious limitations discussed below and further improves its algorithmic features to account for scale, flexibility, and efficiency.
The depth of the experiments conducted in the study is extremely limited, as only three metrics have been explored: Flesch Reading Ease, word count, and topic similarity (cosine embedding). The model variation is also very limited: only one model, GPT-3.5-Turbo (ChatGPT), was used for experimentation, despite the diverse publicly available models on Hugging Face such as Llama, Flan-T5, and BLOOMZ. This implies that the study essentially optimizes for OpenAI products instead of prioritizing diverse re
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
