Shared Lexical Task Representations Explain Behavioral Variability In LLMs
Zhuonan Yang, Jacob Xiaochen Li, Francisco Piedrahita Velez, Eric Todd, David Bau, Michael L. Littman, Stephen H. Bach, Ellie Pavlick

TL;DR
This paper reveals that shared lexical task representations in LLMs, specifically task-specific attention heads, explain behavioral variability across different prompts and prompting styles.
Contribution
It identifies task-specific attention heads that are shared across prompting styles and explain prompt sensitivity in LLMs.
Findings
Shared lexical task heads are common across prompting styles.
Behavioral variability correlates with activation levels of these heads.
Failures often involve competing task representations diluting the target signal.
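The findings above describe a statistical relationship and a failure mode. Below is a minimal sketch of both under stated assumptions, not the authors' analysis. Each prompt's "head activation" is summarized as one scalar; how that scalar is computed (for example, the size of the head's output along a task direction) is an assumption, not the paper's definition. Pearson correlation stands in as the association measure. Dilution is modeled as a toy: competing task directions are added to a target direction, and the code measures how much cosine similarity to the target remains. `pearson`, `cosine`, and `diluted_signal` are hypothetical helpers, and all numbers are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("zero variance in one of the sequences")
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def diluted_signal(target, competitors, weights):
    """Toy model (assumption): head output = target + sum of weighted competing
    task directions. Returns the cosine similarity of that mix to the target."""
    mixed = list(target)
    for w, c in zip(weights, competitors):
        mixed = [m + w * ci for m, ci in zip(mixed, c)]
    return cosine(mixed, target)


# Hypothetical per-prompt values (illustrative only, not data from the paper).
head_activation = [0.9, 0.7, 0.4, 0.2, 0.8]
accuracy = [0.95, 0.80, 0.50, 0.30, 0.85]
r = pearson(head_activation, accuracy)

# A competing task direction orthogonal to the target direction.
target = [1.0, 0.0, 0.0]
competitor = [0.0, 1.0, 0.0]
clean = diluted_signal(target, [competitor], [0.0])  # no competition
mixed = diluted_signal(target, [competitor], [1.0])  # equal-strength competitor
print(f"r = {r:.3f}, clean = {clean:.3f}, mixed = {mixed:.3f}")
```

With an equal-strength orthogonal competitor, the cosine to the target falls from 1 to 1/√2. This is one simple way to picture "diluting the target signal" without any change to the target component itself.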
Abstract
One of the most common complaints about large language models (LLMs) is their prompt sensitivity, that is, the fact that their ability to perform a task or provide a correct answer to a question can depend unpredictably on the way the question is posed. We investigate this variation by comparing two very different but commonly used styles of prompting: instruction-based prompts, which describe the task in natural language, and example-based prompts, which provide in-context few-shot demonstration pairs to illustrate the task. We find that, despite large variation in performance as a function of the prompt, the model engages some common underlying mechanisms across different prompts of a task. Specifically, we identify task-specific attention heads whose outputs literally describe the task (which we dub lexical task heads) and show that these heads are shared across prompting…
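One common way to check whether a head's output "literally describes the task" is a logit-lens-style readout. You project the head's output vector through the model's unembedding matrix and read off the highest-scoring vocabulary tokens. The sketch below uses a toy vocabulary and a hand-made unembedding matrix, and `decode_head_output` is a hypothetical helper. The paper's exact procedure may differ. In a real model one would typically also apply the final layer norm before the unembedding.

```python
def decode_head_output(head_out, unembed, vocab, k=3):
    """Project a head's output vector onto each vocabulary token's unembedding
    row (a dot product per token) and return the k highest-scoring tokens."""
    logits = [sum(h * w for h, w in zip(head_out, row)) for row in unembed]
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


# Toy 4-dimensional residual space and 5-token vocabulary (hypothetical values).
vocab = ["translate", "French", "antonym", "capital", "the"]
unembed = [
    [1.0, 0.0, 0.0, 0.0],  # "translate"
    [0.8, 0.2, 0.0, 0.0],  # "French"
    [0.0, 1.0, 0.0, 0.0],  # "antonym"
    [0.0, 0.0, 1.0, 0.0],  # "capital"
    [0.1, 0.1, 0.1, 0.1],  # "the"
]

# A made-up head output that points mostly along the "translate" direction.
head_out = [0.9, 0.1, 0.0, 0.05]
top = decode_head_output(head_out, unembed, vocab, k=2)
print(top)
```

If the top tokens name the task (here, translation-related words), the head is a candidate "lexical task head" in the sense the abstract describes. Comparing such readouts across instruction-based and example-based prompts is one way to probe whether the same heads are shared.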
