Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

Madeline Brumley; Joe Kwon; David Krueger; Dmitrii Krasheninnikov; Usman Anwar

arXiv:2411.07213·cs.LG·November 12, 2024

Open Access

TL;DR

This paper compares a bottom-up steering method (function vectors, FV) with a top-down one (in-context vectors, ICV) on in-context learning tasks in large language models, and finds that each is effective only on specific types of tasks.

Contribution

It provides a quantitative comparison of vector steering methods, highlighting their task-specific effectiveness and guiding future interpretability research.

Findings

01. ICV outperforms FV in behavioral shifting tasks

02. FV excels in tasks requiring precision

03. Both methods are effective only on specific task types

Abstract

A key objective of interpretability research on large language models (LLMs) is to develop methods for robustly steering models toward desired behaviors. To this end, two distinct approaches to interpretability, "bottom-up" and "top-down", have been presented, but there has been little quantitative comparison between them. We present a case study comparing the effectiveness of representative vector steering methods from each branch: function vectors (FV; arXiv:2310.15213) as a bottom-up method, and in-context vectors (ICV; arXiv:2311.06668) as a top-down method. While both aim to capture compact representations of broad in-context learning tasks, we find they are effective only on specific types of tasks: ICVs outperform FVs in behavioral shifting, whereas FVs excel in tasks requiring more precision. We discuss the implications for future evaluations of steering methods and for…

Taxonomy

Topics: Human-Automation Interaction and Safety