TL;DR
This paper shows that large language models encode a general, interpretable filtering mechanism analogous to the "filter" function of functional programming, and that this mechanism can be extracted, reused, and understood through causal analysis.
Contribution
It uncovers a causal, interpretable filter head mechanism in LLMs that generalizes across tasks, formats, and languages, advancing understanding of model internals.
Findings
Filter heads encode a portable filtering predicate.
Models can reapply filtering predicates across different contexts.
Alternative strategies like eager evaluation are also identified.
Abstract
We investigate the mechanisms underlying a range of list-processing tasks in LLMs, and we find that LLMs have learned to encode a compact, causal representation of a general filtering operation that mirrors the generic "filter" function of functional programming. Using causal mediation analysis on a diverse set of list-processing tasks, we find that a small number of attention heads, which we dub filter heads, encode a compact representation of the filtering predicate in their query states at certain tokens. We demonstrate that this predicate representation is general and portable: it can be extracted and reapplied to execute the same filtering operation on different collections, presented in different formats, languages, or even in different tasks. However, we also identify situations where transformer LMs can exploit a different strategy for filtering: eagerly evaluating if an item satisfies…
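For readers unfamiliar with the functional-programming "filter" that the abstract uses as its analogy, the following is a minimal Python sketch. It only illustrates the programming concept, not the model's internal mechanism or the paper's method. The predicate `is_fruit`, its word list, and the helper names `lazy_filter`, `eager_flags`, and `select_from_flags` are hypothetical and chosen for illustration. The "eager" variant is a loose reading of the alternative strategy the abstract mentions, in which each item is checked against the predicate as it is encountered.

```python
from typing import Callable, Iterable, List, Tuple


def is_fruit(item: str) -> bool:
    """Hypothetical predicate; the word list is illustrative only."""
    return item.lower() in {"apple", "banana", "cherry", "manzana"}


def lazy_filter(predicate: Callable[[str], bool], items: Iterable[str]) -> List[str]:
    # The predicate is a standalone value: it can be carried to any
    # collection and applied there, which is the sense of "portable" here.
    return list(filter(predicate, items))


def eager_flags(predicate: Callable[[str], bool], items: Iterable[str]) -> List[Tuple[str, bool]]:
    # Eager variant (an assumption about the analogy): evaluate the predicate
    # on each item as it is read and store the result next to the item.
    return [(item, predicate(item)) for item in items]


def select_from_flags(flagged: Iterable[Tuple[str, bool]]) -> List[str]:
    # Selection then only reads the stored flags; the predicate is not needed.
    return [item for item, ok in flagged if ok]


if __name__ == "__main__":
    # Same predicate, different collections and container formats.
    print(lazy_filter(is_fruit, ["apple", "hammer", "cherry"]))
    print(lazy_filter(is_fruit, ("Manzana", "martillo")))
    print(select_from_flags(eager_flags(is_fruit, ["apple", "hammer"])))
```

In the lazy form, the predicate can be reused on any collection. In the eager form, the per-item flags are tied to the collection they were computed on.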
Peer Reviews
Decision·ICLR 2026 Poster
* The paper is very well written and has some really clean experiments.
* The experiments cover their bases quite well (good ablations, good generalization experiments).
* I think the results from Section 5 are particularly insightful and also just really cool.
* I think there's still a lot of value in doing this kind of mech interp :)
* I would like to see more tasks, especially given the CheckPresence results; it seems like this is a really nice explanation, but I'm worried it won't actually generalize / you maybe got lucky with the tasks you chose.
* Similarly, I don't know what's going on with cross-task transfer for SelectFirst/SelectLast. I think this deserves more time.
- The studied task (selecting an item from a list based on a given predicate) is important and relevant to the current state of LLM interpretability.
- The comparison of placing the predicate before and after the options is insightful and sheds light on the ways that causal masking affects the learned mechanisms in LLMs.
- The mechanism is explained clearly. The paper is easy to understand and the figures are good.
- Experiments are comprehensive.
- The training-free probe is an interesting and imp
1. The set of heads identified is quite large and the precise role of each head is unclear.
2. The identified heads have low portability to aggregation tasks (presence checking, counting), which suggests a limited relevance.
- Detailed study: ablations, transfer to other examples, and generalization across tasks/presentation types are all studied.
- Based on causal analysis.
- Mostly clear writing.
- There are some unclear parts, see questions.
