
TL;DR
This paper investigates conflict adaptation in vision-language models using a sequential Stroop task, revealing that 12 of 13 models tested exhibit human-like conflict adaptation and identifying neural-like representations that underlie this behavior.
Contribution
It demonstrates that vision-language models show conflict adaptation similar to humans and uncovers neural-like supernodes responsible for this behavior.
Findings
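12 of 13 models exhibit conflict adaptation; the lone exception likely reflects a ceiling effect.
Partially overlapping supernodes for text and color are identified in both early and late layers.
Ablating a conflict-modulated supernode in layers 24-25 significantly affects Stroop errors.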
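Ablating a supernode can be pictured with a sparse autoencoder (SAE) trained on one layer's activations. The NumPy sketch below assumes a standard ReLU SAE and treats a supernode as a hypothetical list of latent indices. The function names are illustrative and are not taken from the paper or from any SAE library. It zeroes the chosen latents and adds the SAE's reconstruction error back, so the patched activation differs from the original only by the ablated latents' decoder contributions.

```python
import numpy as np


def sae_encode(x: np.ndarray, W_enc: np.ndarray, b_enc: np.ndarray) -> np.ndarray:
    # Assumed ReLU encoder: dense activation -> non-negative sparse latent codes.
    return np.maximum(0.0, x @ W_enc + b_enc)


def sae_decode(z: np.ndarray, W_dec: np.ndarray, b_dec: np.ndarray) -> np.ndarray:
    # Linear decoder: latent codes -> reconstructed activation.
    return z @ W_dec + b_dec


def ablate_supernode(x, W_enc, b_enc, W_dec, b_dec, supernode):
    """Zero the latents listed in `supernode` (hypothetical index list) and
    return the patched activation. The SAE reconstruction error is added back
    so that only the ablated latents' contribution is removed."""
    z = sae_encode(x, W_enc, b_enc)
    error = x - sae_decode(z, W_dec, b_dec)  # part of x the SAE fails to explain
    z_ablated = z.copy()
    z_ablated[..., supernode] = 0.0          # knock out the supernode's latents
    return sae_decode(z_ablated, W_dec, b_dec) + error
```

With an empty supernode the function returns `x` unchanged; otherwise `x - patched` equals the ablated latents' codes multiplied by their decoder rows. In a real model the patched activation would be written back into the forward pass (for instance, via a hook) before the Stroop error rate is measured.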
Abstract
A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial following another high-conflict trial. This phenomenon offers an account for how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of 13 vision-language models (VLMs) tested exhibit behavior consistent with conflict adaptation, with the lone exception likely reflecting a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant supernodes in InternVL 3.5 4B. Partially overlapping supernodes emerge for text and color in both early and late layers, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a conflict-modulated supernode in layers 24-25 whose ablation significantly increases…
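Conflict adaptation as defined above can be quantified from a sequence of Stroop trials. The Python sketch below is illustrative only and is not the paper's analysis code. The `Trial` record, the `conflict_adaptation` function, and the choice of error rate as the performance measure are all hypothetical. It contrasts errors on incongruent trials that follow a congruent trial (cI) with errors on incongruent trials that follow an incongruent trial (iI). The full congruency sequence effect used in human studies often also subtracts the matching congruent-trial contrast, which this simplified version omits.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    congruent: bool  # True if word and ink color match (low conflict)
    correct: bool    # True if the ink color was named correctly


def conflict_adaptation(trials: list[Trial]) -> float:
    """Error rate on cI trials minus error rate on iI trials.

    A positive value means errors on a high-conflict (incongruent) trial are
    lower when the previous trial was also high-conflict."""
    errors: dict[str, list[int]] = {"cI": [], "iI": []}
    for prev, curr in zip(trials, trials[1:]):
        if curr.congruent:
            continue  # only incongruent current trials enter this contrast
        key = "cI" if prev.congruent else "iI"
        errors[key].append(0 if curr.correct else 1)
    if not errors["cI"] or not errors["iI"]:
        raise ValueError("need at least one cI and one iI transition")
    rate = {k: sum(v) / len(v) for k, v in errors.items()}
    return rate["cI"] - rate["iI"]
```

For example, a run of [congruent, incongruent, incongruent] repeated twice, with errors only on the first incongruent trial of each triple, gives a cI error rate of 1.0 and an iI error rate of 0.0, so the function returns 1.0.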
