From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models

Qidong Wang, Junjie Hu, and Ming Jiang

arXiv:2604.17941·cs.CV·April 21, 2026

TL;DR

HONES is a gradient-free framework that identifies and modulates task-critical neurons in multi-task vision-language models, improving interpretability and performance across diverse tasks.

Contribution

The paper introduces a task-aware neuron attribution method that conditions neuron importance on attention heads and enables neuron steering in multi-task VLMs.

Findings

01 HONES outperforms existing methods in identifying task-critical neurons.

02 Steering neurons with HONES improves model performance on multiple tasks.

03 The framework is validated on four diverse multimodal tasks and two VLMs.

Abstract

Recent work has increasingly explored neuron-level interpretation in vision-language models (VLMs) to identify neurons critical to final predictions. However, existing neuron analyses generally focus on single tasks, limiting the comparability of neuron importance across tasks. Moreover, ranking strategies tend to score neurons in isolation, overlooking how task-dependent information pathways shape the write-in effects of feed-forward network (FFN) neurons. This oversight can exacerbate neuron polysemanticity in multi-task settings, introducing noise into the identification and intervention of task-critical neurons. In this study, we propose HONES (Head-Oriented Neuron Explanation & Steering), a gradient-free framework for task-aware neuron attribution and steering in multi-task VLMs. HONES ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention…
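
To make the idea of a gradient-free "write-in" score concrete, here is a minimal sketch. It is not the HONES algorithm, which the truncated abstract does not specify. It assumes, purely for illustration, that a neuron's write-in contribution is its activation times its output-weight row, and that "conditioning on attention heads" can be approximated by projecting that contribution onto a single direction taken from task-relevant heads. The names `write_in_scores`, `rank_neurons`, `steer`, and `head_direction` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def write_in_scores(
    activations: Sequence[float],
    w_out: Sequence[Sequence[float]],
    head_direction: Sequence[float],
) -> List[float]:
    """Score each FFN neuron using forward-pass quantities only (no gradients).

    ASSUMPTION: neuron i writes activations[i] * w_out[i] into the residual
    stream, and its task relevance is the projection of that vector onto
    head_direction, a stand-in for the output of task-relevant heads.
    """
    return [a * dot(row, head_direction) for a, row in zip(activations, w_out)]


def rank_neurons(scores: Sequence[float], top_k: int) -> List[int]:
    """Return indices of the top_k neurons by absolute score."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return order[:top_k]


def steer(
    activations: Sequence[float], neuron_ids: Sequence[int], scale: float
) -> List[float]:
    """Rescale the selected neurons' activations and leave the rest unchanged."""
    selected = set(neuron_ids)
    return [a * scale if i in selected else a for i, a in enumerate(activations)]


# Toy example: three neurons writing into a two-dimensional residual stream.
activations = [0.5, 2.0, 1.0]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_direction = [1.0, 0.0]  # hypothetical direction from task-relevant heads

scores = write_in_scores(activations, w_out, head_direction)
top = rank_neurons(scores, top_k=2)
steered = steer(activations, top, scale=2.0)
```

In this toy setup the second neuron has the largest activation but writes orthogonally to the chosen direction, so it receives a zero score. This illustrates why scoring neurons in isolation and scoring them relative to a task-dependent pathway can yield different rankings.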

Code & Models

Repositories

petergit1/HONES (GitHub)