Learning to Steer: Input-dependent Steering for Multimodal LLMs
Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Arnaud Dapogny, Alasdair Newson, Matthieu Cord

TL;DR
This paper introduces L2S, a method for input-dependent steering of multimodal LLMs, which uses a learned auxiliary module to generate input-specific guidance, reducing hallucinations and improving safety.
Contribution
It proposes a novel input-specific steering technique for multimodal LLMs using a learned auxiliary module, addressing a limitation of static steering methods, which apply a single steering vector regardless of the input query.
Findings
Reduces hallucinations in MLLMs
Enforces safety in model responses
Outperforms static baseline steering methods
Abstract
Steering has emerged as a practical approach to enable post-hoc guidance of LLMs towards enforcing a specific behavior. However, it remains largely underexplored for multimodal LLMs (MLLMs); furthermore, existing steering techniques, such as mean steering, rely on a single steering vector, applied independently of the input query. This paradigm faces limitations when the desired behavior depends on the example at hand. For example, a safe answer may consist of abstaining from answering when asked about an illegal activity, or may point to external resources or consultation with an expert when asked for medical advice. In this paper, we investigate fine-grained steering that uses an input-specific linear shift. This shift is computed using contrastive input-specific prompting. However, the input-specific prompts required for this approach are not known at test time. Therefore, we…
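To make the contrast between mean steering and an input-specific linear shift concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation: the abstract is truncated before it describes the auxiliary module, so the linear least-squares regressor below is an assumption, as are all function names, the array sizes, and the random vectors standing in for MLLM hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 150, 50, 16  # hypothetical sizes, chosen for illustration


def contrastive_shifts(h_pos, h_neg):
    """Per-example steering vectors: state under the desired-behavior
    prompt minus state under the undesired-behavior prompt."""
    return h_pos - h_neg


def mean_steering(shifts):
    """Static baseline: a single vector, the average of all per-example shifts."""
    return shifts.mean(axis=0)


def fit_auxiliary(h, shifts):
    """Assumed stand-in for the auxiliary module: an affine map fitted by
    least squares that predicts the shift from the input's hidden state."""
    X = np.hstack([h, np.ones((len(h), 1))])  # append a bias column
    weights, *_ = np.linalg.lstsq(X, shifts, rcond=None)
    return weights


def predict_shift(h, weights):
    """Predict an input-specific shift without any contrastive prompt."""
    X = np.hstack([h, np.ones((len(h), 1))])
    return X @ weights


def steer(h, shift, alpha=1.0):
    """Apply a linear shift to the hidden state."""
    return h + alpha * shift


# Synthetic stand-ins for hidden states (random data, not real model activations).
h_all = rng.normal(size=(n_train + n_test, d))
true_map = rng.normal(size=(d, d))
target = h_all @ true_map + 0.01 * rng.normal(size=h_all.shape)
h_pos = h_all + 0.5 * target  # state under the "desired" contrastive prompt
h_neg = h_all - 0.5 * target  # state under the "undesired" contrastive prompt

z_all = contrastive_shifts(h_pos, h_neg)
h_train, h_test = h_all[:n_train], h_all[n_train:]
z_train, z_test = z_all[:n_train], z_all[n_train:]

z_mean = mean_steering(z_train)
weights = fit_auxiliary(h_train, z_train)

# Distance between each approach's shift and the contrastive shift on held-out inputs.
err_mean = np.mean(np.linalg.norm(z_test - z_mean, axis=1))
err_learned = np.mean(np.linalg.norm(z_test - predict_shift(h_test, weights), axis=1))
print(f"mean steering error:    {err_mean:.3f}")
print(f"learned steering error: {err_learned:.3f}")

h_steered = steer(h_test, predict_shift(h_test, weights))
```

The synthetic shifts here depend on the input by construction, so a single averaged vector cannot match them while a predictor conditioned on the input can. The comparison illustrates the motivation stated in the abstract and says nothing about the paper's reported results.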
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation
