Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

Haowen Pan; Yixin Cao; Xiaozhi Wang; Xun Yang; Meng Wang

arXiv:2311.07470 · cs.CL · June 12, 2024 · 1 citation

TL;DR

This paper introduces an efficient method for identifying key neurons in multi-modal transformers without costly gradient computation, and a knowledge editing method built on those neurons that improves interpretability and helps mitigate hallucinations.

Contribution

It proposes a novel neuron identification technique that avoids costly gradient computations and a knowledge editing approach to mitigate hallucinations in multi-modal LLMs.

Findings

01 Validated effectiveness through extensive experiments

02 Identified key properties of multi-modal neurons: sensitivity, specificity, causal-effect

03 Enhanced interpretability and hallucination mitigation in multi-modal models

Abstract

Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continued improvement in both academia and industry. In this paper, we propose a novel method to identify key neurons for interpretability, specifically how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves on prior work in efficiency and range of application by removing the need for costly gradient computation. Based on the identified neurons, we further design a multi-modal knowledge editing method, which helps mitigate sensitive words or hallucination. To justify our design, we provide a theoretical assumption. For empirical evaluation, we have conducted extensive quantitative and qualitative experiments. The results not only validate the effectiveness of…
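
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of one gradient-free way to rank feed-forward neurons, not necessarily the authors' procedure. It assumes a simplified transformer in which each feed-forward neuron writes its activation times a row of the output projection into the residual stream, and it ignores layer normalization. All names (`neuron_contributions`, `top_k_neurons`, `suppress`, `w_out`, `w_unembed`) are hypothetical and chosen for illustration.

```python
import numpy as np

def neuron_contributions(activations, w_out, w_unembed, token_id):
    """Per-neuron share of one token's logit, using forward-pass values only.

    activations: (d_ffn,) neuron activations at one position (assumed given)
    w_out:       (d_ffn, d_model) feed-forward output projection
    w_unembed:   (d_model, vocab) unembedding matrix
    """
    # Neuron i adds activations[i] * w_out[i] to the residual stream.
    # Projecting that onto the token's unembedding column gives its logit share.
    direction = w_out @ w_unembed[:, token_id]  # shape (d_ffn,)
    return activations * direction

def top_k_neurons(scores, k):
    """Indices of the k highest-scoring neurons, best first."""
    return np.argsort(scores)[::-1][:k]

def suppress(activations, neuron_ids):
    """Simple edit (an assumption): zero the selected neurons' activations."""
    edited = activations.copy()
    edited[neuron_ids] = 0.0
    return edited

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=16)            # toy activations
    w_out = rng.normal(size=(16, 8))      # toy output projection
    w_unembed = rng.normal(size=(8, 50))  # toy unembedding
    scores = neuron_contributions(acts, w_out, w_unembed, token_id=7)
    top = top_k_neurons(scores, k=3)
    edited = suppress(acts, top)
    print("top neurons:", top)
    print("logit before:", (acts @ w_out @ w_unembed)[7])
    print("logit after: ", (edited @ w_out @ w_unembed)[7])
```

Because the scores are products of quantities already available in a forward pass, no backward pass is needed. Under these assumptions the scores sum exactly to the layer's contribution to the chosen token's logit, so zeroing the top neurons removes precisely their share.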

Code & Models

Repositories

opanhw/MM_Neurons · PyTorch · Official

Videos

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning