Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons

Yongqi Leng; Deyi Xiong

arXiv:2407.06488 · cs.CL · January 14, 2025

Open Access

TL;DR

This paper investigates how task-specific neurons in large language models (LLMs) influence multi-task learning and generalization. It finds that the overlap of task-specific neurons across tasks is strongly associated with generalization, and it proposes a neuron-level fine-tuning method to improve continual learning.

Contribution

The paper introduces a method to detect task-specific neurons in LLMs, demonstrates their role in generalization and catastrophic forgetting, and proposes a neuron-level fine-tuning approach.
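
The summary does not spell out how neuron-level fine-tuning is carried out. As a minimal sketch, assuming it means updating only the weights that belong to the detected neurons while freezing all others, a masked gradient step could look like the following. The function name `masked_sgd_step`, the row-per-neuron weight layout, and the plain SGD update are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Set


def masked_sgd_step(
    weights: List[List[float]],
    grads: List[List[float]],
    trainable_rows: Set[int],
    lr: float,
) -> List[List[float]]:
    """Apply one SGD step only to the rows (neurons) in trainable_rows.

    Assumption: each row of `weights` holds the incoming weights of one neuron.
    Rows outside `trainable_rows` are copied unchanged (frozen).
    """
    updated = []
    for j, (row, grad_row) in enumerate(zip(weights, grads)):
        if j in trainable_rows:
            updated.append([w - lr * g for w, g in zip(row, grad_row)])
        else:
            updated.append(list(row))
    return updated


# Toy example: three neurons, only neuron 1 is treated as task-specific.
weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
grads = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
new_weights = masked_sgd_step(weights, grads, trainable_rows={1}, lr=1.0)
```

In this toy run only the second row changes; the other rows keep their original values, which is the property that would limit interference with previously learned tasks under this assumption.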

Findings

01. Detected task-specific neurons are highly correlated with the task they were detected for.

02. The overlap of task-specific neurons across tasks is strongly associated with generalization and specialization.

03. Neuron-level fine-tuning improves continual learning performance.
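
To make the notion of neuron overlap in the findings above concrete, here is a small sketch that compares two sets of task-specific neurons. The Jaccard index is used purely as an illustrative overlap measure; the summary does not state which measure the paper uses, and the `(layer, index)` neuron identifiers are hypothetical.

```python
from typing import Iterable, Tuple

Neuron = Tuple[int, int]  # assumed identifier: (layer index, neuron index)


def jaccard_overlap(neurons_a: Iterable[Neuron], neurons_b: Iterable[Neuron]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of neuron identifiers."""
    a, b = set(neurons_a), set(neurons_b)
    if not a and not b:
        return 0.0  # convention for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical neuron sets for two tasks.
task_a = {(0, 3), (0, 7), (1, 2), (1, 9)}
task_b = {(0, 3), (1, 2), (1, 5), (2, 1)}
overlap = jaccard_overlap(task_a, task_b)  # 2 shared neurons out of 6 distinct
```

A value near 1 means the two tasks rely on largely the same neurons; a value near 0 means their neuron sets are mostly disjoint.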

Abstract

While large language models (LLMs) have demonstrated superior multi-task capabilities, understanding the learning mechanisms behind this is still a challenging problem. In this paper, we attempt to understand such mechanisms from the perspective of neurons. Specifically, we detect task-sensitive neurons in LLMs via gradient attribution on task-specific data. Through extensive deactivation and fine-tuning experiments, we demonstrate that the detected neurons are highly correlated with the given task; we term these task-specific neurons. With these identified task-specific neurons, we delve into two common problems in multi-task learning and continual learning: generalization and catastrophic forgetting. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. Interestingly, at certain layers of LLMs, there is a high…
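
The abstract mentions detecting task-sensitive neurons via gradient attribution on task-specific data. The sketch below illustrates one common form of gradient attribution, scoring each hidden neuron by the absolute value of activation times gradient, summed over the task's examples. The exact scoring formula, the toy one-hidden-layer ReLU network, the squared-error loss, and all names here are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (input vector, scalar target)


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def attribution_scores(
    W1: List[List[float]], w2: List[float], data: Sequence[Example]
) -> List[float]:
    """Accumulate |activation * dLoss/dActivation| for each hidden neuron."""
    scores = [0.0] * len(W1)
    for x, target in data:
        # Forward pass: hidden activations, then a scalar prediction.
        h = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
        y = sum(w * hj for w, hj in zip(w2, h))
        # Loss = 0.5 * (y - target)^2, so dLoss/dh_j = (y - target) * w2[j].
        err = y - target
        for j, hj in enumerate(h):
            scores[j] += abs(hj * err * w2[j])
    return scores


def top_k_neurons(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring neurons, sorted ascending."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy task: only the first input feature carries signal.
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w2 = [1.0, 1.0, 0.0]
task_data = [([1.0, 0.0], 2.0), ([2.0, 0.0], 4.0)]
scores = attribution_scores(W1, w2, task_data)
selected = top_k_neurons(scores, k=1)
```

Here the first hidden neuron receives all of the attribution because it is the only one that is both active on the task data and connected to the output with a nonzero weight. Deactivating or fine-tuning the selected neurons is then one way to check how strongly they are tied to the task.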

Taxonomy

Topics: Neural Networks and Applications