Fine-tuned network relies on generic representation to solve unseen cognitive task

Dongyan Lin

arXiv:2406.18926·cs.LG·June 28, 2024

TL;DR

This study investigates whether fine-tuned GPT-2 models rely more on their generic pretrained representations or develop new task-specific mechanisms when learning a novel decision-making task, revealing the importance of pretraining in generalization.

Contribution

The paper compares fine-tuned and from-scratch GPT-2 models on a novel task, showing fine-tuning relies heavily on pretrained features, especially in later layers.

Findings

01

Fine-tuned models depend on pretrained representations in later layers.

02

Models trained from scratch develop different, task-specific mechanisms.

03

Pretraining enhances generalization but limits task-specific adaptation.

Abstract

Fine-tuning pretrained language models has shown promising results on a wide range of tasks, but when such models encounter a novel task, do they rely more on generic pretrained representations or develop brand-new task-specific solutions? Here, we fine-tuned GPT-2 on a context-dependent decision-making task, novel to the model but adapted from the neuroscience literature. We compared its performance and internal mechanisms to those of a version of GPT-2 trained from scratch on the same task. Our results show that fine-tuned models depend heavily on pretrained representations, particularly in later layers, while models trained from scratch develop different, more task-specific mechanisms. These findings highlight the advantages and limitations of pretraining for task generalization and underscore the need for further investigation into the mechanisms underpinning task-specific fine-tuning in LLMs.
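
To make the phrase "context-dependent decision-making task" concrete, here is a minimal sketch of how such trials might be rendered as text for a language model. Everything in it is an assumption for illustration: the abstract does not specify the stimulus features, the prompt format, or the response labels, so the `motion`/`color` features, the `A`/`B` choices, and the helper names `make_trial` and `make_dataset` are hypothetical and not the paper's actual setup.

```python
import random


def make_trial(rng):
    """Build one hypothetical trial: a context cue selects the relevant feature."""
    context = rng.choice(["motion", "color"])
    motion = rng.choice(["left", "right"])
    color = rng.choice(["red", "green"])
    # Assumed response mapping (not from the paper):
    # left or red -> "A", right or green -> "B".
    if context == "motion":
        target = "A" if motion == "left" else "B"
    else:
        target = "A" if color == "red" else "B"
    # Assumed text format: the model would be asked to continue after "choice:".
    prompt = f"context: {context} | motion: {motion} | color: {color} | choice:"
    return prompt, target


def make_dataset(n_trials, seed=0):
    """Generate a reproducible list of (prompt, target) pairs."""
    rng = random.Random(seed)
    return [make_trial(rng) for _ in range(n_trials)]


if __name__ == "__main__":
    for prompt, target in make_dataset(3):
        print(prompt, target)
```

The point of the sketch is that the same stimulus can require different answers depending on the context cue, so a model cannot solve the task from either feature alone. Under these assumptions, the same pairs could be used both to fine-tune a pretrained model and to train a randomly initialized one, which is the comparison the abstract describes.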

Taxonomy

Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Discriminative Fine-Tuning · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout