TAIA: Large Language Models are Out-of-Distribution Data Learners

Shuyang Jiang; Yusheng Liao; Ya Zhang; Yanfeng Wang; Yu Wang

arXiv:2405.20192·cs.CL·October 18, 2024

TL;DR

This paper introduces TAIA, an inference-time method that improves large language models' performance in data-scarce, domain-mismatched scenarios. All parameters are fine-tuned, but only the updated attention parameters are used at inference.

Contribution

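The paper shows that when the fine-tuning data does not match the downstream distribution, only the fine-tuned attention parameters improve performance. Building on this, it proposes TAIA (Training All parameters but Inferring with only Attention). TAIA fine-tunes all parameters but runs inference with only the fine-tuned attention parameters, keeping the remaining parameters at their original values.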
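The Contribution above can be made concrete with a small sketch of how TAIA inference weights could be assembled from two checkpoints: attention parameters are taken from the fine-tuned model and everything else from the base model. This is not code from the official pixas/TAIA_LLM repository. The helper names `is_attention_param` and `taia_state` are hypothetical, and so is the key-naming convention: LLaMA-style names where attention weights contain a `self_attn` component. For illustration, the sketch also assumes that every non-attention parameter, including embeddings and normalization weights, reverts to the base model. Plain floats stand in for tensors so the example runs without a deep-learning framework.

```python
from typing import Any, Dict

# Assumed naming convention (LLaMA-style keys); not taken from the paper.
ATTENTION_MARKER = "self_attn"


def is_attention_param(name: str) -> bool:
    """Return True if a parameter name belongs to a self-attention sublayer."""
    return ATTENTION_MARKER in name.split(".")


def taia_state(base: Dict[str, Any], finetuned: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fine-tuned attention weights, base weights elsewhere."""
    if base.keys() != finetuned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: finetuned[name] if is_attention_param(name) else base[name]
        for name in base
    }


# Toy checkpoints: floats stand in for weight tensors.
base = {
    "model.layers.0.self_attn.q_proj.weight": 1.0,
    "model.layers.0.mlp.up_proj.weight": 2.0,
    "model.norm.weight": 3.0,
}
finetuned = {
    "model.layers.0.self_attn.q_proj.weight": 1.5,
    "model.layers.0.mlp.up_proj.weight": 2.5,
    "model.norm.weight": 3.5,
}

merged = taia_state(base, finetuned)
print(merged)
# {'model.layers.0.self_attn.q_proj.weight': 1.5, 'model.layers.0.mlp.up_proj.weight': 2.0, 'model.norm.weight': 3.0}
```

With PyTorch models, the same function could be applied to `base_model.state_dict()` and `finetuned_model.state_dict()`, and the result loaded back with `load_state_dict`. In that case the name check must match the module names of the actual architecture.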

Findings

01

TAIA outperforms fully fine-tuned and base models across multiple tasks.

02

Using only the fine-tuned attention parameters at inference improves robustness to mismatch between training and test data.

03

TAIA improves task-specific performance and resists jailbreaking through fine-tuning.

Abstract

Fine-tuning on task-specific question-answer pairs is a predominant method for enhancing the performance of instruction-tuned large language models (LLMs) on downstream tasks. However, in certain specialized domains, such as healthcare or harmless content generation, it is nearly impossible to obtain a large volume of high-quality data that matches the downstream distribution. To improve the performance of LLMs in data-scarce domains with domain-mismatched data, we re-evaluated the Transformer architecture and discovered that not all parameter updates during fine-tuning contribute positively to downstream performance. Our analysis reveals that within the self-attention and feed-forward networks, only the fine-tuned attention parameters are particularly beneficial when the training set's distribution does not fully align with the test set. Based on this insight, we propose an effective…
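The comparison described in the abstract can be written as a decomposition of the fine-tuning update. The notation is ours, not the paper's: $\theta_0$ denotes the instruction-tuned base parameters. $\Delta_{\text{attn}}$ is the fine-tuning update restricted to self-attention parameters, which is zero elsewhere. $\Delta_{\text{ffn}}$ is the update restricted to feed-forward parameters. Other parameters are assumed, for simplicity, to be unchanged by fine-tuning.

$$
\theta_{\text{ft}} = \theta_0 + \Delta_{\text{attn}} + \Delta_{\text{ffn}},
\qquad
\theta_{\text{TAIA}} = \theta_0 + \Delta_{\text{attn}}
$$

Under this reading, full fine-tuning uses both updates at inference, while TAIA discards $\Delta_{\text{ffn}}$ and keeps only the attention update.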

Code & Models

Repositories

pixas/TAIA_LLM (PyTorch, official)

Videos

TAIA: Large Language Models are Out-of-Distribution Data Learners (SlidesLive)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections