Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs

Stepan Tytarenko; Mohammad Ruhul Amin

arXiv:2401.16638 · cs.CL · January 31, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper introduces a task-specific context attribution framework for transformer models that maintains and enhances generalizability without fine-tuning, leading to improved classification performance across multiple NLP datasets.

Contribution

The paper proposes a novel context attribution method using a concept operator and loss functions, improving transformer model performance without fine-tuning.

Findings

01. 8% accuracy improvement on HateXplain with non-fine-tuned BERT

02. Outperforms fine-tuned XLNet by 1% on IMDB

03. Increases F1-score by 7% in cross-dataset tests

Abstract

Fine-tuning large pre-trained language models (LLMs) on particular datasets is a commonly employed strategy in Natural Language Processing (NLP) classification tasks. However, this approach usually results in a loss of the model's generalizability. In this paper, we present a framework that allows for maintaining generalizability, and enhances the performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. The specific concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the…
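
The projection step described in the abstract can be illustrated with a minimal sketch. It assumes the concept operator is a plain matrix W applied to a fixed-size text representation h from a frozen encoder, giving z = W h. The function name `attribute_context`, the dimensions, and the hand-picked values are hypothetical. In the paper the operator is learned with the authors' loss functions, which are not reproduced here.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def attribute_context(concept_operator: Matrix, text_repr: Vector) -> Vector:
    """Linearly project a text representation onto a concept space: z = W h.

    Illustrative only: the operator here is a fixed matrix, whereas the
    paper optimizes it during supervised training.
    """
    for row in concept_operator:
        if len(row) != len(text_repr):
            raise ValueError("each operator row must match the representation size")
    # One dot product per latent concept (one per row of W).
    return [sum(w * x for w, x in zip(row, text_repr)) for row in concept_operator]


# Hypothetical 4-dimensional representation from a frozen (non-fine-tuned) encoder.
h = [1.0, 0.0, 2.0, -1.0]

# Hypothetical operator mapping 4 encoder dimensions to 2 latent concepts.
W = [
    [0.5, 0.0, 0.25, 0.0],
    [0.0, 1.0, 0.0, -3.0],
]

z = attribute_context(W, h)
print(z)  # [1.0, 3.0]
```

Because only W (and any classifier placed on top of z) would be trained in such a setup, the encoder weights stay untouched, which is the property the abstract ties to preserved generalizability.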

Code & Models

Repositories

stepantita/space-model (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Explainable Artificial Intelligence (XAI)

Methods: Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Linear Layer · WordPiece · Weight Decay · BERT · Dropout · SentencePiece