On the Transformation of Latent Space in Fine-Tuned NLP Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Firoj Alam

TL;DR
This paper investigates how the internal representations of NLP models change during fine-tuning. It finds that higher layers adapt towards task-specific concepts while lower layers retain general pre-trained knowledge, and that some higher-layer concepts acquire polarity towards output classes, which has implications for adversarial attacks.
Contribution
It introduces an unsupervised hierarchical clustering method to analyze latent space transformations in fine-tuned NLP models, providing new insights into layer-wise concept evolution.
Findings
Higher layers evolve towards task-specific concepts
Lower layers retain generic pre-trained concepts
Higher layer concepts can acquire polarity towards output classes
Abstract
We study the evolution of latent space in fine-tuned NLP models. Different from the commonly used probing framework, we opt for an unsupervised method to analyze representations. More specifically, we discover latent concepts in the representational space using hierarchical clustering. We then use an alignment function to gauge the similarity between the latent space of a pre-trained model and its fine-tuned version. We use traditional linguistic concepts to facilitate our understanding and also study how the model space transforms towards task-specific information. We perform a thorough analysis, comparing pre-trained and fine-tuned models across three models and three downstream tasks. The notable findings of our work are: i) the latent space of the higher layers evolves towards task-specific concepts, ii) whereas the lower layers retain generic concepts acquired in the pre-trained…
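The two steps the abstract describes, clustering representations into latent concepts and then aligning the concepts of two models, can be illustrated with a small sketch. This is not the authors' implementation. The abstract names only "hierarchical clustering" and "an alignment function", so the average-linkage criterion, the word-overlap threshold `theta`, the helper names, and the 2-D toy vectors below are all assumptions made for illustration.

```python
import math
from itertools import combinations


def agglomerative_clusters(vectors, k):
    """Merge the two closest clusters (average linkage, an assumption) until k remain."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations guarantees i < j
    return clusters


def to_concepts(clusters, words):
    """Turn clusters of indices into sets of words (the 'latent concepts')."""
    return [frozenset(words[i] for i in c) for c in clusters]


def aligned_fraction(concepts_a, concepts_b, theta=0.9):
    """Hypothetical alignment score: share of concepts in A that have a
    counterpart in B containing at least `theta` of their words."""
    def matches(c1, c2):
        return len(c1 & c2) / len(c1) >= theta

    hits = sum(any(matches(c1, c2) for c2 in concepts_b) for c1 in concepts_a)
    return hits / len(concepts_a)


# Made-up 2-D "representations" of six words; not data from the paper.
words = ["good", "great", "bad", "awful", "run", "walk"]
pretrained = [(0.0, 0.0), (3.0, 0.0), (0.1, 0.0), (3.1, 0.0), (0.0, 8.0), (0.1, 8.0)]
finetuned = [(0.0, 0.0), (0.1, 0.0), (6.0, 0.0), (6.1, 0.0), (3.0, 8.0), (3.1, 8.0)]

pre_concepts = to_concepts(agglomerative_clusters(pretrained, 3), words)
ft_concepts = to_concepts(agglomerative_clusters(finetuned, 3), words)
score = aligned_fraction(pre_concepts, ft_concepts)
print(pre_concepts, ft_concepts, score)
```

In this toy setup the fine-tuned vectors were placed so that words group by sentiment, mimicking a concept that acquires polarity, while the pre-trained vectors group them differently. Only the verb concept survives unchanged, so the alignment score is low. A real analysis would cluster contextual representations extracted per layer and compare scores across layers.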
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
Methods: OPT
