The more polypersonal the better -- a short look on space geometry of fine-tuned layers

Sergei Kudriashov; Veronika Zykova; Angelina Stepanova; Yakov Raskind; Eduard Klyshinsky

arXiv:2501.05503·cs.CL·January 13, 2025

TL;DR

This paper investigates how adding grammatical modules to BERT influences its internal representations, revealing that such modifications help the model distinguish between different grammatical systems and enhance language understanding.

Contribution

It provides new insights into the space geometry of fine-tuned layers, showing how additional grammatical modules affect internal representations and model performance.

Findings

01. Adding a grammatical layer separates new and old grammatical systems within BERT.

02. The modification improves perplexity metrics on language tasks.

03. Internal space geometry reflects grammatical distinctions.

Abstract

The interpretation of deep learning models is a rapidly growing field, with particular interest in language models. There are various approaches to this task, including training simpler models to replicate neural network predictions and analyzing the latent space of the model. The latter method allows us to not only identify patterns in the model's decision-making process, but also understand the features of its internal structure. In this paper, we analyze the changes in the internal representation of the BERT model when it is trained with additional grammatical modules and data containing new grammatical structures (polypersonality). We find that adding a single grammatical layer causes the model to separate the new and old grammatical systems within itself, improving the overall performance on perplexity metrics.
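
The abstract reports results in terms of perplexity but does not say how it is computed. As a reminder of what the metric measures, here is a minimal sketch assuming perplexity is the exponential of the negative mean token log-probability. The function name `perplexity` and the example probabilities are hypothetical and are not taken from the paper. For a masked model such as BERT, the per-token log-probabilities would typically come from masking each token in turn (often called pseudo-perplexity), which is an assumption here rather than the authors' stated procedure.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a sequence of natural-log token probabilities.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Made-up example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among 4 options, so the perplexity
# comes out at approximately 4.
uniform_guess = [math.log(0.25)] * 4
print(perplexity(uniform_guess))

# Higher per-token probabilities give lower perplexity (lower is better).
confident = [math.log(0.9)] * 4
print(perplexity(confident) < perplexity(uniform_guess))
```

Lower values mean the model assigns higher probability to the observed tokens, which is the sense in which the abstract describes an improvement.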

Taxonomy

Methods: Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout · Softmax