The more polypersonal the better -- a short look on space geometry of fine-tuned layers
Sergei Kudriashov, Veronika Zykova, Angelina Stepanova, Yakov Raskind, Eduard Klyshinsky

TL;DR
This paper investigates how training BERT with additional grammatical modules and data containing polypersonal constructions changes its internal representations. Adding a single grammatical layer leads the model to separate the new and old grammatical systems and improves its performance on perplexity metrics.
Contribution
It provides new insights into the space geometry of fine-tuned layers, showing how additional grammatical modules affect internal representations and model performance.
Findings
Adding a grammatical layer separates new and old grammatical systems within BERT.
The modification improves the model's overall performance on perplexity metrics.
Internal space geometry reflects grammatical distinctions.
Abstract
The interpretation of deep learning models is a rapidly growing field, with particular interest in language models. There are various approaches to this task, including training simpler models to replicate neural network predictions and analyzing the latent space of the model. The latter method allows us to not only identify patterns in the model's decision-making process, but also understand the features of its internal structure. In this paper, we analyze the changes in the internal representation of the BERT model when it is trained with additional grammatical modules and data containing new grammatical structures (polypersonality). We find that adding a single grammatical layer causes the model to separate the new and old grammatical systems within itself, improving the overall performance on perplexity metrics.
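The abstract reports results in terms of perplexity without giving its formula here. As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the exponential of the mean negative log-likelihood of the correct tokens. The function name and the sample values are hypothetical and are not taken from the paper. For a masked model such as BERT, the per-token log-probabilities would typically come from masking one position at a time (often called pseudo-perplexity), which is an assumption about the setup rather than something the abstract states.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities assigned to the correct tokens.

    Computes exp(-(1/N) * sum(log p_i)). Lower values mean the model
    assigns higher probability to the observed tokens.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical values for illustration only, not data from the paper:
# a model that assigns probability 0.25 to every correct token is as
# uncertain as a uniform choice among four options.
uniform_over_four = [math.log(0.25)] * 8
print(perplexity(uniform_over_four))  # approximately 4

# Higher probabilities on the correct tokens give lower perplexity.
more_confident = [math.log(0.5)] * 8
print(perplexity(more_confident) < perplexity(uniform_over_four))
```

Comparing this value before and after a modification, on the same held-out text, is the usual way to read an "improvement on perplexity metrics".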
Taxonomy
Methods: Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout · Softmax
