Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models

Benjamin Icard; Evangelia Zve; Lila Sainero; Alice Breton; Jean-Gabriel Ganascia

arXiv:2501.00828·cs.CL·January 3, 2025

TL;DR

This paper investigates how writing style influences the dispersion of embeddings in language models, examining stylistic effects that go beyond topic and comparing model sensitivity across French and English to support interpretability.

Contribution

It introduces an analysis of stylistic effects on embedding dispersion in multiple language models, extending beyond traditional topic-based interpretations.

Findings

01. Style significantly affects embedding dispersion across models

02. Language models show different sensitivity to stylistic variations

03. Insights improve interpretability of language model representations

Abstract

This paper analyzes how writing style affects the dispersion of embedding vectors across multiple state-of-the-art language models. While early transformer models primarily aligned with topic modeling, this study examines the role of writing style in shaping embedding spaces. Using a literary corpus that alternates between topics and styles, we compare the sensitivity of language models across French and English. By analyzing the particular impact of style on embedding dispersion, we aim to better understand how language models process stylistic information, contributing to their overall interpretability.
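To make "embedding dispersion" concrete, here is a minimal sketch of one possible way to quantify it: the mean Euclidean distance from each vector in a group to that group's centroid. This metric is an assumption chosen for illustration; the abstract does not say which dispersion measure the authors use. The names `centroid`, `dispersion`, and `embeddings_by_style`, and the toy 2-D vectors, are hypothetical and do not come from the paper or its repository.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def dispersion(vectors):
    """Mean Euclidean distance from each vector to the group centroid.

    Assumed metric for illustration only; not necessarily the paper's.
    """
    c = centroid(vectors)
    return sum(math.dist(v, c) for v in vectors) / len(vectors)


# Toy 2-D "embeddings" grouped by a hypothetical style label.
# These are made-up numbers, not real model output.
embeddings_by_style = {
    "style_a": [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
    "style_b": [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
}

# One dispersion score per style group.
dispersion_by_style = {
    style: dispersion(vectors)
    for style, vectors in embeddings_by_style.items()
}
```

Under this assumed metric, a group whose texts all map to nearly the same point scores close to zero, while a group spread widely around its centroid scores higher. Comparing such scores when texts are grouped by style versus by topic is one way the kind of sensitivity described in the abstract could be probed.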

Code & Models

Repositories

evangeliazve/topic_style_embeddings_dispersion (official)

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques