Challenging Assumptions in Learning Generic Text Style Embeddings

Phil Ostheimer; Marius Kloft; Sophie Fellenz

arXiv:2501.16073·cs.LG·March 17, 2025

TL;DR

This paper investigates the creation of generic sentence-level style embeddings using contrastive learning, challenging assumptions about their ability to capture high-level text styles, and highlighting limitations in current methods.

Contribution

The paper learns sentence-level style embeddings by fine-tuning a general-purpose text encoder with contrastive learning, and it questions the assumption that low-level style changes can compose any high-level style.

Findings

1. Low-level style shifts may not fully capture high-level text styles.

2. Contrastive fine-tuning improves style embedding quality.

3. Results challenge existing assumptions about style representation.

Abstract

Recent advancements in language representation learning primarily emphasize language modeling for deriving meaningful representations, often neglecting style-specific considerations. This study addresses this gap by creating generic, sentence-level style embeddings crucial for style-centric tasks. Our approach is grounded on the premise that low-level text style changes can compose any high-level style. We hypothesize that applying this concept to representation learning enables the development of versatile text style embeddings. By fine-tuning a general-purpose text encoder using contrastive learning and standard cross-entropy loss, we aim to capture these low-level style shifts, anticipating that they offer insights applicable to high-level text styles. The outcomes prompt us to reconsider the underlying assumptions as the results do not always show that the learned style…
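The abstract mentions fine-tuning with contrastive learning together with a standard cross-entropy loss, but the excerpt does not give the exact formulation. The following is a minimal sketch of what such a combined objective could look like, assuming an InfoNCE-style contrastive term over cosine similarities. The function names, the `temperature` and `alpha` parameters, and the weighting scheme are all hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style term (an assumption, not the paper's stated loss).

    The positive is assumed to share the anchor's style; the negatives
    are assumed to differ in style. The loss is small when the anchor
    is closer to the positive than to any negative.
    """
    scores = [cosine(anchor, positive) / temperature]
    scores += [cosine(anchor, n) / temperature for n in negatives]
    return -log_softmax(scores)[0]


def cross_entropy(logits, label):
    """Standard cross-entropy for a single example with an integer label."""
    return -log_softmax(logits)[label]


def combined_loss(anchor, positive, negatives, logits, label, alpha=0.5):
    """Hypothetical weighted sum of the two terms; alpha is illustrative."""
    contrastive = contrastive_loss(anchor, positive, negatives)
    classification = cross_entropy(logits, label)
    return alpha * contrastive + (1 - alpha) * classification
```

In this sketch, `anchor` and `positive` would be encoder outputs for two sentences that exhibit the same low-level style change, and `logits` would come from a classification head predicting which style change was applied. Both of these pairings are assumptions made for illustration.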

Taxonomy

Topics: Topic Modeling

Methods: Contrastive Learning