$S^3$ -- Semantic Signal Separation

Márton Kardos; Jan Kostkan; Arnault-Quentin Vermillet; Kristoffer Nielbo; Kenneth Enevoldsen; Roberta Rocca

arXiv:2406.09556·cs.LG·May 20, 2025·1 cites

TL;DR

$S^3$ introduces a fast, theory-driven neural embedding-based topic modeling method that decomposes contextualized document embeddings into independent semantic axes, producing coherent topics without preprocessing.

Contribution

The paper presents $S^3$, a neural embedding-based topic modeling approach built on Independent Component Analysis (ICA). According to the authors, it is faster than existing contextual topic models, yields highly coherent topics, and requires no preprocessing.
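
As a hedged illustration of the ICA step named in the contribution, the standard linear ICA model can be written as follows. The symbols $X$, $S$, $A$, $N$, $D$, and $K$ are notation assumed here for illustration and may differ from the paper's own.

$$X \approx S A, \qquad X \in \mathbb{R}^{N \times D},\quad S \in \mathbb{R}^{N \times K},\quad A \in \mathbb{R}^{K \times D}$$

Each row of $X$ is one document embedding, each column of $S$ gives the strength of one semantic axis across the $N$ documents, and $A$ mixes the $K$ axes back into the $D$-dimensional embedding space. ICA estimates $S$ and $A$ so that the columns of $S$ are as statistically independent as possible.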

Findings

01 $S^3$ is on average 4.5x faster than BERTopic, the runner-up among contextual topic models.

02 $S^3$ produces highly coherent and diverse topics.

03 $S^3$ requires no preprocessing of the data.

Abstract

Topic models are useful tools for discovering latent semantic structures in large textual corpora. Recent efforts have been oriented at incorporating contextual representations in topic modeling and have been shown to outperform classical topic models. These approaches are typically slow, volatile, and require heavy preprocessing for optimal results. We present Semantic Signal Separation ($S^{3}$), a theory-driven topic modeling approach in neural embedding spaces. $S^{3}$ conceptualizes topics as independent axes of semantic space and uncovers these by decomposing contextualized document embeddings using Independent Component Analysis. Our approach provides diverse and highly coherent topics, requires no preprocessing, and is demonstrated to be the fastest contextual topic model, being, on average, 4.5x faster than the runner-up BERTopic. We offer an implementation of $S^{3}$, and all…
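
The decomposition step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `FastICA` as the ICA solver, uses synthetic vectors in place of real contextualized document embeddings, and the function name `separate_semantic_axes` and all sizes are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA


def separate_semantic_axes(embeddings, n_topics, seed=0):
    """Decompose document embeddings into n_topics independent components.

    Hypothetical helper for illustration; FastICA is one possible ICA solver.
    """
    ica = FastICA(n_components=n_topics, random_state=seed, max_iter=1000)
    # Rows are documents, columns are estimated independent semantic axes.
    doc_topic = ica.fit_transform(embeddings)
    return ica, doc_topic


# Synthetic stand-in for document embeddings (assumption: a real pipeline
# would obtain these from a sentence encoder instead).
rng = np.random.default_rng(0)
n_docs, n_topics, dim = 500, 3, 16
sources = rng.laplace(size=(n_docs, n_topics))   # non-Gaussian latent axes
mixing = rng.normal(size=(n_topics, dim))        # axes -> embedding space
noise = 0.01 * rng.normal(size=(n_docs, dim))    # keeps the data full rank
embeddings = sources @ mixing + noise

ica, doc_topic = separate_semantic_axes(embeddings, n_topics)

# ICA recovers axes only up to sign, scale, and order, so compare by
# absolute correlation between true and recovered components.
corr = np.corrcoef(sources.T, doc_topic.T)[:n_topics, n_topics:]
best_match = np.abs(corr).max(axis=1)
print(doc_topic.shape, best_match.round(2))
```

On real data there is no ground truth to correlate against; the recovery check here only shows that independent, non-Gaussian axes mixed into an embedding space can be separated again. How topic descriptions are derived from the recovered axes is not covered by this sketch.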

Code & Models

Repositories

x-tabdeveloping/turftopic (Official)

Taxonomy

Topics: Neural Networks and Applications · Blind Source Separation Techniques