$S^3$ -- Semantic Signal Separation
Márton Kardos, Jan Kostkan, Arnault-Quentin Vermillet, Kristoffer Nielbo, Kenneth Enevoldsen, Roberta Rocca

TL;DR
$S^3$ introduces a fast, theory-driven neural embedding-based topic modeling method that decomposes contextualized document embeddings into independent semantic axes, producing coherent topics without preprocessing.
Contribution
The paper presents $S^3$, a neural embedding-based topic modeling approach built on Independent Component Analysis (ICA). It yields diverse, highly coherent topics, requires no preprocessing, and is reported to be the fastest contextual topic model.
Findings
$S^3$ is on average 4.5x faster than BERTopic.
$S^3$ produces highly coherent and diverse topics.
$S^3$ requires no preprocessing of the data.
Abstract
Topic models are useful tools for discovering latent semantic structures in large textual corpora. Recent efforts have been oriented at incorporating contextual representations in topic modeling and have been shown to outperform classical topic models. These approaches are typically slow, volatile, and require heavy preprocessing for optimal results. We present Semantic Signal Separation ($S^3$), a theory-driven topic modeling approach in neural embedding spaces. $S^3$ conceptualizes topics as independent axes of semantic space and uncovers these by decomposing contextualized document embeddings using Independent Component Analysis. Our approach provides diverse and highly coherent topics, requires no preprocessing, and is demonstrated to be the fastest contextual topic model, being, on average, 4.5x faster than the runner-up BERTopic. We offer an implementation of $S^3$, and all…
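The decomposition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's FastICA (version 1.1 or later) as the ICA solver, uses synthetic vectors in place of real contextual embeddings, and labels each axis by projecting a hypothetical vocabulary onto the recovered components, which is one plausible way to read off topic terms.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_docs, dim, n_topics = 500, 16, 3

# Synthetic stand-in for contextual document embeddings (assumption):
# non-Gaussian sources mixed linearly into a higher-dimensional space.
# In practice these would come from a sentence encoder.
sources = rng.laplace(size=(n_docs, n_topics))
mixing = rng.normal(size=(n_topics, dim))
doc_embeddings = sources @ mixing + 0.01 * rng.normal(size=(n_docs, dim))

# Hypothetical vocabulary with random stand-in embeddings (assumption):
# a real pipeline would encode the terms with the same encoder as the documents.
vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]
term_embeddings = rng.normal(size=(len(vocab), dim))

# Decompose the document embeddings into statistically independent axes.
ica = FastICA(
    n_components=n_topics,
    whiten="unit-variance",
    random_state=0,
    max_iter=1000,
)
doc_topic = ica.fit_transform(doc_embeddings)  # shape (n_docs, n_topics)

# Project the term embeddings onto the same axes to score terms per topic.
term_topic = ica.transform(term_embeddings)    # shape (len(vocab), n_topics)

# The highest-scoring terms on each axis serve as that topic's description.
n_top = 3
top_terms = [
    [vocab[i] for i in np.argsort(-term_topic[:, k])[:n_top]]
    for k in range(n_topics)
]

for k, terms in enumerate(top_terms):
    print(f"Topic {k}: {', '.join(terms)}")
```

Because the inputs here are random, the printed terms carry no meaning; the sketch only shows the shape of the computation, in which documents and terms are mapped onto the same independent axes.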
Taxonomy
Topics: Neural Networks and Applications · Blind Source Separation Techniques
