Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling

Pritom Saha Akash; Kevin Chen-Chuan Chang

arXiv:2506.07453·cs.CL·June 10, 2025

TL;DR

This paper introduces DALTA, a novel domain adaptation framework for low-resource topic modeling that leverages shared representations and adversarial alignment to improve coherence and stability.

Contribution

The paper formally defines domain adaptation for low-resource topic modeling and proposes DALTA, a method that uses a shared encoder and adversarial training to transfer knowledge from a high-resource source domain to a low-resource target domain.

Findings

1. DALTA outperforms existing methods in topic coherence and stability.

2. A finite-sample generalization bound ties effective transfer to performance in both domains, latent-space discrepancy, and overfitting.

3. Experiments show improved transferability across diverse datasets.

Abstract

Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial…
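
The components named in the abstract (a shared encoder, domain-specific decoders, and an adversarial term) can be illustrated with a small sketch. This is not the authors' implementation: the class `ToyAdapter`, the linear-plus-softmax layers, the logistic domain discriminator, the weight `lam`, and the form of the combined objective are all assumptions chosen for illustration, and the abstract is truncated before it describes the adversarial component in full. The sketch only evaluates losses on bag-of-words vectors; it does not train anything.

```python
import math
import random


def linear(x, W):
    """Multiply vector x (length d_in) by matrix W (d_in rows, d_out columns)."""
    return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*W)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyAdapter:
    """Hypothetical toy model: one shared encoder, one decoder per domain."""

    def __init__(self, vocab, topics, seed=0):
        rng = random.Random(seed)
        self.enc = init(vocab, topics, rng)  # shared across both domains
        self.dec = {  # one decoder per domain (assumed linear + softmax)
            "source": init(topics, vocab, rng),
            "target": init(topics, vocab, rng),
        }
        # Assumed logistic domain discriminator over topic proportions.
        self.disc = [rng.gauss(0.0, 0.1) for _ in range(topics)]

    def encode(self, bow):
        """Map word counts to topic proportions with the shared encoder."""
        total = sum(bow) or 1
        return softmax(linear([c / total for c in bow], self.enc))

    def recon_loss(self, bow, domain):
        """Negative log-likelihood of the counts under that domain's decoder."""
        word_probs = softmax(linear(self.encode(bow), self.dec[domain]))
        return -sum(c * math.log(p) for c, p in zip(bow, word_probs))

    def disc_loss(self, bow, domain):
        """Cross-entropy of the discriminator's source-vs-target guess."""
        logit = sum(t * w for t, w in zip(self.encode(bow), self.disc))
        p_source = 1.0 / (1.0 + math.exp(-logit))
        return -math.log(p_source if domain == "source" else 1.0 - p_source)

    def encoder_objective(self, src_bow, tgt_bow, lam=0.1):
        """Reconstruction in both domains minus lam times discriminator loss.

        Minimizing this rewards an encoder whose outputs the discriminator
        cannot tell apart (an assumed minimax form, not taken from the paper).
        """
        recon = self.recon_loss(src_bow, "source") + self.recon_loss(tgt_bow, "target")
        adv = self.disc_loss(src_bow, "source") + self.disc_loss(tgt_bow, "target")
        return recon - lam * adv


if __name__ == "__main__":
    model = ToyAdapter(vocab=5, topics=3)
    print(model.encoder_objective([3, 0, 1, 0, 2], [0, 2, 0, 1, 0]))
```

In this sketch the same `encode` call serves both domains, which is what makes the latent representation shared, while each domain keeps its own decoder weights. In a real training loop the discriminator would be updated to minimize `disc_loss` while the encoder is updated against it.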

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Domain Adaptation and Few-Shot Learning