Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling
Pritom Saha Akash, Kevin Chen-Chuan Chang

TL;DR
This paper introduces DALTA, a novel domain adaptation framework for low-resource topic modeling that leverages shared representations and adversarial alignment to improve coherence and stability.
Contribution
It formally defines the problem of domain adaptation in low-resource topic modeling and proposes DALTA, a new method that combines a shared encoder, domain-specific decoders, and adversarial training for effective transfer.
Findings
DALTA outperforms existing methods in coherence and stability.
A finite-sample generalization bound ties transfer effectiveness to performance in both domains, latent-space discrepancy, and overfitting.
Experiments show improved transferability across diverse datasets.
Abstract
Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial…
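The abstract names three architectural pieces: a shared encoder for domain-invariant features, separate decoders for each domain, and adversarial alignment. Below is a minimal NumPy sketch of how those pieces could be wired together, and it computes only the forward-pass objectives. It assumes a softmax encoder over bag-of-words counts, a per-domain topic-word matrix as the "specialized decoder", and a logistic-regression domain discriminator. The class name `DomainAlignedTopicSketch`, the weight `lam`, and all layer choices are hypothetical and are not taken from the paper. Actual training would alternate gradient steps for the discriminator and for the encoder/decoders (for example with a gradient-reversal layer). That step is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class DomainAlignedTopicSketch:
    """Hypothetical forward pass: shared encoder, per-domain decoders, domain discriminator."""

    def __init__(self, vocab_size, num_topics, seed=0):
        rng = np.random.default_rng(seed)
        # Shared encoder: a single linear map applied to source and target documents.
        self.W_enc = rng.normal(scale=0.1, size=(vocab_size, num_topics))
        # Domain-specific decoders: separate topic-word logits for each domain.
        self.beta = {
            "source": rng.normal(scale=0.1, size=(num_topics, vocab_size)),
            "target": rng.normal(scale=0.1, size=(num_topics, vocab_size)),
        }
        # Discriminator: logistic regression on topic proportions, predicting P(source).
        self.w_disc = rng.normal(scale=0.1, size=num_topics)
        self.b_disc = 0.0

    def encode(self, bow):
        # Normalize counts by document length, then map onto the topic simplex.
        x = bow / np.maximum(bow.sum(axis=1, keepdims=True), 1.0)
        return softmax(x @ self.W_enc, axis=1)

    def reconstruction_nll(self, bow, theta, domain):
        # Each document's word distribution is a mixture of that domain's topics.
        topic_word = softmax(self.beta[domain], axis=1)  # (K, V)
        word_probs = theta @ topic_word                  # (N, V)
        return -np.sum(bow * np.log(word_probs + 1e-12)) / bow.shape[0]

    def discriminator_loss(self, theta, is_source):
        # Binary cross-entropy of the domain classifier on topic proportions.
        logits = theta @ self.w_disc + self.b_disc
        p = 1.0 / (1.0 + np.exp(-logits))
        y = np.full(theta.shape[0], 1.0 if is_source else 0.0)
        return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

    def objectives(self, bow_src, bow_tgt, lam=0.1):
        th_s, th_t = self.encode(bow_src), self.encode(bow_tgt)
        recon = (self.reconstruction_nll(bow_src, th_s, "source")
                 + self.reconstruction_nll(bow_tgt, th_t, "target"))
        disc = (self.discriminator_loss(th_s, True)
                + self.discriminator_loss(th_t, False))
        # Minimax: the discriminator minimizes `disc`; the encoder/decoders minimize
        # reconstruction error while maximizing `disc` (confusing the discriminator).
        return {"discriminator": disc, "encoder_decoder": recon - lam * disc}

# Usage on synthetic counts: a large source corpus and a tiny target corpus.
model = DomainAlignedTopicSketch(vocab_size=50, num_topics=5)
data_rng = np.random.default_rng(1)
src = data_rng.poisson(2.0, size=(100, 50)).astype(float)
tgt = data_rng.poisson(2.0, size=(8, 50)).astype(float)
losses = model.objectives(src, tgt)
```

Sharing one encoder across domains places both corpora in the same topic space, which is what lets the discriminator's signal push that space toward domain invariance. The per-domain decoders leave room for vocabulary that only one domain uses.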
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Domain Adaptation and Few-Shot Learning
