TL;DR
This paper explores various fine-tuning strategies for transformer encoders to enhance neural topic modeling in monolingual and zero-shot polylingual contexts, demonstrating improved topic quality and cross-lingual transfer.
Contribution
It introduces multiple fine-tuning methods for encoders, including auxiliary tasks and integrated training, to improve neural topic models for multiple languages.
Findings
Fine-tuning on topic classification improves topic quality.
Integrating the topic classification objective directly into topic model training also improves topic quality.
Fine-tuning the encoder on any task is the most important factor for zero-shot cross-lingual transfer.
Abstract
Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models. One added benefit when using representations from multilingual models is that they facilitate zero-shot polylingual topic modeling. However, while it has been widely observed that pre-trained embeddings should be fine-tuned to a given task, it is not immediately clear what supervision should look like for an unsupervised task such as topic modeling. Thus, we propose several methods for fine-tuning encoders to improve both monolingual and zero-shot polylingual neural topic modeling. We consider fine-tuning on auxiliary tasks, constructing a new topic classification task, integrating the topic classification objective directly into topic model training, and continued pre-training. We find that fine-tuning encoder representations on…
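One way to picture "integrating the topic classification objective directly into topic model training" is as a single loss that adds a classification term to the usual variational topic model objective. The sketch below is illustrative only and is not taken from the paper: the function name `joint_topic_loss`, the `weight` parameter, the standard Gaussian prior, and the plain additive combination are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def joint_topic_loss(bow, recon_logits, mu, logvar, class_logits, labels, weight=1.0):
    """Hypothetical joint objective: topic model ELBO terms plus a weighted
    topic-classification cross-entropy. All names here are illustrative."""
    # Reconstruction term: negative log-likelihood of the bag-of-words counts
    # under the decoder's predicted word distribution.
    log_word_probs = np.log(softmax(recon_logits) + 1e-10)
    recon = -(bow * log_word_probs).sum(axis=1)

    # KL divergence between the approximate posterior N(mu, exp(logvar))
    # and a standard Gaussian prior (a simplifying assumption).
    kl = -0.5 * (1.0 + logvar - mu ** 2 - np.exp(logvar)).sum(axis=1)

    # Classification term: cross-entropy of the predicted topic label.
    class_log_probs = np.log(softmax(class_logits) + 1e-10)
    ce = -class_log_probs[np.arange(len(labels)), labels]

    # Average over the batch; `weight` trades off the two objectives.
    return float((recon + kl + weight * ce).mean())
```

Setting `weight=0.0` recovers the unsupervised objective alone, which makes it easy to compare training with and without the classification signal under these assumptions.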
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Adam · Layer Normalization · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay
