Understanding The Robustness of Self-supervised Learning Through Topic Modeling
Zeping Luo, Shiyou Wu, Cindy Weng, Mo Zhou, Rong Ge

TL;DR
This paper investigates why self-supervised learning helps on NLP tasks by studying it in the setting of topic modeling, showing through theoretical proofs and empirical comparisons that it is robust to model misspecification.
Contribution
The paper gives a theoretical analysis showing that self-supervised objectives can recover useful posterior information in topic models, and shows empirically that these objectives outperform traditional posterior inference when the model is misspecified.
Findings
Self-supervised objectives recover useful posterior information.
Self-supervised learning performs on par with posterior inference using the correct model.
Self-supervised learning outperforms posterior inference using a misspecified model.
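The first finding can be illustrated with a toy sketch. This is not the paper's proof or experimental setup; it assumes a hypothetical "pure topic" model (one topic per document) with a made-up 4-word, 2-topic matrix `A`, and all names below are illustrative. It shows that the Bayes-optimal next-word predictor, which is what a reconstruction-style objective would converge to in the ideal case, is a linear image of the topic posterior, so the posterior can be read back from it when `A` has full column rank.

```python
import numpy as np

# Hypothetical toy topic model: 4-word vocabulary, 2 topics.
# Column k of A is the word distribution of topic k (each column sums to 1).
A = np.array([[0.5, 0.1],
              [0.3, 0.1],
              [0.1, 0.3],
              [0.1, 0.5]])
prior = np.array([0.5, 0.5])  # assumed prior over the document's single topic


def doc_likelihood(doc):
    """P(doc | topic) for each topic; words are i.i.d. given the topic."""
    return np.prod(A[doc, :], axis=0)


def topic_posterior(doc):
    """Exact posterior P(topic | doc) via Bayes' rule."""
    joint = prior * doc_likelihood(doc)
    return joint / joint.sum()


def optimal_next_word_predictor(doc):
    """Bayes-optimal P(next word | doc), computed as P(doc, word) / P(doc)."""
    joint_doc_word = A @ (prior * doc_likelihood(doc))  # P(doc, next word = y) for each y
    return joint_doc_word / joint_doc_word.sum()


doc = [0, 0, 1]  # word ids of an example document
pred = optimal_next_word_predictor(doc)

# The predictor equals A @ posterior, so a pseudo-inverse recovers the posterior.
recovered = np.linalg.pinv(A) @ pred
print("posterior:", topic_posterior(doc))
print("recovered:", recovered)
```

In this sketch the recovery step uses `A` only for checking; the point is that the predictor itself already carries the posterior information, up to a linear map.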
Abstract
Self-supervised learning has significantly improved the performance of many NLP tasks. However, how self-supervised learning discovers useful representations, and why it is better than traditional approaches such as probabilistic models, are still largely unknown. In this paper, we focus on the context of topic modeling and highlight a key advantage of self-supervised learning: when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model, and hence is less susceptible to model misspecification. In particular, we prove that commonly used self-supervised objectives based on reconstruction or contrastive samples can both recover useful posterior information for general topic models. Empirically, we show that the same objectives can perform on par with posterior inference using the correct model, while outperforming posterior inference…
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
