Learning Supervised Topic Models for Classification and Regression from Crowds
Filipe Rodrigues, Mariana Lourenço, Bernardete Ribeiro, Francisco Pereira

TL;DR
This paper introduces two supervised topic models designed for classification and regression tasks that effectively handle annotator heterogeneity and noise in crowdsourced data, with scalable inference algorithms.
Contribution
The paper presents novel supervised topic models that explicitly model annotator biases and heterogeneity, improving learning from noisy crowdsourced annotations.
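To make "annotator biases and heterogeneity" concrete, the following toy simulation may help. It is only an illustrative sketch, not the paper's model: it assumes each annotator can be described by a confusion matrix giving P(reported label | true label), and every name in it (ANNOTATORS, simulate_annotator, majority_vote, run) and every probability value is hypothetical. No topic model is involved; the point is only to show why treating all annotators as equally reliable can be costly.

```python
import random

# Hypothetical annotators for a binary task. Row t of each matrix is
# P(reported label | true label = t). These values are made up.
ANNOTATORS = {
    "reliable": [[0.95, 0.05], [0.05, 0.95]],
    "biased_to_0": [[0.95, 0.05], [0.60, 0.40]],  # often reports 0 when truth is 1
    "random": [[0.50, 0.50], [0.50, 0.50]],       # ignores the document entirely
}


def simulate_annotator(true_label, confusion, rng):
    """Draw one noisy label from the annotator's confusion-matrix row."""
    row = confusion[true_label]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def majority_vote(labels, num_classes):
    """Return the most frequent label; ties go to the lowest class index."""
    counts = [0] * num_classes
    for label in labels:
        counts[label] += 1
    return counts.index(max(counts))


def run(num_docs=2000, seed=0):
    """Simulate crowd labels and return accuracy per annotator and for majority vote."""
    rng = random.Random(seed)
    true_labels = [rng.randrange(2) for _ in range(num_docs)]

    # Each annotator labels every document independently.
    answers = {
        name: [simulate_annotator(t, cm, rng) for t in true_labels]
        for name, cm in ANNOTATORS.items()
    }

    accuracy = {
        name: sum(a == t for a, t in zip(ans, true_labels)) / num_docs
        for name, ans in answers.items()
    }

    # Majority voting weights all annotators equally, regardless of reliability.
    votes = [
        majority_vote([answers[name][i] for name in ANNOTATORS], 2)
        for i in range(num_docs)
    ]
    accuracy["majority_vote"] = (
        sum(v == t for v, t in zip(votes, true_labels)) / num_docs
    )
    return accuracy


if __name__ == "__main__":
    for name, acc in run().items():
        print(f"{name}: {acc:.3f}")
```

In this toy setup, majority voting ends up less accurate than the single reliable annotator, because the biased and random annotators outvote it on many documents. A model that estimates per-annotator reliability could, in principle, down-weight such annotators.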
Findings
The proposed models outperform state-of-the-art methods in the paper's empirical evaluations.
Stochastic variational inference lets the models scale to very large datasets.
The models explicitly account for annotator biases and noise in crowdsourced annotations.
Abstract
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, which are prone to ambiguity and noise and often involve high volumes of documents, makes learning under a single-annotator assumption unrealistic or impractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages…
