Statistical Topic Models for Multi-Label Document Classification
Timothy N. Rubin, America Chambers, Padhraic Smyth, Mark Steyvers

TL;DR
This paper explores generative statistical topic models for multi-label document classification, showing they perform competitively with discriminative methods, especially on datasets with many and infrequent labels.
Contribution
It introduces a generative modeling approach for multi-label classification and demonstrates its advantages over discriminative models on large, skewed label datasets.
Findings
Generative models achieve accuracy competitive with discriminative methods.
Generative topic models compare favorably with discriminative methods on datasets with many labels, most of them rare.
Generative approach is effective for large-scale, skewed label distributions.
Abstract
Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document…
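As a rough illustration of what it can mean to "associate individual word tokens with different labels", the sketch below computes, for each token in a document, a posterior over that document's labels from per-label word distributions. It is a minimal sketch under stated assumptions, not the authors' model: the function name `token_label_posteriors`, the matrix `phi`, the uniform label prior, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np

def token_label_posteriors(word_ids, label_ids, phi, label_prior=None):
    """Return a (num_tokens, num_doc_labels) matrix of p(label | token).

    Assumption (not from the paper): each label l has a word distribution
    phi[l], and a token's label posterior is proportional to
    label_prior[l] * phi[l, word].
    """
    phi_doc = phi[label_ids]  # rows for this document's labels only
    if label_prior is None:
        # Hypothetical default: uniform prior over the document's labels.
        label_prior = np.full(len(label_ids), 1.0 / len(label_ids))
    # scores[l, t] = prior[l] * phi[l, word_t], then transpose to (tokens, labels)
    scores = (label_prior[:, None] * phi_doc[:, word_ids]).T
    # Normalize each token's row so it sums to 1.
    return scores / scores.sum(axis=1, keepdims=True)

# Toy setup: 3 labels over a 3-word vocabulary (made-up numbers).
phi = np.array([
    [0.7, 0.2, 0.1],  # label 0 favors word 0
    [0.1, 0.2, 0.7],  # label 1 favors word 2
    [0.3, 0.4, 0.3],  # label 2, not attached to the toy document
])

# A document tagged with labels 0 and 1, containing words 0, 2, 1.
posteriors = token_label_posteriors(word_ids=[0, 2, 1], label_ids=[0, 1], phi=phi)
print(posteriors)
```

In this toy case the first token leans toward label 0, the second toward label 1, and the third is split evenly, since both labels give word 1 the same probability.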
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Computational and Text Analysis Methods
