Optimizing Text Quantifiers for Multivariate Loss Functions
Andrea Esuli, Fabrizio Sebastiani

TL;DR
This paper introduces a supervised structured prediction approach to quantification that directly optimizes the accuracy of class prevalence estimates, and reports outperforming traditional classifier-based methods on large, high-dimensional datasets.
Contribution
It proposes a new method that directly optimizes quantification accuracy using structured prediction, moving beyond traditional classifier-based approaches.
Findings
More accurate quantification results
Greater stability across datasets
Enhanced computational efficiency
Abstract
We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabelled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product, or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a general-purpose classifier, counting the unlabelled items which have been assigned the class, and tuning the obtained counts according to some heuristics. In this paper we depart from the tradition of using general-purpose classifiers, and use instead a supervised learning model for structured prediction, capable of generating classifiers directly…
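The traditional pipeline the abstract describes (classify, count, then adjust the count) can be illustrated with a minimal sketch. This is not the paper's method, only the baseline it departs from. The function names, the toy predictions, and the choice of the "adjusted count" correction (which rescales the raw count using the classifier's true and false positive rates, assumed here to be estimated on held-out labelled data) are illustrative assumptions; the abstract does not say which heuristics it has in mind.

```python
def classify_and_count(predictions):
    """Raw prevalence estimate: fraction of items labelled positive."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(1 for p in predictions if p) / len(predictions)


def adjusted_count(cc_estimate, tpr, fpr):
    """Correct a raw estimate using the classifier's error rates.

    Assumes tpr and fpr were measured on held-out labelled data
    (hypothetical values are used below). The result is clipped to
    [0, 1] because the correction can overshoot.
    """
    if tpr == fpr:
        # The classifier carries no signal; the correction is undefined.
        return cc_estimate
    adjusted = (cc_estimate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))


# Toy binary predictions on ten unlabelled items (made up for illustration).
predictions = [True, True, False, True, False,
               False, False, True, False, False]

cc = classify_and_count(predictions)
acc = adjusted_count(cc, tpr=0.8, fpr=0.2)  # hypothetical error rates
print(f"classify-and-count: {cc:.3f}, adjusted: {acc:.3f}")
```

In this toy case the raw count overestimates prevalence because a fifth of the negatives are assumed to be misclassified as positive; the adjustment pulls the estimate down accordingly.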
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Biomedical Text Mining and Ontologies
