Optimizing Text Quantifiers for Multivariate Loss Functions

Andrea Esuli; Fabrizio Sebastiani

arXiv:1502.05491 · cs.LG · September 21, 2021

TL;DR

This paper introduces a supervised structured-prediction approach to quantification that directly optimizes the accuracy of class-prevalence estimates, and reports that it outperforms traditional classifier-based methods on large, high-dimensional datasets.

Contribution

The paper proposes a method that uses structured prediction to optimize quantification accuracy directly, rather than relying on a general-purpose classifier whose counts are adjusted afterwards.
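A quantifier is judged by a loss computed over the whole set of predictions rather than item by item, so a classifier that makes more mistakes can still be a better quantifier. The minimal sketch below illustrates this with the absolute error of the estimated prevalence, used here only as an illustrative set-level loss (an assumption, not necessarily the loss optimized in the paper); the toy labels and function names are hypothetical.

```python
def prevalence(labels: list[int]) -> float:
    """Fraction of items labelled 1 (the positive class)."""
    return sum(labels) / len(labels)


def error_rate(true: list[int], pred: list[int]) -> float:
    """Item-level loss: fraction of misclassified items."""
    return sum(t != p for t, p in zip(true, pred)) / len(true)


def prevalence_abs_error(true: list[int], pred: list[int]) -> float:
    """Set-level loss (illustrative choice): |true prevalence - estimated prevalence|."""
    return abs(prevalence(true) - prevalence(pred))


# Hypothetical toy data: 4 positives out of 10 items.
true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Classifier A: 2 false negatives and 2 false positives, which cancel out in the count.
pred_a = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
# Classifier B: a single false negative.
pred_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, error_rate(true, pred), prevalence_abs_error(true, pred))
```

Classifier A misclassifies four items yet estimates the prevalence exactly, while classifier B misclassifies one item and still misestimates it by 0.1. An item-level loss and a set-level loss therefore rank these two classifiers in opposite orders.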

Findings

01. More accurate quantification results
02. Greater stability across datasets
03. Enhanced computational efficiency

Abstract

We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabelled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product, or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a general-purpose classifier, counting the unlabelled items which have been assigned the class, and tuning the obtained counts according to some heuristics. In this paper we depart from the tradition of using general-purpose classifiers, and use instead a supervised learning model for structured prediction, capable of generating classifiers directly…
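The traditional pipeline described in the abstract (classify, count, then adjust the counts) can be sketched as follows. The adjustment shown is the widely used "adjusted classify and count" correction, which rescales the raw count using the classifier's true positive rate (tpr) and false positive rate (fpr), assumed here to have been estimated on held-out labelled data. It is one example of such a heuristic, not the structured-prediction method this paper proposes, and the function names are hypothetical.

```python
def classify_and_count(predictions: list[int]) -> float:
    """CC: estimated prevalence = fraction of items the classifier labels positive."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions: list[int], tpr: float, fpr: float) -> float:
    """ACC: correct the CC estimate using the classifier's tpr and fpr.

    If p is the true prevalence, the expected CC estimate is
    tpr * p + fpr * (1 - p); solving for p gives
    p = (p_cc - fpr) / (tpr - fpr). The result is clipped to [0, 1].
    tpr and fpr are assumed to come from held-out labelled data.
    """
    if tpr <= fpr:
        raise ValueError("the adjustment requires tpr > fpr")
    p_cc = classify_and_count(predictions)
    p_acc = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p_acc))


# Hypothetical example: 30 of 100 unlabelled items predicted positive,
# with a classifier whose estimated tpr is 0.8 and fpr is 0.1.
preds = [1] * 30 + [0] * 70
print(classify_and_count(preds))
print(adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1))
```

In this example the raw count gives 0.3, and the adjustment lowers it to 2/7 (about 0.286), because part of the raw count is attributed to false positives. The clipping step is needed because sampling noise in the counts or in the tpr/fpr estimates can push the unclipped value outside [0, 1].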

Taxonomy

Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Biomedical Text Mining and Ontologies