LeQua@CLEF2022: Learning to Quantify

Andrea Esuli; Alejandro Moreo; Fabrizio Sebastiani

arXiv:2111.11249 · cs.LG · December 14, 2021

TL;DR

LeQua 2022 is a CLEF 2022 evaluation lab for learning to quantify on textual datasets. It supports the development and comparison of methods that estimate class frequencies directly, rather than classifying every document and then counting the assignments.

Contribution

This paper presents a new lab for evaluating learning-to-quantify methods. It provides datasets and a standardized setting for comparing methods in both the binary and the single-label multiclass settings.

Findings

01 Established a benchmark for learning to quantify methods.

02 Provided datasets in vector and raw document formats.

03 Facilitated comparative evaluation of different quantification techniques.

Abstract

LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each such setting we provide data either in ready-made vector form or in raw document form.
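The "classify and count" baseline described in the abstract can be sketched in a few lines. The snippet below is an illustrative sketch, not code from the lab: the function names, the toy labels, and the `tpr`/`fpr` values are all assumptions made for the example. It also shows one well-known correction from the quantification literature, adjusted classify and count, which rescales the raw binary estimate using the classifier's true and false positive rates (assumed here to have been estimated on held-out labelled data). It is included only to illustrate why raw counting can be biased, not as the method the lab prescribes.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate class prevalences as the fraction of items assigned to each class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts.get(c, 0) / total for c in classes}


def adjusted_count(cc_positive, tpr, fpr):
    """Binary adjusted classify and count.

    Corrects the raw positive-class estimate using the classifier's
    true positive rate (tpr) and false positive rate (fpr), then clips
    the result to the valid range [0, 1].
    """
    if tpr == fpr:
        # The correction is undefined; fall back to the raw estimate.
        return cc_positive
    corrected = (cc_positive - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, corrected))


# Hypothetical classifier output on 100 unlabelled documents.
predicted = ["pos"] * 30 + ["neg"] * 70

cc = classify_and_count(predicted, ["pos", "neg"])

# tpr and fpr are made-up values standing in for held-out estimates.
acc = adjusted_count(cc["pos"], tpr=0.8, fpr=0.1)

print(cc)
print(acc)
```

With these made-up rates, the corrected positive prevalence differs from the raw count, which is the kind of bias that motivates dedicated quantification methods.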

Taxonomy

Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Machine Learning and Data Classification