Is it worth it? Budget-related evaluation metrics for model selection

Filip Klubička; Giancarlo D. Salton; John D. Kelleher

arXiv:1807.06998·cs.CL·July 19, 2018·1 cites

TL;DR

This paper emphasizes the importance of using budget-related gain metrics, alongside traditional evaluation scores like F-score, for selecting models in resource-constrained linguistic annotation tasks, to maximize cost-effectiveness.

Contribution

It introduces a cost-benefit evaluation framework for model selection that considers budget constraints and demonstrates its effectiveness through a case study on idiom dictionary building.

Findings

01. Higher F-score does not always mean higher profit in annotation tasks.
02. Budget-aware metrics can lead to better model choices for resource optimization.
03. A lower F-score system can outperform higher F-score systems in cost-benefit terms.

Abstract

Creating a linguistic resource is often done by using a machine learning model that filters the content that goes through to a human annotator, before going into the final resource. However, budgets are often limited, and the amount of available data exceeds the amount of affordable annotation. In order to optimize the benefit from the invested human work, we argue that deciding on which model one should employ depends not only on generalized evaluation metrics such as F-score, but also on the gain metric. Because the model with the highest F-score may not necessarily have the best sequencing of predicted classes, this may lead to wasting funds on annotating false positives, yielding zero improvement of the linguistic resource. We exemplify our point with a case study, using real data from a task of building a verb-noun idiom dictionary. We show that, given the choice of three systems…
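To make the abstract's argument concrete, the sketch below contrasts F-score with a simple budget-limited gain on toy data. It is an illustration only: the paper's exact gain metric is not given in this summary, so `gain_within_budget` assumes gain means the number of true positives among the top-ranked items that the budget allows a human to annotate. The function names, the 0.5 threshold, the labels, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import List


def f_score(labels: List[int], scores: List[float], threshold: float = 0.5) -> float:
    """F1 over the whole set, treating score >= threshold as a positive prediction."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gain_within_budget(labels: List[int], scores: List[float], budget: int) -> int:
    """Assumed gain: true positives among the `budget` highest-scored items,
    i.e. the items a human annotator would actually be paid to check."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:budget])


# Toy data (invented for illustration): 8 candidates, 4 of them true idioms.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical system A: finds every positive, but ranks two negatives first.
system_a = [0.6, 0.6, 0.6, 0.6, 0.9, 0.8, 0.1, 0.1]

# Hypothetical system B: misses half the positives, but its top ranks are clean.
system_b = [0.9, 0.8, 0.4, 0.4, 0.3, 0.2, 0.1, 0.1]

budget = 2  # only two items can be sent to the annotator

f1_a, f1_b = f_score(labels, system_a), f_score(labels, system_b)
gain_a = gain_within_budget(labels, system_a, budget)
gain_b = gain_within_budget(labels, system_b, budget)

print(f"A: F1={f1_a:.3f}, gain@{budget}={gain_a}")
print(f"B: F1={f1_b:.3f}, gain@{budget}={gain_b}")
```

On this toy data, system A has the higher F-score (0.800 against 0.667) yet yields no true positives within the budget of two annotations, while system B yields two. This mirrors the abstract's point that ranking quality within the affordable range, rather than F-score alone, determines how much the annotation budget buys.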

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification