Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Leandro von Werra; Lewis Tunstall; Abhishek Thakur; Alexandra Sasha Luccioni; Tristan Thrush; Aleksandra Piktus; Felix Marty; Nazneen Rajani; Victor Mustar; Helen Ngo; Omar Sanseviero; Mario Šaško; Albert Villanova; Quentin Lhoest; Julien Chaumond; Margaret Mitchell; Alexander M. Rush; Thomas Wolf; Douwe Kiela

arXiv:2210.01970 · cs.LG · October 7, 2022 · 5 cites

TL;DR

This paper presents 'Evaluate' and 'Evaluation on the Hub', tools designed to improve the systematic, reproducible, and comprehensive evaluation of machine learning models and datasets, supporting large-scale, accessible benchmarking.

Contribution

It introduces a library with over 50 evaluation implementations and a platform for large-scale, automated evaluation of models and datasets on Hugging Face Hub.

Findings

01. Supports reproducibility and documentation of evaluations

02. Enables large-scale evaluation of over 75,000 models and 11,000 datasets

03. Provides a comprehensive set of evaluation tools and benchmarks

Abstract

Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce Evaluate and Evaluation on the Hub, a set of tools to facilitate the evaluation of models and datasets in ML. Evaluate is a library to support best practices for measurements, metrics, and comparisons of data and models. Its goal is to support reproducibility of evaluation, centralize and document the evaluation process, and broaden evaluation to cover more facets of model performance. It includes over 50 efficient canonical implementations for a variety of domains and scenarios, interactive documentation, and the ability to easily share implementations and outcomes. The library is available at https://github.com/huggingface/evaluate. In addition, we introduce Evaluation on the Hub, a platform that enables the large-scale evaluation…
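To make the idea of a "canonical implementation" concrete, here is a minimal standard-library sketch of one documented, reusable metric. The class name `Accuracy`, its `description` field, and the `compute(predictions, references)` signature are illustrative assumptions for this sketch; it is not the library's own code and does not import the library.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Accuracy:
    """Hypothetical stand-in for one documented, reusable metric."""

    # Keeping the documentation next to the implementation mirrors the
    # abstract's goal of centralizing and documenting evaluation.
    description: str = "Fraction of predictions that exactly match the references."

    def compute(
        self, predictions: Sequence[int], references: Sequence[int]
    ) -> Dict[str, float]:
        # Reject mismatched or empty inputs instead of silently truncating.
        if len(predictions) != len(references):
            raise ValueError("predictions and references must have the same length")
        if not references:
            raise ValueError("at least one example is required")
        correct = sum(p == r for p, r in zip(predictions, references))
        # Returning a named result makes scores easy to log and share.
        return {"accuracy": correct / len(references)}


metric = Accuracy()
result = metric.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(metric.description)
print(result)
```

Having every evaluation go through one shared, validated implementation, rather than ad hoc scoring code per project, is one way to read the reproducibility goal stated in the abstract.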

Code & Models

Repositories

huggingface/evaluate (PyTorch, official)

Datasets

society-ethics/papers (dataset, 44 downloads)

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science

Methods: Lib