Jury: A Comprehensive Evaluation Toolkit

Devrim Cavusoglu; Secil Sen; Ulas Sert; Sinan Altinuc

arXiv:2310.02040 · cs.CL · May 21, 2024

TL;DR

Jury is an open-source toolkit designed to standardize and streamline the evaluation process across various NLP tasks and metrics, addressing the fragmentation in current evaluation practices.

Contribution

The paper introduces jury, a unified framework for NLP evaluation that supports consistent assessment across diverse tasks and metrics.

Findings

01 Widespread adoption of jury since release

02 Improved consistency in NLP system evaluations

03 Facilitated comparison across different NLP models

Abstract

Evaluation plays a critical role in deep learning as a fundamental block of any prediction-based system. However, the vast number of Natural Language Processing (NLP) tasks and the development of various metrics have led to challenges in evaluating different systems with different metrics. To address these challenges, we introduce jury, a toolkit that provides a unified evaluation framework with standardized structures for performing evaluation across different tasks and metrics. The objective of jury is to standardize and improve metric evaluation for all systems and aid the community in overcoming the challenges in evaluation. Since its open-source release, jury has reached a wide audience and is available at https://github.com/obss/jury.
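
To make the idea of a "unified evaluation framework with standardized structures" concrete, the following is a minimal sketch of what a single entry point over several metrics might look like. It is not jury's actual API: the names `Metric`, `exact_match`, `token_f1`, and `evaluate` are hypothetical, and the choice to take the best score over multiple references is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical metric signature (an assumption, not jury's interface):
# one prediction and one reference in, a score in [0, 1] out.
Metric = Callable[[str, str], float]


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 between a prediction and a reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    predictions: List[str],
    references: List[List[str]],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Score every metric on the same standardized input structure.

    Each prediction is paired with a list of acceptable references.
    Assumption: the per-item score is the best score over its references,
    and the reported score is the mean over items.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions or any(len(refs) == 0 for refs in references):
        raise ValueError("need at least one item and one reference per item")

    scores: Dict[str, float] = {}
    for name, metric in metrics.items():
        per_item = [
            max(metric(pred, ref) for ref in refs)
            for pred, refs in zip(predictions, references)
        ]
        scores[name] = sum(per_item) / len(per_item)
    return scores


predictions = ["the cat sat on the mat", "hello world"]
references = [
    ["the cat sat on the mat", "a cat is on the mat"],
    ["hello there world"],
]
scores = evaluate(
    predictions,
    references,
    {"exact_match": exact_match, "token_f1": token_f1},
)
print(scores)
```

The point of the sketch is the shape of the call: every metric receives the same predictions and references, so adding or swapping a metric does not require reshaping the data. The toolkit's real interface and metric set are documented in the repository linked in the abstract.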

Code & Models

Repositories

obss/jury (official): https://github.com/obss/jury

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques