Reproducible Subjective Evaluation

Max Morrison, Brian Tang, Gefei Tan, and Bryan Pardo

arXiv:2203.04444 · cs.HC · March 10, 2022 · 1 citation

TL;DR

ReSEval is an open-source framework for deploying crowdsourced subjective evaluations on audio, image, text, or video data from a command-line interface or one line of Python, with the goal of making such studies reproducible and as easy to run as objective evaluation.

Contribution

The paper introduces ReSEval, an open-source tool for setting up, running, and sharing crowdsourced subjective evaluation studies, so that they are easier to run and to reproduce.

Findings

01 Facilitates reproducible subjective evaluations

02 Supports A/B, ABX, MOS, and MUSHRA tests on audio, image, text, or video data

03 Launches from a command-line interface or one line of Python

Abstract

Human perceptual studies are the gold standard for the evaluation of many research tasks in machine learning, linguistics, and psychology. However, these studies require significant time and cost to perform. As a result, many researchers use objective measures that can correlate poorly with human evaluation. When subjective evaluations are performed, they are often not reported with sufficient detail to ensure reproducibility. We propose Reproducible Subjective Evaluation (ReSEval), an open-source framework for quickly deploying crowdsourced subjective evaluations directly from Python. ReSEval lets researchers launch A/B, ABX, Mean Opinion Score (MOS) and MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) tests on audio, image, text, or video data from a command-line interface or using one line of Python, making it as easy to run as objective evaluation. With ReSEval,…
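The abstract names MOS and A/B tests, but this page does not say how their responses are summarized. The following is a minimal sketch of two common summaries, not ReSEval's own API or analysis code: a mean opinion score with a normal-approximation 95% interval, and an exact two-sided binomial test for an A/B preference count. The function names, the 1.96 multiplier, and the 50/50 null hypothesis are assumptions made for illustration.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half_width) for a normal-approximation interval.

    Hypothetical helper: `ratings` is a list of numeric listener scores,
    and z=1.96 assumes a 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Standard error of the mean from the sample standard deviation
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * stderr


def ab_preference_p_value(wins_a, total):
    """Exact two-sided binomial p-value against a 50/50 null.

    Hypothetical helper: `wins_a` is how many of `total` responses
    preferred condition A.
    """
    observed = math.comb(total, wins_a)
    # Under p = 0.5 every outcome k has probability C(total, k) / 2**total,
    # so sum the outcomes that are at most as likely as the observed one.
    tail = sum(
        math.comb(total, k)
        for k in range(total + 1)
        if math.comb(total, k) <= observed
    )
    return tail / 2 ** total
```

For example, `mean_opinion_score([4, 5, 3, 4, 4])` gives a mean of 4.0 with its interval half-width, and `ab_preference_p_value(5, 10)` returns 1.0 because an even split is the most likely outcome under the null.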

Code & Models

Repositories

reseval/reseval (Official)

Taxonomy

Topics: Image and Video Quality Assessment · Data Visualization and Analytics