Reproducible Subjective Evaluation
Max Morrison, Brian Tang, Gefei Tan, and Bryan Pardo

TL;DR
ReSEval is an open-source framework for deploying reproducible, crowdsourced subjective evaluations on audio, image, text, or video data. Studies are launched from Python or a command-line interface, and the framework supports consistent, detailed reporting.
Contribution
The paper introduces ReSEval, a tool that streamlines setting up, running, and sharing subjective evaluation studies, making them easier to reproduce.
Findings
Facilitates reproducible subjective evaluations
Supports multiple test types and data modalities
Integrates with Python for ease of use
Abstract
Human perceptual studies are the gold standard for the evaluation of many research tasks in machine learning, linguistics, and psychology. However, these studies require significant time and cost to perform. As a result, many researchers use objective measures that can correlate poorly with human evaluation. When subjective evaluations are performed, they are often not reported with sufficient detail to ensure reproducibility. We propose Reproducible Subjective Evaluation (ReSEval), an open-source framework for quickly deploying crowdsourced subjective evaluations directly from Python. ReSEval lets researchers launch A/B, ABX, Mean Opinion Score (MOS) and MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) tests on audio, image, text, or video data from a command-line interface or using one line of Python, making it as easy to run as objective evaluation. With ReSEval,…
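To make two of the test types named in the abstract concrete, here is a minimal sketch of how MOS ratings and A/B preferences are commonly summarized once responses are collected. It uses only the Python standard library and is not ReSEval's API or its analysis code. The function names `mos_summary` and `ab_binomial_p_value` are hypothetical, and the 1-5 rating scale and the normal-approximation interval are assumptions made for illustration.

```python
import math
import statistics


def mos_summary(ratings):
    """Return (mean, half_width) for a list of opinion scores.

    half_width is the half-width of a normal-approximation 95% confidence
    interval (an assumption; small samples may call for a t-interval).
    """
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        # A single rating gives no estimate of spread.
        return mean, float("nan")
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, 1.96 * stderr


def ab_binomial_p_value(wins_a, total):
    """Two-sided exact binomial test against the null of no preference.

    Under the null, each participant picks A with probability 0.5, so the
    probability of k wins is comb(total, k) / 2**total. The p-value sums
    the probabilities of all outcomes no more likely than the observed one.
    """
    observed = math.comb(total, wins_a)
    tail = sum(
        math.comb(total, k)
        for k in range(total + 1)
        if math.comb(total, k) <= observed
    )
    return tail / 2 ** total


if __name__ == "__main__":
    # Hypothetical responses, for illustration only.
    mean, half_width = mos_summary([4, 5, 3, 4, 4, 5, 2, 4])
    print(f"MOS = {mean:.2f} +/- {half_width:.2f}")
    print(f"A/B p-value = {ab_binomial_p_value(wins_a=32, total=40):.4g}")
```

Reporting the number of participants, the interval, and the test used alongside the headline score is the kind of detail the abstract says is often missing from published subjective evaluations.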
Taxonomy
Topics: Image and Video Quality Assessment · Data Visualization and Analytics
