Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman, Yao Dou, Wei Xu

TL;DR
Thresh is a versatile platform enabling customizable, unified, and deployable fine-grained text evaluation with easy configuration, community sharing, and support for multiple NLP tasks and deployment scales.
Contribution
It introduces Thresh, a flexible platform that simplifies building, deploying, and sharing fine-grained annotation tools for various NLP evaluation tasks.
Findings
Supports rapid setup with a single YAML file
Provides a community hub for sharing annotation frameworks
Offers multiple deployment options for different project scales
Abstract
Fine-grained, span-level human evaluation has emerged as a reliable and robust method for evaluating text generation tasks such as summarization, simplification, machine translation and news generation, and the derived annotations have been useful for training automatic metrics and improving language models. However, existing annotation tools implemented for these evaluation frameworks lack the adaptability to be extended to different domains or languages, or to modify annotation settings according to user needs, and the absence of a unified annotated data format inhibits research in multi-task learning. In this paper, we introduce Thresh, a unified, customizable and deployable platform for fine-grained evaluation. With a single YAML configuration file, users can build and test an annotation interface for any framework within minutes, all in one web browser window. To facilitate…
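The abstract's two technical ideas are that one declarative config file describes an annotation framework and that annotations share a unified data format. The Python sketch below is a minimal, hypothetical illustration of how those pieces might fit together: a config (shaped like what a YAML parser would return) declares edit categories and which texts each one spans, and a validator checks span-level annotation records against it. The keys `template_name`, `edits` and `spans`, the `SpanEdit` class and the `validate` function are all assumptions made for illustration; they are not Thresh's actual schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# HYPOTHETICAL framework config, shaped like what a YAML parser
# (e.g. PyYAML's yaml.safe_load) might return for a simplification
# framework. Keys are illustrative, not Thresh's real schema.
CONFIG = {
    "template_name": "simplification_demo",
    "edits": [
        {"name": "deletion", "label": "Deletion", "spans": ["source"]},
        {"name": "insertion", "label": "Insertion", "spans": ["target"]},
        {"name": "substitution", "label": "Substitution", "spans": ["source", "target"]},
    ],
}

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class SpanEdit:
    """One span-level annotation in a hypothetical unified record format."""
    category: str                # must match an edit "name" in the config
    source_span: Optional[Span]  # span in the source text, if any
    target_span: Optional[Span]  # span in the generated text, if any


def validate(edit: SpanEdit, source: str, target: str, config: dict = CONFIG) -> List[str]:
    """Return a list of problems; an empty list means the edit conforms."""
    specs = {spec["name"]: spec for spec in config["edits"]}
    if edit.category not in specs:
        return [f"unknown edit category: {edit.category!r}"]

    errors = []
    sides = {"source": (edit.source_span, source), "target": (edit.target_span, target)}
    required = set(specs[edit.category]["spans"])
    for side, (span, text) in sides.items():
        if side in required and span is None:
            errors.append(f"{edit.category} requires a {side} span")
        elif side not in required and span is not None:
            errors.append(f"{edit.category} does not take a {side} span")
        elif span is not None and not (0 <= span[0] < span[1] <= len(text)):
            errors.append(f"{side} span {span} is out of bounds")
    return errors


# Usage: annotate the deletion of " quietly" from a simplified sentence.
source = "The cat sat on the mat quietly."
target = "The cat sat on the mat."
print(validate(SpanEdit("deletion", (22, 30), None), source, target))  # []
print(validate(SpanEdit("rewrite", (0, 3), None), source, target))
# ["unknown edit category: 'rewrite'"]
```

Keeping the framework definition in one declarative structure means the same record type and validator could, in principle, be reused across tasks by swapping only the config, which is the kind of cross-task consistency a unified data format is meant to provide.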
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Lib
