BotEval: Facilitating Interactive Human Evaluation

Hyundong Cho; Thamme Gowda; Yuyang Huang; Zixun Lu; Tianli Tong; Jonathan May

arXiv:2407.17770·cs.CL·July 26, 2024

TL;DR

BotEval is an open-source toolkit designed to facilitate interactive human evaluation of NLP models, enabling direct human-bot interactions to better assess performance on complex tasks like conversation moderation.

Contribution

The paper introduces a customizable, user-friendly evaluation toolkit that supports direct human-bot interaction and integrates with crowdsourcing platforms, filling a gap left by evaluation methods in which annotators judge static inputs.

Findings

1. BotEval effectively evaluates chatbot performance in conversational moderation.

2. The toolkit offers flexible templates for various interactive evaluation scenarios.

3. BotEval enhances the realism and reliability of human evaluations in NLP research.

Abstract

Following the rapid progress in natural language processing (NLP) models, language models are being applied to increasingly complex interactive tasks such as negotiation and conversation moderation. Having human evaluators interact directly with these NLP models is essential for adequately evaluating performance on such interactive tasks. We develop BotEval, an easily customizable, open-source evaluation toolkit that focuses on enabling human-bot interactions as part of the evaluation process, as opposed to having human evaluators make judgments on a static input. BotEval balances flexibility for customization with user-friendliness by providing templates for common use cases that span various degrees of complexity, along with built-in compatibility with popular crowdsourcing platforms. We showcase the numerous useful features of BotEval through a study that evaluates the performance of…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Evacuation and Crowd Dynamics