CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Fei Huang, Dazhen Wan, Zhihong Shao, Pei Ke, Jian Guan, Yilin Niu, Xiaoyan Zhu, Minlie Huang

TL;DR
CoTK is an open-source toolkit designed to streamline and standardize the development and evaluation of text generation models, ensuring fair comparisons and reducing human errors across different experimental setups.
Contribution
The paper introduces a toolkit that handles data processing, metric implementation, and reproduction of results, addressing common sources of unfair comparison in text generation evaluation.
Findings
Facilitates consistent experimental settings
Provides implementation for multiple metrics and benchmarks
Identifies when metrics cannot be fairly compared
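The last finding can be pictured with a small sketch. This is not CoTK's API. It is a minimal illustration that assumes each reported metric carries a fingerprint of the settings it was computed under, and that two values are fairly comparable only when those fingerprints match. The names `setting_fingerprint` and `unfair_metrics`, the choice of settings that are hashed, and the scores in the usage lines are all hypothetical.

```python
import hashlib
import json


def setting_fingerprint(references, tokenizer_name, lowercase):
    """Hash the settings that influence a metric value besides the model output.

    Assumption for this sketch: reference texts, tokenizer name, and casing
    are the settings that matter. A real toolkit may track more.
    """
    payload = json.dumps(
        {"references": references, "tokenizer": tokenizer_name, "lowercase": lowercase},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def unfair_metrics(run_a, run_b):
    """Return the sorted names of shared metrics whose fingerprints differ."""
    shared = set(run_a) & set(run_b)
    return sorted(
        name for name in shared
        if run_a[name]["fingerprint"] != run_b[name]["fingerprint"]
    )


# Illustrative data only: the scores below are made up.
refs = ["a cat sat on the mat", "hello world"]
fp_lower = setting_fingerprint(refs, "whitespace", True)
fp_cased = setting_fingerprint(refs, "whitespace", False)

run_a = {
    "bleu": {"value": 0.21, "fingerprint": fp_lower},
    "perplexity": {"value": 35.2, "fingerprint": fp_lower},
}
run_b = {
    "bleu": {"value": 0.24, "fingerprint": fp_lower},
    "perplexity": {"value": 30.1, "fingerprint": fp_cased},
}

print(unfair_metrics(run_a, run_b))  # ['perplexity']
```

In this sketch the two BLEU values share a fingerprint and can be compared, while the perplexity values were computed under different casing and are flagged.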
Abstract
In text generation evaluation, many practical issues, such as inconsistent experimental settings and metric implementations, are often ignored but lead to unfair evaluation and untenable conclusions. We present CoTK, an open-source toolkit aiming to support fast development and fair evaluation of text generation. In model development, CoTK helps handle the cumbersome issues, such as data processing, metric implementation, and reproduction. It standardizes the development steps and reduces human errors which may lead to inconsistent experimental settings. In model evaluation, CoTK provides implementation for many commonly used metrics and benchmark models across different experimental settings. As a unique feature, CoTK can signify when and which metric cannot be fairly compared. We demonstrate that it is convenient to use CoTK for model development and evaluation, particularly across…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
