TL;DR
VizSeq is a visual analysis toolkit for instance-level and corpus-level evaluation of text generation systems. It works across a wide range of tasks and supports multimodal sources, multiple text references, and both n-gram and embedding-based metrics.
Contribution
It introduces a toolkit that lets users inspect individual examples alongside metric scores to identify system error patterns in text generation, integrating both n-gram-based and embedding-based metrics.
Findings
Supports multimodal sources and multiple references
Provides visualization in Jupyter and web interfaces
Includes both n-gram and embedding-based metrics
Abstract
Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. They, however, are abstract numbers and are not perfectly aligned with human assessment. This suggests inspecting detailed examples as a complement to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers most common n-gram based metrics accelerated with multiprocessing, and also provides latest embedding-based…
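To make the instance-level versus corpus-level distinction concrete, here is a minimal Python sketch of a clipped n-gram precision scored per example with multiprocessing and then pooled over the corpus. It is not VizSeq's API and not full BLEU (no brevity penalty, a single n-gram order). The function names `ngrams`, `clipped_matches`, `summarize`, and `evaluate` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from multiprocessing import Pool


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hypothesis, references, n=1):
    """Return (matched, total) n-gram counts for one hypothesis.

    Each hypothesis n-gram count is clipped by the largest count of that
    n-gram in any single reference, so multiple references are supported.
    """
    hyp = ngrams(hypothesis.split(), n)
    max_ref = Counter()
    for ref in references:
        max_ref |= ngrams(ref.split(), n)  # element-wise maximum of counts
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp.items())
    return matched, sum(hyp.values())


def summarize(counts):
    """Turn per-instance (matched, total) pairs into instance and corpus scores."""
    instance = [m / t if t else 0.0 for m, t in counts]
    total = sum(t for _, t in counts)
    corpus = sum(m for m, _ in counts) / total if total else 0.0
    return instance, corpus


def evaluate(hypotheses, references, n=1, processes=None):
    """Score every instance in a worker pool, then pool counts for the corpus.

    `references` is a list with one list of reference strings per hypothesis.
    """
    jobs = [(hyp, refs, n) for hyp, refs in zip(hypotheses, references)]
    with Pool(processes) as pool:
        counts = pool.starmap(clipped_matches, jobs)
    return summarize(counts)
```

The instance scores are what one would inspect example by example, while the corpus score pools matched and total counts rather than averaging the per-instance ratios. On platforms that start worker processes by spawning (Windows, macOS), call `evaluate` from inside an `if __name__ == "__main__":` guard.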
