E2E Refined Dataset
Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura

TL;DR
This paper introduces a refined version of the E2E dataset that corrects errors in the original meaning representation (MR)-text pairs, with the aim of improving the quality of MR-to-text systems.
Contribution
It presents a new refined dataset and associated Python tools to fix errors in the original E2E dataset, enhancing data quality for research.
Findings
Refined dataset reduces errors in MR-text pairs.
Python tools facilitate dataset correction.
Improved data quality benefits MR-to-text system development.
Abstract
Although the well-known MR-to-text E2E dataset has been used by many researchers, its MR-text pairs include many deletion/insertion/substitution errors. Since such errors affect the quality of MR-to-text systems, they must be fixed as much as possible. Therefore, we developed a refined dataset and some Python programs that convert the original E2E dataset into the refined dataset.
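To make the notion of a deletion error concrete, the following is a minimal sketch of how a mismatch between an MR and its text might be flagged. It is not the authors' tooling: the function names `parse_mr` and `find_possible_deletions` are hypothetical, and the verbatim substring check is an assumed, deliberately naive heuristic. The only thing taken as given is the E2E-style `slot[value]` MR format.

```python
import re

# Matches E2E-style "slot[value]" items, including slot names with spaces
# such as "customer rating". Hypothetical helper, not from the paper.
SLOT_PATTERN = re.compile(r"([^,\[\]]+)\[([^\]]*)\]")


def parse_mr(mr):
    """Parse an MR string like 'name[The Eagle], food[French]' into a dict."""
    return {slot.strip(): value.strip() for slot, value in SLOT_PATTERN.findall(mr)}


def find_possible_deletions(mr, text):
    """Return the slots whose value does not appear verbatim in the text.

    Assumption: a slot is 'realized' only if its value occurs as a
    case-insensitive substring. Real refinement needs far more than this.
    """
    lowered = text.lower()
    return [
        slot
        for slot, value in parse_mr(mr).items()
        if value.lower() not in lowered
    ]


mr = "name[The Eagle], eatType[coffee shop], food[French]"
text = "The Eagle is a coffee shop."
print(find_possible_deletions(mr, text))  # ['food']
```

A check like this only surfaces candidates. Slots such as `familyFriendly[yes]` are rarely realized verbatim, and insertion or substitution errors (text that states something absent from or different to the MR) would need a check in the opposite direction.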
Taxonomy
Topics: Topic Modeling · Algorithms and Data Compression · Machine Learning in Bioinformatics
