# Analysing Data-To-Text Generation Benchmarks

**Authors:** Laura Perez-Beltrachini, Claire Gardent

arXiv: 1705.03802 · 2017-05-11

## TL;DR

This paper critically examines existing data-to-text benchmarks, argues that they offer limited linguistic complexity and variety, and proposes criteria for building benchmarks that better support surface realisation research.

## Contribution

The paper analyses existing data-to-text data-sets using statistics, metrics and manual evaluation, identifies their drawbacks, and derives criteria for a benchmark that better supports the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers.

## Key findings

- Current data-sets lack linguistic variety and complexity.
- Statistics, metrics and manual evaluation are used to back up the claim that existing data-sets have important drawbacks.
- Proposed criteria aim to improve future data-to-text benchmarks.

## Abstract

Recently, several data-sets associating data to text have been created to train data-to-text surface realisers. It is unclear, however, to what extent the surface realisation task exercised by these data-sets is linguistically challenging. Do these data-sets provide enough variety to encourage the development of generic, high-quality data-to-text surface realisers? In this paper, we argue that these data-sets have important drawbacks. We back up our claim using statistics, metrics and manual evaluation. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers.
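The abstract does not say which statistics the authors compute, so the following is only an illustrative sketch of how lexical variety in a data-set's reference texts might be quantified. The choice of type-token ratio, the crude tokenizer, the helper name `variety_stats`, and the toy sentences are all assumptions made here for illustration. None of them is taken from the paper.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word-like tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def variety_stats(texts: list[str]) -> dict:
    """Compute simple lexical-variety statistics over a list of reference texts.

    Hypothetical helper: these measures are an assumption, not the paper's metrics.
    """
    tokens = [tok for text in texts for tok in tokenize(text)]
    types = set(tokens)
    n_tokens = len(tokens)
    return {
        "texts": len(texts),
        "tokens": n_tokens,
        "types": len(types),
        # Share of distinct words among all words; higher suggests more lexical variety.
        "type_token_ratio": len(types) / n_tokens if n_tokens else 0.0,
    }


# Toy, made-up references: one templated set and one more varied set.
templated = [
    "The Eagle is a pub.",
    "The Mill is a pub.",
    "The Wrestlers is a pub.",
]
varied = [
    "The Eagle is a riverside pub.",
    "You can eat cheaply at The Mill.",
    "Families love The Wrestlers.",
]

templated_stats = variety_stats(templated)
varied_stats = variety_stats(varied)
print(templated_stats)
print(varied_stats)
```

On these toy inputs the templated set reuses almost all of its vocabulary, so its type-token ratio is lower than that of the varied set. Type-token ratio is sensitive to corpus size, so a comparison across real data-sets would need samples of comparable length.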

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03802/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1705.03802/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/1705.03802/full.md

---
Source: https://tomesphere.com/paper/1705.03802