Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study
Erion Çano, Ondřej Bojar

TL;DR
This paper introduces three data efficiency metrics to evaluate data-driven models in text summarization and title generation, revealing that Transformer models are the most efficient among those tested.
Contribution
It proposes novel data efficiency metrics and applies them to a large dataset, providing a more comprehensive evaluation of model performance beyond accuracy scores.
Findings
Transformers are the most data-efficient models tested.
New metrics offer deeper insights into model learning capabilities.
A large dataset of 35 million abstract-title pairs from scientific articles was processed and released.
Abstract
Using data-driven models for solving text summarization or similar tasks has become very common in the last years. Yet most of the studies report basic accuracy scores only, and nothing is known about the ability of the proposed models to improve when trained on more data. In this paper, we define and propose three data efficiency metrics: data score efficiency, data time deficiency and overall data efficiency. We also propose a simple scheme that uses those metrics and apply it for a more comprehensive evaluation of popular methods on text summarization and title generation tasks. For the latter task, we process and release a huge collection of 35 million abstract-title pairs from scientific articles. Our results reveal that among the tested models, the Transformer is the most efficient on both tasks.
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
