Mark-Evaluate: Assessing Language Generation using Population Estimation Methods
Gonçalo Mordido, Christoph Meinel

TL;DR
This paper introduces a new family of language generation evaluation metrics based on ecological population estimation techniques, showing improved correlation with human judgment across multiple NLP tasks.
Contribution
It presents three novel metrics derived from mark-recapture and maximum-likelihood methods, offering more nuanced assessments of quality and diversity in language generation.
Findings
In synthetic experiments, the metrics are sensitive to drops in quality and diversity.
The metrics correlate more strongly with human evaluation than existing metrics.
Results hold across unconditional language generation, machine translation, and text summarization.
Abstract
We propose a family of metrics to assess language generation derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME_Petersen and ME_CAPTURE, which retrieve a single-valued assessment, and ME_Schnabel, which returns a double-valued metric to assess the evaluation set in terms of quality and diversity, separately. In synthetic experiments, our family of methods is sensitive to drops in quality and diversity. Moreover, our methods show a higher correlation with human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.
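For readers unfamiliar with mark-recapture, the following is a minimal sketch of the classical Lincoln-Petersen estimator from ecology, the simplest method of this kind. It illustrates only the underlying population-estimation idea, not the paper's metrics, whose definitions are not given in this summary. The function and parameter names are hypothetical and chosen for illustration.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Classical Lincoln-Petersen estimate of a closed population's size.

    marked_first:  individuals captured and marked in the first sample
    caught_second: individuals captured in the second sample
    recaptured:    individuals in the second sample that carry a mark

    Illustrative helper only; not taken from the paper.
    """
    if recaptured <= 0:
        raise ValueError("at least one marked individual must be recaptured")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptured cannot exceed either sample size")
    # The marked fraction of the second sample (recaptured / caught_second)
    # is assumed to match the marked fraction of the whole population
    # (marked_first / N), which gives N = marked_first * caught_second / recaptured.
    return marked_first * caught_second / recaptured


# Example: 50 marked, 40 caught later, 10 of those already marked.
estimate = lincoln_petersen(50, 40, 10)
print(estimate)  # 200.0
```

The estimate rests on the closed-population assumption mentioned in the abstract: no individuals enter or leave between the two samples.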
