NumHG: A Dataset for Number-Focused Headline Generation
Jian-Tao Huang, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen

TL;DR
This paper introduces NumHG, a new dataset with over 27,000 numeral-rich news articles, to improve the accuracy of numerals in headline generation, highlighting current models' deficiencies and fostering future research.
Contribution
The paper provides the first dataset with fine-grained annotations for numeral accuracy in headlines, enabling detailed evaluation and advancement in number-focused headline generation.
Findings
Current models have low numerical accuracy in headlines.
Human evaluation shows room for improvement in numeral generation.
The NumHG dataset can drive future research in numeral-focused summarization.
Abstract
Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text. Notably, while contemporary encoder-decoder models excel based on the ROUGE metric, they often falter when it comes to the precise generation of numerals in headlines. We identify the lack of datasets providing fine-grained annotations for accurate numeral generation as a major roadblock. To address this, we introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation. Further, we evaluate five well-performing models from previous headline generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of the NumHG dataset to drive progress in…
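As a rough illustration of what checking numerical accuracy in a headline can involve, the sketch below compares the numerals in a generated headline against those in a reference headline. It is a simplified, assumed proxy and not the evaluation procedure used by the authors: the function names, the regular expression, and the example headlines are all hypothetical, and it ignores numerals written as words, rounding, and values that must be derived from the article.

```python
import re

# Assumption: a numeral is a run of digits, optionally comma-grouped
# (27,000) and optionally with a decimal part (3.5).
NUMERAL_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numerals(text):
    """Return the numerals found in text as strings, with commas removed."""
    return [m.replace(",", "") for m in NUMERAL_RE.findall(text)]


def numeral_match(generated, reference):
    """True if both headlines contain the same numerals, ignoring order."""
    return sorted(extract_numerals(generated)) == sorted(extract_numerals(reference))


def numeral_accuracy(pairs):
    """Fraction of (generated, reference) pairs whose numerals match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(numeral_match(g, r) for g, r in pairs) / len(pairs)


# Made-up headline pairs, for illustration only.
examples = [
    ("Quake kills 12", "12 dead in quake"),
    ("Quake kills 21", "12 dead in quake"),
]
print(numeral_accuracy(examples))
```

An exact-match check like this is deliberately strict: a headline that rounds or paraphrases a figure would count as wrong, which is one reason human evaluation is still useful alongside automatic scoring.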
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Mathematics, Computing, and Information Processing
