Self-Attentive Model for Headline Generation

Daniil Gavrilov; Pavel Kalaidin; Valentin Malykh

arXiv:1901.07786·cs.CL·January 24, 2019·5 cites

TL;DR

This paper applies the Universal Transformer, a self-attentive architecture, together with byte-pair encoding to headline generation. It reports new state-of-the-art ROUGE scores on the New York Times Annotated corpus and presents results on the newly introduced RIA corpus.

Contribution

The paper presents a novel application of Universal Transformer architecture combined with byte-pair encoding to improve headline generation performance.

Findings

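01 Achieved new state-of-the-art results on the New York Times Annotated corpus: ROUGE-L F1 24.84 and ROUGE-2 F1 13.48.

02 Introduced the RIA corpus for headline generation and reached ROUGE-L F1 36.81 and ROUGE-2 F1 22.15 on it.

03 Argued that headline generation requires strong reasoning about natural language, which motivates pairing the Universal Transformer with byte-pair encoding.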

Abstract

Headline generation is a special type of text summarization task. While the amount of available training data for this task is almost unlimited, it remains challenging, because learning to generate headlines for news articles requires the model to reason strongly about natural language. To address this, we applied the recent Universal Transformer architecture paired with the byte-pair encoding technique and achieved new state-of-the-art results on the New York Times Annotated corpus, with a ROUGE-L F1-score of 24.84 and a ROUGE-2 F1-score of 13.48. We also present the new RIA corpus and reach a ROUGE-L F1-score of 36.81 and a ROUGE-2 F1-score of 22.15 on it.
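The ROUGE-L and ROUGE-2 F1-scores quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes lowercased whitespace tokenization, no stemming, and a single reference headline, and the function names (`rouge_l_f1`, `rouge_2_f1`) are hypothetical. It returns values between 0 and 1, whereas the figures in the abstract appear to be on a 0 to 100 scale.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/candidate) and recall (overlap/reference)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def tokenize(text):
    # Assumption: simple lowercased whitespace tokenization, no stemming.
    return text.lower().split()


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def rouge_2_f1(candidate, reference):
    """ROUGE-2 F1: based on clipped bigram overlap."""
    cand, ref = tokenize(candidate), tokenize(reference)
    cand_bigrams = Counter(zip(cand, cand[1:]))
    ref_bigrams = Counter(zip(ref, ref[1:]))
    overlap = sum((cand_bigrams & ref_bigrams).values())
    return f1(overlap, sum(cand_bigrams.values()), sum(ref_bigrams.values()))


if __name__ == "__main__":
    generated = "police arrest suspect in bank robbery"
    gold = "police arrest bank robbery suspect"
    print(round(rouge_l_f1(generated, gold), 4))
    print(round(rouge_2_f1(generated, gold), 4))
```

ROUGE-L rewards tokens that appear in the same order in both headlines even when they are not adjacent, while ROUGE-2 only counts exactly matching adjacent word pairs, which is why the two scores can differ noticeably for the same headline pair.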

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Universal Transformer · Byte Pair Encoding · Dense Connections · Label Smoothing