Advances of Transformer-Based Models for News Headline Generation

Alexey Bukhtiyarov; Ilya Gusev

arXiv:2007.05044·cs.CL·July 28, 2020

TL;DR

This paper fine-tunes two pretrained Transformer-based models, mBART and BertSumAbs, for Russian news headline generation and reports new state-of-the-art results on the RIA and Lenta datasets.

Contribution

It adapts pretrained Transformer models to Russian news headline generation through fine-tuning, setting new best ROUGE scores on the RIA and Lenta datasets.

Findings

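01. BertSumAbs raises ROUGE on average by 2.9 points on RIA and 2.0 points on Lenta over the previous best scores, which were held by Phrase-Based Attentional Transformer and CopyNet.

02. The fine-tuned models achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news.

03. Pretrained Transformer models prove effective for headline generation in Russian, a language-specific summarization task.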

Abstract

Pretrained language models based on the Transformer architecture are the reason for recent breakthroughs in many areas of NLP, including sentiment analysis, question answering, and named entity recognition. Headline generation is a special kind of text summarization task. To succeed in it, models need strong natural language understanding that goes beyond the meaning of individual words and sentences, as well as an ability to distinguish essential information. In this paper, we fine-tune two pretrained Transformer-based models (mBART and BertSumAbs) for that task and achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news. BertSumAbs increases ROUGE on average by 2.9 and 2.0 points respectively over the previous best scores achieved by Phrase-Based Attentional Transformer and CopyNet.
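The abstract reports gains in ROUGE "on average" without saying, on this page, which ROUGE variants are averaged or how text is tokenized. As a rough illustration of what such a score measures, the sketch below computes ROUGE-1, ROUGE-2, and ROUGE-L F1 between a generated headline and a reference and takes their mean. The lowercased whitespace tokenization, the choice of these three variants, and all function names (`rouge_n`, `rouge_l`, `mean_rouge`) are assumptions made for illustration, not the authors' evaluation code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 using clipped n-gram overlap (assumed whitespace tokens)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def mean_rouge(candidate, reference):
    """Mean of ROUGE-1, ROUGE-2 and ROUGE-L F1 (an assumed averaging scheme)."""
    scores = (
        rouge_n(candidate, reference, 1),
        rouge_n(candidate, reference, 2),
        rouge_l(candidate, reference),
    )
    return sum(scores) / len(scores)
```

Calling `mean_rouge(generated_headline, reference_headline)` for each test example and averaging over the dataset would give a corpus-level figure on a 0 to 1 scale; papers usually report it multiplied by 100, which is the scale on which a "2.9 point" gain is expressed.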

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Adam · Dense Connections · Byte Pair Encoding · Layer Normalization