Fine-tuning GPT-3 for Russian Text Summarization

Alexandr Nikolich; Arina Puchkova

arXiv:2108.03502 · cs.CL · August 10, 2021


TL;DR

This paper shows that ruGPT-3, fine-tuned for Russian news summarization, can surpass the state-of-the-art model on the paper's evaluation metrics, although its summaries still have problems with factual accuracy and entity preservation.

Contribution

The paper fine-tunes ruGPT-3 on corpora of Russian news paired with human-written summaries and applies hyperparameter tuning, improving summarization performance without changing the model architecture or loss function.

Findings

01 The fine-tuned model can surpass the state-of-the-art model's performance on Russian news summarization.

02 Hyperparameter tuning makes the output less random and more closely tied to the source text.

03 The model still struggles with entity accuracy and factual consistency.
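
As a rough illustration of how hyperparameters can make generated text less random, the sketch below applies two common decoding controls to a toy next-token distribution. This page does not say which hyperparameters the authors tuned, so temperature and top-k are assumptions chosen for illustration, and the function names and logit values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize the rest to zero."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def entropy(probs):
    """Shannon entropy in nats; lower means a less random choice."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]
flat = softmax_with_temperature(logits, temperature=1.0)
sharp = softmax_with_temperature(logits, temperature=0.5)
truncated = top_k_filter(flat, k=2)

print(entropy(flat), entropy(sharp), entropy(truncated))
```

Both the lower temperature and the top-k cut reduce the entropy of the distribution the next token is sampled from, which is one way a generation setup can be made "less random".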

Abstract

Automatic summarization techniques aim to shorten and generalize information given in the text while preserving its core message and the most relevant ideas. This task can be approached with a variety of methods; however, not many attempts have been made to produce solutions specifically for the Russian language, despite existing localizations of state-of-the-art models. In this paper, we aim to showcase the ability of ruGPT3 to summarize texts by fine-tuning it on corpora of Russian news with their corresponding human-generated summaries. Additionally, we employ hyperparameter tuning so that the model's output becomes less random and more tied to the original text. We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes in architecture or loss function. Despite being able…
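
The abstract mentions evaluation with "a set of metrics" without naming them here. As an illustration of what an overlap-based summarization metric measures, the following is a minimal sketch of a ROUGE-1-style unigram F1 score. The choice of metric, the whitespace tokenization, and the function name `rouge1_f1` are all assumptions for illustration, not the authors' evaluation code.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference summary."""
    # Naive lowercase whitespace tokenization (an assumption; real
    # evaluations usually use a proper tokenizer or stemmer).
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair.
score = rouge1_f1("a b", "a b c d")
print(score)
```

A score like this rewards shared vocabulary only, so it cannot detect a summary that reuses the right words while stating something factually wrong.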


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies