Transformers for Headline Selection for Russian News Clusters
Pavel Voropaev, Olga Sopilnyak

TL;DR
This paper investigates multilingual and Russian pre-trained transformer models for selecting headlines in Russian news clusters. Combining the multilingual and monolingual models outperforms either kind alone, reaching 87.28% accuracy on the public test set and 86.60% on the private test set.
Contribution
It introduces an approach that combines multilingual and Russian monolingual transformers for headline selection in Russian news clusters, and analyzes several ways to obtain sentence embeddings and to learn a ranking model on top of them.
Findings
Combined models outperform individual models.
Achieved 87.28% accuracy on the public test set and 86.60% on the private test set.
Analyzed various sentence embedding and ranking techniques.
Abstract
In this paper, we explore various multilingual and Russian pre-trained transformer-based models for the Dialogue Evaluation 2021 shared task on headline selection. Our experiments show that the combined approach is superior to individual multilingual and monolingual models. We present an analysis of a number of ways to obtain sentence embeddings and learn a ranking model on top of them. We achieve accuracies of 87.28% and 86.60% on the public and private test sets, respectively.
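The abstract mentions obtaining sentence embeddings and learning a ranking model on top of them, without giving details here. The sketch below is one plausible reading, not the authors' implementation. It assumes mean pooling over token vectors as the embedding method and a pairwise logistic ranker trained on embedding differences. The function names `mean_pool`, `train_pairwise_ranker`, and `pick_headline` are hypothetical, and the token vectors would in practice come from a pre-trained transformer encoder rather than the random arrays used for testing.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padding (mask == 0).

    token_embeddings: (batch, seq_len, dim), e.g. an encoder's last hidden layer.
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[..., None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # (batch, 1), avoids 0-division
    return summed / counts


def train_pairwise_ranker(left, right, labels, lr=0.1, epochs=200):
    """Fit a logistic model on embedding differences (assumed setup).

    left, right: (n, dim) sentence embeddings of the two candidate headlines.
    labels:      (n,) with 1.0 if the left headline is preferred, else 0.0.
    Returns a weight vector w; the score of a pair is (left - right) @ w.
    """
    diff = left - right
    w = np.zeros(diff.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))        # P(left is preferred)
        w -= lr * diff.T @ (p - labels) / len(labels)  # gradient of the log loss
    return w


def pick_headline(w, left, right):
    """Return 0 where the left headline scores higher, 1 otherwise."""
    return np.where((left - right) @ w > 0, 0, 1)
```

Scoring the difference of the two embeddings with no bias term makes the ranker antisymmetric: swapping the two headlines flips the sign of the score. A combined model in the spirit of the paper could concatenate embeddings from a multilingual and a Russian encoder before this step, though that is also an assumption about how the combination is done.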
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems
