X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

Sotaro Takeshita, Tommaso Green, Niklas Friedrich, Kai Eckert, and Simone Paolo Ponzetto

arXiv:2205.15051 · cs.CL · May 31, 2022

TL;DR

This paper introduces a new multilingual dataset and benchmarks for cross-lingual summarization of scholarly documents, enabling models to generate summaries of English papers in German, Italian, Chinese, and Japanese.

Contribution

It presents the X-SCITLDR dataset and evaluates a range of models for cross-lingual summarization of scholarly texts, including a novel two-stage approach.

Findings

1. Multilingual models outperform monolingual baselines.
2. Intermediate training improves cross-lingual performance.
3. Zero- and few-shot scenarios show promising results.

Abstract

The number of scientific publications nowadays is rapidly increasing, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Consequently, recent work on applying text mining to scholarly publications has investigated automatic text summarization, including extreme summarization, for this domain. However, previous work has concentrated only on monolingual settings, primarily in English. In this paper, we fill this research gap and present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly…

Code & Models

Repositories

sobamchan/xscitldr (PyTorch, official)