Multi-Task Learning for Cross-Lingual Abstractive Summarization
Sho Takase, Naoaki Okazaki

TL;DR
This paper introduces Transum, a multi-task learning framework that leverages genuine translation and summarization data to improve cross-lingual abstractive summarization and translation performance.
Contribution
The paper presents Transum, a novel method that incorporates genuine data into training via task-specific tokens, enhancing cross-lingual summarization and translation results.
Findings
Transum outperforms a model trained only on pseudo cross-lingual summarization data.
Transum achieves the top ROUGE scores on Chinese-English and Arabic-English abstractive summarization.
Transum also improves translation performance across multiple language pairs.
Abstract
We present a multi-task learning framework for cross-lingual abstractive summarization to augment training data. Recent studies constructed pseudo cross-lingual abstractive summarization data to train their neural encoder-decoders. Meanwhile, we introduce existing genuine data such as translation pairs and monolingual abstractive summarization data into training. Our proposed method, Transum, attaches a special token to the beginning of the input sentence to indicate the target task. The special token enables us to incorporate the genuine data into the training data easily. The experimental results show that Transum achieves better performance than the model trained with only pseudo cross-lingual summarization data. In addition, we achieve the top ROUGE score on Chinese-English and Arabic-English abstractive summarization. Moreover, Transum also has a positive effect on machine…
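The task-token idea described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the token spellings (`<trans>`, `<summary>`, `<transum>`), the task names, and the helpers `tag_source` and `build_training_set` are hypothetical, and the toy sentences are placeholders rather than data from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical token spellings. The abstract only says that a special token
# is attached to the beginning of the input to indicate the target task.
TASK_TOKENS: Dict[str, str] = {
    "translation": "<trans>",
    "summarization": "<summary>",
    "cross_lingual_summarization": "<transum>",
}

Pair = Tuple[str, str]  # (source text, target text)


def tag_source(task: str, source: str) -> str:
    """Prepend the token for `task` to the source text."""
    return f"{TASK_TOKENS[task]} {source}"


def build_training_set(datasets: Dict[str, List[Pair]]) -> List[Pair]:
    """Merge per-task (source, target) pairs into one tagged training list."""
    mixed: List[Pair] = []
    for task, pairs in datasets.items():
        for source, target in pairs:
            mixed.append((tag_source(task, source), target))
    return mixed


if __name__ == "__main__":
    # Placeholder examples, invented for illustration only.
    toy = {
        "translation": [("source sentence", "translated sentence")],
        "summarization": [("long article text", "short headline")],
    }
    for src, tgt in build_training_set(toy):
        print(src, "->", tgt)
```

Because the task is encoded in the input itself, a single encoder-decoder can be trained on the merged list without task-specific architecture changes, which is presumably why the abstract says genuine data can be incorporated "easily".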
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Label Smoothing
