ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization
Alireza Salemi, Emad Kebriaei, Ghazal Neisi Minaei, Azadeh Shakery

TL;DR
This paper introduces ARMAN, a pre-trained model for Persian abstractive summarization whose objectives emphasize semantic similarity and sentence reordering, achieving state-of-the-art results on six Persian summarization tasks.
Contribution
The paper proposes ARMAN, a Transformer-based pre-training approach with three novel objectives focusing on semantic relevance and sentence reordering for improved Persian summarization.
Findings
Achieves state-of-the-art ROUGE and BERTScore on six Persian summarization tasks.
Outperforms previous models on textual entailment, question paraphrasing, and multiple-choice question answering.
Semantic score-based selection significantly enhances summarization quality.
Abstract
Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training works in abstractive summarization give more points to the summaries with more words in common with the main text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score to be masked and form a pseudo summary. To summarize more accurately and similar to human writing patterns, we applied modified sentence reordering. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our proposed model achieves state-of-the-art performance on all six…
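The pseudo-summary construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the "modified semantic score", so the `bow_cosine` scorer below is a toy lexical stand-in meant to be replaced by an embedding-based similarity, and the `MASK` token, the `ratio` parameter, and the `shuffle` option (one possible reading of "sentence reordering") are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def bow_cosine(a: str, b: str) -> float:
    """Toy stand-in scorer: cosine similarity of bag-of-words counts.

    A real semantic scorer would compare sentence embeddings instead.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_pretraining_pair(
    sentences: List[str],
    ratio: float = 0.3,
    score: Callable[[str, str], float] = bow_cosine,
    shuffle: bool = False,
    seed: int = 0,
) -> Tuple[str, str]:
    """Mask the highest-scoring sentences and return (source, pseudo_summary)."""
    n_select = max(1, round(len(sentences) * ratio))

    # Score each sentence against the rest of the document.
    scored = []
    for i, sent in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scored.append((score(sent, rest), i))

    # Keep the top-scoring indices, then restore document order.
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    top = sorted(i for _, i in ranked[:n_select])
    selected = set(top)

    source_sents = [MASK if i in selected else s for i, s in enumerate(sentences)]
    if shuffle:
        # Assumed reordering variant: permute the source sentences so the
        # model cannot rely on their original positions.
        random.Random(seed).shuffle(source_sents)

    source = " ".join(source_sents)
    target = " ".join(sentences[i] for i in top)
    return source, target


doc = [
    "the cat sat on the mat",
    "dogs bark loudly",
    "the cat likes the mat",
]
source, target = make_pretraining_pair(doc)
```

Passing a different `score` callable is the intended extension point: the abstract's central claim is that selecting sentences by semantic similarity, rather than by word overlap, yields better pseudo summaries, so the lexical scorer here only demonstrates the data flow.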
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
