An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization
Kamal Al-Sabahi, Zuping Zhang, Jun Long, Khaled Alwesabi

TL;DR
This paper introduces an enhanced LSA-based method for Arabic document summarization that incorporates syntactic and semantic processing, leading to more informative and diverse summaries with improved performance over existing methods.
Contribution
The work proposes a novel LSA-based summarization approach that combines statistical, linear algebraic, syntactic, and semantic techniques, including a new sentence selection algorithm.
Findings
Outperforms state-of-the-art methods on Arabic and English datasets.
Effective in producing more informative and diverse summaries.
Validated on four different datasets with significant improvements.
Abstract
The fast-growing amount of information on the Internet makes research on automatic document summarization urgent, since summarization is an effective answer to information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, LSA has some limitations that diminish its performance when applied to document summarization. In this work, we try to overcome these limitations by applying statistical and linear algebraic approaches combined with syntactic and semantic processing of text. First, a part-of-speech tagger is used to reduce the dimensionality of LSA. Then, the weight of the term in four adjacent sentences is added to the weighting schemes when calculating the input matrix, to take into account word order and syntactic relations. In addition, a new LSA-based sentence selection algorithm is proposed, in which…
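The general shape of an LSA summarization pipeline like the one the abstract outlines can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The function names, the `window` and `neighbor_weight` parameters, and the raw-count weighting are all hypothetical. The vocabulary is assumed to have been filtered by a part-of-speech tagger upstream. The abstract is cut off before it describes the proposed selection algorithm, so the sketch substitutes a common LSA heuristic: rank each sentence by the length of its vector in the reduced latent space.

```python
import numpy as np

def build_term_sentence_matrix(sentences, vocabulary, window=2, neighbor_weight=0.5):
    """Term-by-sentence matrix where each cell also receives a share of the
    term's weight in nearby sentences.

    Assumptions (not from the paper): raw counts as the base weight,
    `window` sentences on each side, and a fixed `neighbor_weight`.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = np.zeros((len(vocabulary), len(sentences)))
    for j, tokens in enumerate(sentences):
        for tok in tokens:
            if tok in index:  # terms outside the (POS-filtered) vocabulary are skipped
                counts[index[tok], j] += 1.0

    weighted = counts.copy()
    n = len(sentences)
    for j in range(n):
        for k in range(max(0, j - window), min(n, j + window + 1)):
            if k != j:
                weighted[:, j] += neighbor_weight * counts[:, k]
    return weighted

def select_sentences(matrix, k=2, n=2):
    """Return the indices of the n highest-scoring sentences, in document order.

    Scoring is a generic LSA heuristic, swapped in for the paper's own
    algorithm: the norm of each sentence vector over the top-k singular
    dimensions, with each dimension scaled by its singular value.
    """
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    scores = np.sqrt(((s[:k, None] * vt[:k, :]) ** 2).sum(axis=0))
    top = np.argsort(-scores)[:n]
    return sorted(int(i) for i in top)
```

With tokenized sentences and a filtered vocabulary in hand, `select_sentences(build_term_sentence_matrix(sentences, vocabulary), k=2, n=3)` would give the positions of three candidate summary sentences. A fuller system would presumably swap the raw counts for one of the weighting schemes the abstract mentions.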
