Corpus-based Web Document Summarization using Statistical and Linguistic Approach

Rushdi Shams, M.M.A. Hashem, Afrina Hossain, Suraiya Rumana Akter, and Monika Gope

arXiv:1304.2476 · cs.IR · November 17, 2016

TL;DR

This paper introduces a novel corpus-based method combining statistical and linguistic analysis for single web document summarization, achieving 68% similarity with manual summaries.

Contribution

It proposes a summarization technique that ranks sentences by combining Sentence Weight (SW) and Subject Weight (SuW), both computed with the help of a reference corpus, and is tailored to domain-specific web documents.

Findings

01

68% of generated summaries match manual summaries

02

Uses combined statistical and linguistic analysis for ranking sentences

03

Effective for domain-specific web document summarization

Abstract

Single-document summarization generates a summary by extracting the representative sentences from a document. In this paper, we present a novel technique for summarizing domain-specific text from a single web document that applies statistical and linguistic analysis to the text of both a reference corpus and the web document. The proposed summarizer combines Sentence Weight (SW) and Subject Weight (SuW) to determine the rank of a sentence, where SW is a function of the number of terms (t_n) and the number of words (w_n) in a sentence and of the term frequency (t_f) in the corpus, and SuW is a function of t_n and w_n in a subject and of t_f in the corpus. The top-ranked 30 percent of sentences are taken as the summary of the web document. We generated three web document summaries using our technique and compared each of them with the summaries developed manually from 16…
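The ranking and selection steps described in the abstract can be sketched in Python. The abstract names the inputs to SW and SuW (t_n, w_n, t_f) but not their exact formulas, so this sketch fills them in with its own assumptions rather than the authors' definitions: a text's weight is (t_n / w_n) multiplied by the mean of log(1 + t_f) over its corpus terms; a "term" is any non-stopword that occurs in the reference corpus; a sentence's rank score is the sum SW + SuW; and the 30 percent cutoff is rounded to the nearest whole sentence, with selected sentences returned in document order. How the paper identifies the subject associated with each sentence is not described here, so subjects are passed in as plain strings. `STOPWORDS`, `weight`, and `summarize` are hypothetical names.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's preprocessing is not specified here.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to",
             "is", "are", "was", "for", "with", "by"}


def tokenize(text):
    """Lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def corpus_term_frequencies(corpus_docs):
    """t_f: occurrences of each non-stopword across the reference corpus."""
    counts = Counter()
    for doc in corpus_docs:
        counts.update(w for w in tokenize(doc) if w not in STOPWORDS)
    return counts


def weight(text, tf):
    """Assumed weight: (t_n / w_n) * mean(log(1 + t_f)) over corpus terms in text."""
    words = tokenize(text)
    terms = [w for w in words if tf.get(w, 0) > 0]  # words seen in the corpus
    w_n, t_n = len(words), len(terms)
    if t_n == 0:
        return 0.0  # no corpus terms (also covers empty text)
    mean_log_tf = sum(math.log1p(tf[t]) for t in terms) / t_n
    return (t_n / w_n) * mean_log_tf


def summarize(sentences, subjects, corpus_docs, ratio=0.3):
    """Rank by SW + SuW (assumed to be a sum) and keep the top `ratio` share."""
    tf = corpus_term_frequencies(corpus_docs)
    scores = [weight(s, tf) + weight(subj, tf)  # SW + SuW
              for s, subj in zip(sentences, subjects)]
    k = max(1, round(ratio * len(sentences)))  # 30% cutoff, rounded (assumption)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore document order (assumption)
    return [sentences[i] for i in keep]


# Toy usage with made-up text.
corpus = [
    "Malaria is transmitted by mosquitoes.",
    "Mosquito nets reduce malaria infection.",
    "Malaria symptoms include fever.",
]
sentences = [
    "Malaria spreads through mosquito bites.",
    "The weather was pleasant yesterday.",
    "Nets and repellents lower malaria risk.",
]
subjects = ["Malaria", "The weather", "Nets and repellents"]
print(summarize(sentences, subjects, corpus))
# ['Malaria spreads through mosquito bites.']
```

With three sentences, 30 percent rounds to one sentence, so only the highest-scoring sentence is kept; the off-topic sentence scores zero because none of its words occur in the corpus. Rounding with `round` rather than `math.ceil` avoids floating-point artifacts such as 0.3 × 10 evaluating slightly above 3.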
