TL;DR
LexRank is a graph-based method that scores sentences by eigenvector centrality for extractive text summarization. It outperforms centroid-based methods in DUC evaluations and is robust to noise in the input data.
Contribution
The paper presents LexRank, which scores sentence importance for summarization by eigenvector centrality in a sentence similarity graph. LexRank outperforms centroid-based methods and is robust to noise.
Findings
LexRank outperforms other systems in most DUC evaluations.
Degree-based methods outperform centroid-based methods.
LexRank with threshold outperforms the other degree-based techniques.
Abstract
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this…
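The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: sentences are compared with plain term-frequency cosine similarity (no idf weighting), pairs above a hypothetical threshold of 0.1 are linked, every sentence gets a self-loop so each row can be normalized, and centrality is approximated by damped power iteration with an assumed damping factor of 0.85. The names cosine and lexrank are illustrative only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences by centrality in a thresholded similarity graph.

    threshold and damping are illustrative defaults (assumptions),
    not values taken from the summary above.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    if n == 0:
        return []

    # Binary adjacency matrix: link two sentences when their cosine
    # similarity exceeds the threshold. Self-loops (an assumption of this
    # sketch) guarantee that every row has a nonzero sum.
    adj = [
        [1.0 if i == j or cosine(bags[i], bags[j]) > threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    # Row-normalize to obtain a stochastic transition matrix.
    trans = [[v / sum(row) for v in row] for row in adj]

    # Damped power iteration toward the stationary distribution.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [
            (1.0 - damping) / n
            + damping * sum(trans[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(u - s) for u, s in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    docs = [
        "graph algorithms",
        "graph centrality ranks sentences",
        "sentences summarize documents",
    ]
    for sentence, score in sorted(zip(docs, lexrank(docs)), key=lambda p: -p[1]):
        print(f"{score:.3f}  {sentence}")
```

In this toy input the middle sentence shares a word with each of the other two, which share nothing with each other, so it acts as the hub of the graph. An extractive summarizer built on this sketch would select the highest-scoring sentences first.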
