Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm
Alicia Y. Tsai, Laurent El Ghaoui

TL;DR
This paper introduces an efficient unsupervised extractive summarization method for long documents using a sparse auto-regression model solved by the Frank-Wolfe algorithm, improving results especially with paraphrased summaries.
Contribution
It proposes a novel sparse auto-regression framework for unsupervised extractive summarization and an efficient Frank-Wolfe algorithm tailored for long documents.
Findings
Achieves better ROUGE scores than the two unsupervised methods it is compared against, on both datasets.
Generates a summary of k sentences in approximately k iterations.
Works especially well when combined with sentence embeddings on highly paraphrased summaries.
Abstract
We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as a sparse auto-regression one and approximate the resulting combinatorial problem via a convex, norm-constrained problem. We solve it using a dedicated Frank-Wolfe algorithm. To generate a summary with k sentences, the algorithm only needs to execute approximately k iterations, making it very efficient. We explain how to avoid explicit calculation of the full gradient and how to include sentence embedding information. We evaluate our approach against two other unsupervised methods using both lexical (standard) ROUGE scores and semantic (embedding-based) ones. Our method achieves better results with both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
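The abstract does not spell out the exact objective or norm, so the following is only a minimal sketch of how a Frank-Wolfe method can yield roughly one selected sentence per iteration. It assumes (these are illustrative choices, not taken from the paper) that sentences are the columns of a feature matrix X, that the auto-regression objective is the squared Frobenius error of reconstructing X as X W, and that sparsity is encouraged by bounding the sum of the row-wise Euclidean norms of W by a radius tau. The function names `frank_wolfe_row_sparse` and `select_sentences` are hypothetical. Unlike the method described above, this sketch computes the full gradient at every step.

```python
import numpy as np


def frank_wolfe_row_sparse(X, tau, n_iters):
    """Frank-Wolfe sketch for: min ||X - X W||_F^2  s.t.  sum_i ||W[i]||_2 <= tau.

    X has shape (d, n); each column is one sentence vector (assumed layout).
    Row i of W measures how much sentence i is used to reconstruct the others.
    """
    n = X.shape[1]
    W = np.zeros((n, n))
    gram = X.T @ X  # (n, n) sentence-by-sentence inner products

    for t in range(n_iters):
        # Full gradient of the squared reconstruction error with respect to W.
        grad = 2.0 * (gram @ W - gram)

        # Linear minimization over the group-norm ball: the minimizer is
        # supported on the single row whose gradient has the largest norm.
        row_norms = np.linalg.norm(grad, axis=1)
        i = int(np.argmax(row_norms))
        if row_norms[i] == 0.0:
            break  # zero gradient: already optimal
        S = np.zeros_like(W)
        S[i] = -tau * grad[i] / row_norms[i]

        # Standard diminishing step size; each step activates at most one new row.
        gamma = 2.0 / (t + 2.0)
        W = (1.0 - gamma) * W + gamma * S

    return W


def select_sentences(W, k):
    """Return the indices of the k rows of W with the largest norm, in document order."""
    norms = np.linalg.norm(W, axis=1)
    top = np.argsort(-norms)[:k]
    return sorted(top.tolist())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 12))  # toy data: 12 "sentences" with 5 features each
    W = frank_wolfe_row_sparse(X, tau=1.0, n_iters=3)
    print(select_sentences(W, k=3))
```

Because every Frank-Wolfe vertex in this setup touches a single row, the iterate after t steps has at most t nonzero rows, which is one way to see why about k iterations suffice for a k-sentence summary under these assumptions.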
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
