Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm

Alicia Y. Tsai; Laurent El Ghaoui

arXiv:2208.09454·cs.CL·August 22, 2022

Open Access

TL;DR

This paper introduces an efficient unsupervised extractive summarization method for long documents. It casts summarization as a sparse auto-regression problem, solves a convex relaxation of it with a dedicated Frank-Wolfe algorithm, and works especially well when combined with sentence embeddings on highly paraphrased summaries.

Contribution

The paper proposes a sparse auto-regression formulation of unsupervised extractive summarization and a dedicated Frank-Wolfe algorithm that stays efficient on long documents.

Findings

01

Achieves better ROUGE scores than the two unsupervised baselines it is compared against, on both evaluation datasets.

02

Generates a summary of k sentences in approximately k Frank-Wolfe iterations.

03

Performs well under semantic (embedding-based) ROUGE evaluation as well as the standard lexical scores, particularly for highly paraphrased summaries.

Abstract

We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as a sparse auto-regression one and approximate the resulting combinatorial problem via a convex, norm-constrained problem. We solve it using a dedicated Frank-Wolfe algorithm. To generate a summary with $k$ sentences, the algorithm only needs to execute $\approx k$ iterations, making it very efficient. We explain how to avoid explicit calculation of the full gradient and how to include sentence embedding information. We evaluate our approach against two other unsupervised methods using both lexical (standard) ROUGE scores, as well as semantic (embedding-based) ones. Our method achieves better results with both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
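To make the "one sentence per iteration" idea concrete, here is a minimal sketch of Frank-Wolfe on a row-sparse auto-regression problem. It is not the authors' implementation. The specific objective, 0.5 * ||X - X W||_F^2 with the sum of row norms of W bounded by tau, is an assumption chosen for illustration, as are the names `frank_wolfe_row_sparse`, `X`, `W`, and `tau`. The sketch also computes the full gradient at every step, which the paper says can be avoided.

```python
import numpy as np


def frank_wolfe_row_sparse(X, k, tau=1.0):
    """Illustrative Frank-Wolfe for row-sparse auto-regression (assumed formulation).

    X   : (d, n) array, one column per sentence (e.g. TF-IDF or embeddings).
    k   : number of iterations, hence at most k selected sentences.
    tau : radius of the assumed group-norm ball, sum_i ||W[i, :]||_2 <= tau.
    Returns the list of selected sentence indices and the coefficient matrix W.
    """
    n = X.shape[1]
    W = np.zeros((n, n))
    gram = X.T @ X
    selected = []
    for _ in range(k):
        # Gradient of 0.5 * ||X - X W||_F^2 with respect to W.
        grad = gram @ W - gram
        row_norms = np.linalg.norm(grad, axis=1)
        i = int(np.argmax(row_norms))
        if row_norms[i] == 0.0:
            break  # already stationary
        # Linear minimization oracle over the group-norm ball:
        # the minimizer has a single nonzero row, the one with the largest gradient norm.
        S = np.zeros_like(W)
        S[i] = -tau * grad[i] / row_norms[i]
        # Exact line search for the quadratic objective, clipped to [0, 1].
        D = S - W
        XD = X @ D
        R = X - X @ W
        denom = float(np.sum(XD * XD))
        gamma = 0.0 if denom == 0.0 else float(np.clip(np.sum(R * XD) / denom, 0.0, 1.0))
        W = W + gamma * D
        if i not in selected:
            selected.append(i)
    return selected, W


def objective(X, W):
    """Reconstruction error 0.5 * ||X - X W||_F^2."""
    return 0.5 * float(np.sum((X - X @ W) ** 2))
```

Because each iterate is a convex combination of the previous iterate and a matrix with a single nonzero row, W has at most k nonzero rows after k iterations, and the indices of those rows can be read off as the extracted sentences. Under this assumed setup, that is why roughly k iterations suffice for a k-sentence summary.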

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques