Leveraging Information Bottleneck for Scientific Document Summarization
Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, Shirui Pan

TL;DR
This paper introduces an unsupervised extractive method for scientific document summarization using the Information Bottleneck principle, combining signal-based retrieval and language model editing, validated by automatic and human evaluations.
Contribution
It extends the Information Bottleneck approach from sentence compression to document summarization with a two-step process and a flexible multi-view framework.
Findings
Automatic evaluation on three scientific document datasets verifies the framework's effectiveness.
Summaries cover more content aspects than previous systems.
The framework extends to a multi-view setting by using different signals.
Abstract
This paper presents an unsupervised extractive approach to summarizing long scientific documents based on the Information Bottleneck principle. Inspired by previous work that uses the Information Bottleneck principle for sentence compression, we extend it to document-level summarization with two separate steps. In the first step, we use one or more signals as queries to retrieve the key content from the source document. In the second step, a pre-trained language model performs further sentence search and editing to return the final extracted summary. Importantly, our work can be flexibly extended to a multi-view framework by using different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. Further human evaluation suggests that the extracted summaries cover more content aspects than those of previous systems.
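The two-step shape described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `retrieve` and `prune` are hypothetical, bag-of-words cosine similarity stands in for the paper's Information Bottleneck scoring, and a simple redundancy filter stands in for the pre-trained language model's search-and-edit step. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(sentences, signals, top_k):
    """Step 1 stand-in (assumption): rank sentences by their best
    similarity to any signal and return the indices of the top_k."""
    signal_vecs = [Counter(tokenize(s)) for s in signals]
    scored = []
    for idx, sent in enumerate(sentences):
        vec = Counter(tokenize(sent))
        score = max((cosine(vec, sv) for sv in signal_vecs), default=0.0)
        scored.append((score, idx))
    # Highest score first; ties broken by document order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in ranked[:top_k]]


def prune(sentences, candidates, budget, max_overlap=0.5):
    """Step 2 stand-in (assumption): greedily keep candidates that are
    not too similar to sentences already kept. The paper uses a
    pre-trained language model here; this filter only mimics the idea
    of discarding redundant content."""
    kept = []
    for idx in candidates:
        if len(kept) >= budget:
            break
        vec = Counter(tokenize(sentences[idx]))
        if all(
            cosine(vec, Counter(tokenize(sentences[k]))) < max_overlap
            for k in kept
        ):
            kept.append(idx)
    # Return indices in document order so the summary reads naturally.
    return sorted(kept)
```

Passing several signals to `retrieve` (for example, a title and a list of keywords) is one loose way to picture the multi-view extension: each sentence is scored against every signal and keeps its best match.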
