Surfer100: Generating Surveys From Web Resources, Wikipedia-style
Irene Li, Alexander Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang, Jaesung Tae, Chang Shen, Sally Ma, Tomoe Mizutani, Dragomir Radev

TL;DR
This paper presents Surfer100, a method that combines extractive and abstractive techniques built on pretrained language models to generate long, sectioned Wikipedia-style surveys from web resources. It targets rapidly evolving fields whose topics encyclopedic sources have not yet covered.
Contribution
It combines pretrained language models into a two-stage (extractive, then abstractive) approach for Wikipedia lead-paragraph generation, then extends that approach to produce longer Wikipedia-style surveys with sections.
Findings
Generates structured summaries with sections
Struggles to maintain factual accuracy and coherence
First study, to the authors' knowledge, to use web resources for long Wikipedia-style summaries
Abstract
Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined into a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. To the best of our knowledge, this is the first study on utilizing web resources for long Wikipedia-style summaries.
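The abstract describes the two-stage pipeline only at a high level. As a rough illustration of what the extractive first stage could look like when run once per section, the sketch below ranks sentences from a set of web documents against a "topic + section heading" query using TF-IDF cosine similarity. This is an assumption, not the paper's method: TF-IDF stands in for the pretrained-model-based content selection the authors use, and the function names `extract_for_section` and `build_generation_input` are hypothetical. The abstractive second stage, which would feed the selected sentences to a pretrained sequence-to-sequence model, is not shown; `build_generation_input` only assembles a hypothetical input string for it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(doc):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def tfidf_vectors(token_lists):
    """Sparse TF-IDF vectors (dicts) using smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter()
    for toks in token_lists:
        df.update(set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def extract_for_section(topic, section, web_docs, k=3):
    """Hypothetical stage 1 (extractive): the k sentences most similar to 'topic section'."""
    sentences = [s for doc in web_docs for s in split_sentences(doc)]
    query = tokenize(f"{topic} {section}")
    vecs = tfidf_vectors([query] + [tokenize(s) for s in sentences])
    qvec, svecs = vecs[0], vecs[1:]
    # sorted() is stable, so sentences with equal scores keep their document order.
    ranked = sorted(
        range(len(sentences)), key=lambda i: cosine(qvec, svecs[i]), reverse=True
    )
    return [sentences[i] for i in ranked[:k]]


def build_generation_input(topic, section, selected):
    """Hypothetical stage 2 input string; the abstractive model itself is not shown."""
    return f"{topic} | {section} | " + " ".join(selected)
```

For example, given the two toy documents "BERT is a pretrained language model. The weather is nice today." and "BERT uses masked language modeling for pretraining. Cats sleep a lot.", calling `extract_for_section("BERT", "pretraining", docs, k=2)` ranks the sentence mentioning both "BERT" and "pretraining" first and the other BERT sentence second. Running the extractor once per section heading is one simple way to move from a single lead paragraph to a sectioned survey, though the paper's own selection model may differ.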
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Topic Modeling
