Live Blog Corpus for Summarization
Avinesh P.V.S., Maxime Peyrard, Christian M. Meyer

TL;DR
This paper introduces a new live blog corpus for summarization, highlighting its unique challenges and providing tools for the research community to facilitate further study in automatic live blog summarization.
Contribution
The paper presents a novel live blog corpus for summarization, along with tools for corpus reconstruction and an empirical evaluation of state-of-the-art systems.
Findings
The live blog corpus presents new summarization challenges
State-of-the-art systems struggle with live blog data
Tools are provided for corpus reconstruction and replication of results
Abstract
Live blogs are an increasingly popular news format for covering breaking news and live events in online journalism. Online news websites around the world use this medium to give their readers minute-by-minute updates on an event. Good summaries enhance the value of live blogs for readers, but they are often not available. In this paper, we study a way of collecting corpora for automatic live blog summarization. In an empirical evaluation using well-known state-of-the-art summarization systems, we show that the live blog corpus poses new challenges in the field of summarization. We make our tools for reconstructing the corpus publicly available to encourage further research and to allow others to replicate our results.
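The abstract does not say how the systems' summaries are scored. The code below is a minimal sketch that assumes a ROUGE-1-style unigram recall against a human-written reference summary. This is a common choice in summarization evaluation, but the page above does not confirm it. The sketch also uses a hypothetical `lead_baseline` that keeps the first posts of a live blog up to a word budget. The posts and reference in the usage example are invented toy data, not content from the corpus.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram recall of a candidate summary against one reference.

    Assumption: a ROUGE-1-style recall stands in for whatever metric the
    paper actually reports; this is not the official ROUGE toolkit.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    if not ref:
        return 0.0
    # Each reference token counts at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


def lead_baseline(posts: list[str], max_words: int) -> str:
    """Hypothetical baseline: take words from posts in order until the budget is hit."""
    words: list[str] = []
    for post in posts:
        for word in post.split():
            if len(words) >= max_words:
                return " ".join(words)
            words.append(word)
    return " ".join(words)


if __name__ == "__main__":
    # Toy live blog posts and reference summary (invented for illustration).
    posts = [
        "Polls close at 10pm.",
        "Exit poll predicts a hung parliament.",
        "Turnout appears high in early counts.",
    ]
    reference = "Exit poll predicts hung parliament as polls close."
    summary = lead_baseline(posts, max_words=8)
    print(summary)                             # Polls close at 10pm. Exit poll predicts a
    print(rouge1_recall(summary, reference))   # 0.625
```

A live blog is a long stream of short posts, so an order-based baseline like this one shows how easily a fixed word budget can cut off key updates. Real evaluations would typically use an established ROUGE implementation and multiple references where they exist.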
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
