# TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

**Authors:** Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki

arXiv: 1906.01351 · 2019-06-14

## TL;DR

This paper introduces TalkSumm, a dataset of 1716 scientific papers paired with summaries generated automatically from videos of the corresponding conference talks. A model trained on TalkSumm achieves performance similar to that of models trained on manually created summaries.

## Contribution

The paper presents a novel automatic annotation method using conference talk videos to generate scientific paper summaries, enabling scalable dataset creation.

## Key findings

- A model trained on TalkSumm performs comparably to models trained on a dataset of manually created summaries.
- Human experts validated the quality of the automatically generated summaries.
- The dataset addresses the lack of large-scale training data for scientific paper summarization.

## Abstract

Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves similar performance as models trained on a dataset of summaries created manually. In addition, we validated the quality of our summaries by human experts.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01351/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1906.01351/full.md

---
Source: https://tomesphere.com/paper/1906.01351