# Multimodal Abstractive Summarization for How2 Videos

**Authors:** Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze

arXiv: 1906.07901 · 2019-06-20

## TL;DR

This paper introduces a multimodal approach to abstractive summarization of instructional videos: a multi-source sequence-to-sequence model with hierarchical attention fuses video and audio transcripts (or text), and a new metric, Content F1, evaluates the semantic adequacy of the generated summaries.

## Contribution

It presents a novel multi-source sequence-to-sequence model with hierarchical attention for multimodal video summarization and introduces the Content F1 metric for better semantic adequacy assessment.
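The two-level fusion idea can be illustrated with a minimal sketch. It assumes plain dot-product attention with no learned parameters, and the names `attend` and `hierarchical_attention` are hypothetical; the paper's model is a trained neural network and its exact attention formulation may differ. The first level attends over the encoder states within each modality, and the second level attends over the resulting per-modality context vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, states):
    """Dot-product attention (an assumption, not the paper's scoring function).

    Returns the attention weights and the weighted sum of the states.
    """
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return weights, context


def hierarchical_attention(query, modalities):
    """Fuse several modalities in two attention steps.

    modalities maps a modality name to a list of encoder state vectors,
    all sharing the same dimensionality as the query.
    """
    names = list(modalities)
    # Level 1: one context vector per modality.
    contexts = [attend(query, modalities[name])[1] for name in names]
    # Level 2: attention over the per-modality context vectors.
    modality_weights, fused = attend(query, contexts)
    return dict(zip(names, modality_weights)), fused


# Toy decoder query and encoder states (made-up numbers for illustration).
example_weights, example_fused = hierarchical_attention(
    [1.0, 0.0],
    {"text": [[1.0, 0.0], [0.0, 1.0]], "video": [[0.5, 0.5]]},
)
```

The second-level weights indicate how much each modality contributes to the fused context at a given decoding step, which is what makes this kind of fusion inspectable.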

## Key findings

- Hierarchical attention effectively fuses video and audio modalities.
- The model generates coherent summaries from multimodal inputs.
- Content F1 targets semantic adequacy, whereas ROUGE and BLEU mainly capture the fluency of summaries.

## Abstract

In this paper, we study abstractive summarization for open-domain videos. Unlike the traditional text news summarization, the goal is less to "compress" text information but rather to provide a fluent textual summary of information that has been collected and fused from different source modalities, in our case video and audio transcripts (or text). We show how a multi-source sequence-to-sequence model with hierarchical attention can integrate information from different modalities into a coherent output, compare various models trained with different modalities and present pilot experiments on the How2 corpus of instructional videos. We also propose a new evaluation metric (Content F1) for abstractive summarization task that measures semantic adequacy rather than fluency of the summaries, which is covered by metrics like ROUGE and BLEU.
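To make the idea of a content-focused F1 score concrete, here is a simplified sketch. It assumes exact-match overlap of lowercased tokens after removing stop words; the `STOP_WORDS` list and the function names are hypothetical, and the precise definition of Content F1 in the complete paper may differ (for example, in how hypothesis and reference words are matched).

```python
from collections import Counter

# Hypothetical stop-word list for illustration only.
STOP_WORDS = frozenset(
    {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
     "with", "is", "are", "this", "that", "how"}
)


def content_words(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,!?;:\"'()") for t in text.lower().split())
    return [t for t in tokens if t and t not in stop_words]


def content_f1(hypothesis, reference, stop_words=STOP_WORDS):
    """F1 over content words shared by hypothesis and reference.

    Assumption: words match only when identical; repeated words are
    counted with multiplicity via multiset intersection.
    """
    hyp = Counter(content_words(hypothesis, stop_words))
    ref = Counter(content_words(reference, stop_words))
    matched = sum((hyp & ref).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(hyp.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

With this sketch, `content_f1("learn how to bake a cake", "learn to bake bread")` matches two of three content words on each side, so precision, recall, and F1 are all 2/3. Because function words are ignored, a fluent summary that misses the key content words scores low, which is the contrast with ROUGE and BLEU that the abstract draws.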

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.07901/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1906.07901/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1906.07901/full.md

---
Source: https://tomesphere.com/paper/1906.07901