# Multi-Task Video Captioning with Video and Entailment Generation

**Authors:** Ramakanth Pasunuru, Mohit Bansal

arXiv: 1704.07489 · 2017-08-09

## TL;DR

This paper improves video captioning through multi-task learning with two related generation tasks, unsupervised video prediction and language entailment generation, and reports new state-of-the-art results on several standard video captioning datasets.

## Contribution

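The paper proposes a many-to-many multi-task learning model that shares parameters across the encoders and decoders of three tasks: video captioning, unsupervised video prediction, and language entailment generation. Video prediction is used to learn richer, context-aware video encoder representations, and entailment generation to learn better video-entailed caption decoder representations.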
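
The sharing pattern can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the `Component` and `Task` classes, the two-element weight lists, and the SGD-style `update` are hypothetical stand-ins for real recurrent encoders and decoders. The pairing of components to tasks follows the abstract, with captioning reusing the video encoder of the prediction task and the language decoder of the entailment task.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical stand-in for a trainable encoder or decoder."""
    name: str
    weights: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def update(self, grad: List[float], lr: float = 0.1) -> None:
        # Plain SGD step stored on this object, so every task that holds
        # a reference to the same component sees the new weights.
        self.weights = [w - lr * g for w, g in zip(self.weights, grad)]


@dataclass
class Task:
    name: str
    encoder: Component
    decoder: Component


def build_tasks() -> Dict[str, Task]:
    # Four components in total; three tasks are wired from them.
    video_enc = Component("video_encoder")
    lang_enc = Component("language_encoder")
    video_dec = Component("video_decoder")
    lang_dec = Component("language_decoder")
    return {
        "prediction": Task("prediction", video_enc, video_dec),
        "captioning": Task("captioning", video_enc, lang_dec),
        "entailment": Task("entailment", lang_enc, lang_dec),
    }


def shared_components(tasks: Dict[str, Task], a: str, b: str) -> List[str]:
    # Two tasks share a component when they hold the very same object.
    ta, tb = tasks[a], tasks[b]
    return [c.name for c in (ta.encoder, ta.decoder)
            if c is tb.encoder or c is tb.decoder]
```

In this sketch, calling `update` on the encoder of the prediction task also changes the encoder seen by the captioning task, which is the mechanism by which an auxiliary task can shape the representations used for captioning.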

## Key findings

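- Significant improvements on several standard video captioning datasets, measured with diverse automatic and human evaluations.
- New state-of-the-art results on these datasets.
- Mutual multi-task benefit: the entailment generation task also improves when trained jointly.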

## Abstract

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generation tasks: a temporally-directed unsupervised video prediction task to learn richer context-aware video encoder representations, and a logically-directed language entailment generation task to learn better video-entailed caption decoder representations. For this, we present a many-to-many multi-task learning model that shares parameters across the encoders and decoders of the three tasks. We achieve significant improvements and the new state-of-the-art on several standard video captioning datasets using diverse automatic and human evaluations. We also show mutual multi-task improvements on the entailment generation task.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.07489/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1704.07489/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1704.07489/full.md

---
Source: https://tomesphere.com/paper/1704.07489