Middle-Out Decoding

Shikib Mehri; Leonid Sigal

arXiv:1810.11735 · cs.CL · October 30, 2018

Open Access

TL;DR

This paper introduces a middle-out decoder architecture for sequence generation that starts from a central word and expands in both directions, improving diversity and controllability in tasks like video captioning and sequence de-noising.

Contribution

The paper proposes a novel middle-out decoding method with dual self-attention, enabling bidirectional sequence expansion and enhanced control over output diversity.
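The bidirectional expansion described above can be illustrated with a toy sketch. It is not the authors' implementation: the alternating left/right schedule, the `STOP` token, and the names `middle_out_decode`, `step_left`, and `step_right` are all assumptions made for illustration, and simple lookup tables stand in for the learned decoders. Passing the full partial sequence to both step functions is only a crude stand-in for the dual self-attention mechanism, which lets each direction condition on what the other has produced.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

STOP = "<stop>"  # hypothetical boundary token that ends one direction


def middle_out_decode(
    middle: str,
    step_left: Callable[[Sequence[str]], str],
    step_right: Callable[[Sequence[str]], str],
    max_len: int = 20,
) -> List[str]:
    """Grow a sequence outward from `middle`, alternating left and right steps.

    Each step function receives the whole partial sequence, so either
    direction can condition on tokens produced by the other.
    """
    seq: Deque[str] = deque([middle])
    left_done = right_done = False
    while len(seq) < max_len and not (left_done and right_done):
        if not left_done:
            tok = step_left(tuple(seq))
            if tok == STOP:
                left_done = True
            else:
                seq.appendleft(tok)  # extend toward the start of the sentence
        if not right_done and len(seq) < max_len:
            tok = step_right(tuple(seq))
            if tok == STOP:
                right_done = True
            else:
                seq.append(tok)  # extend toward the end of the sentence
    return list(seq)


# Toy lookup tables standing in for learned decoders (illustrative only).
LEFT = {"jumps": "fox", "fox": "the"}
RIGHT = {"jumps": "over", "over": "fences"}


def toy_left(seq: Sequence[str]) -> str:
    # Predict the token that precedes the current leftmost token.
    return LEFT.get(seq[0], STOP)


def toy_right(seq: Sequence[str]) -> str:
    # Predict the token that follows the current rightmost token.
    return RIGHT.get(seq[-1], STOP)


caption = middle_out_decode("jumps", toy_left, toy_right)
print(" ".join(caption))
```

Because the caller chooses the middle word, this setup also hints at the controllability the paper emphasizes: supplying a different starting token steers the whole output around it.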

Findings

01 Significant improvements in sequence de-noising accuracy
02 Competitive performance in video captioning tasks
03 Enhanced caption diversity and controllability

Abstract

Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled. In this paper, we speculate that a fundamental shortcoming of sequence generation models is that the decoding is done strictly from left to right, meaning that output values generated earlier have a profound effect on those generated later. To address this issue, we propose a novel middle-out decoder architecture that begins from an initial middle word and simultaneously expands the sequence in both directions. To facilitate information flow and maintain consistent decoding, we introduce a dual self-attention mechanism that allows us to model complex dependencies between the outputs. We illustrate the performance of our model on the task of video captioning, as well as a synthetic sequence de-noising task. Our middle-out decoder achieves…

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition