Copy this Sentence

Vasileios Lioutas; Andriy Drozdyuk

arXiv:1905.09856 · cs.LG · May 27, 2019 · 1 cites

TL;DR

This paper formally defines the attention operation, explores its application to sequence-to-sequence models, and demonstrates that greater use of attention improves performance, convergence speed, and stability on copying tasks.

Contribution

It provides a rigorous mathematical definition of attention and links it to practical implementations, highlighting its benefits in sequence-to-sequence learning.

Findings

01. Models with more attention perform better on copying tasks.

02. Attention-based models converge faster.

03. Attention improves model stability.

Abstract

Attention is an operation that selects some largest element from some set, where the notion of largest is defined elsewhere. Applying this operation to sequence-to-sequence mapping results in significant improvements to the task at hand. In this paper, we provide the mathematical definition of attention and examine its application to sequence-to-sequence models. We highlight the exact correspondences between machine learning implementations of attention and our mathematical definition. We provide clear evidence of the effectiveness of attention mechanisms by evaluating models with varying degrees of attention on a very simple task: copying a sentence. We find that models that make greater use of attention perform much better on sequence-to-sequence mapping tasks, converge faster, and are more stable.
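The idea of attention as selecting a "largest" element can be illustrated with a minimal sketch. This is not the paper's formal definition or its trained models: it assumes dot-product scores as the notion of "largest", a softmax as the soft selection, and a hypothetical `temperature` parameter that makes the selection approach a hard argmax. The one-hot position keys in the copying example are hand-built for illustration, not learned.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, temperature=1.0):
    """Soft selection: weight each value by how well its key matches the query.

    Assumption: "largest" is measured by the dot product of query and key.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores, temperature)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


# Part 1: a low temperature makes soft attention behave like hard selection.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
query = [1.0, 1.0]  # scores are 1, 1, 2, so the third key is the "largest"

soft_weights, soft_output = attend(query, keys, values, temperature=1.0)
sharp_weights, sharp_output = attend(query, keys, values, temperature=0.01)

# Part 2: copying a sentence with hand-built (not learned) position keys.
tokens = ["copy", "this", "sentence"]
n = len(tokens)
position_keys = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
token_values = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

copied = []
for t in range(n):
    # The query for output step t is the key of input position t.
    _, out = attend(position_keys[t], position_keys, token_values, temperature=0.01)
    copied.append(tokens[argmax(out)])

print(copied)
```

With `temperature=1.0` the weights are spread across all three elements, while with `temperature=0.01` nearly all of the weight falls on the best-matching element. In the copying example each output step attends to the input position with the same index, which is the alignment a trained attention model would have to discover on its own.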

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education