A Dynamic Head Importance Computation Mechanism for Neural Machine Translation

Akshay Goindani; Manish Shrivastava

arXiv:2108.01377 · cs.CL · August 4, 2021

Open Access

TL;DR

This paper introduces DHICM, a dynamic mechanism to compute and utilize head importance in Transformer models for neural machine translation, improving performance especially with limited training data.

Contribution

The paper proposes a novel dynamic head importance computation mechanism that enhances Transformer efficiency and translation quality by adaptively identifying important attention heads.

Findings

1. DHICM outperforms traditional Transformer models in NMT tasks.
2. DHICM is especially effective with limited training data.
3. The added importance mechanism improves resource utilization and translation accuracy.

Abstract

Multi-head attention, which runs several attention mechanisms in parallel, improves the performance of the Transformer model in applications such as Neural Machine Translation (NMT) and text classification. In the multi-head attention mechanism, different heads attend to different parts of the input. A limitation is that several heads may attend to the same part of the input, which makes some of them redundant and leaves model resources under-utilized. One approach to avoid this is to prune the least important heads based on an importance score. In this work, we focus on designing a Dynamic Head Importance Computation Mechanism (DHICM) to dynamically calculate the importance of a head with respect to the input. Our insight is to design an additional attention layer together with multi-head attention, and utilize the outputs of the multi-head attention…
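The abstract is cut off before it describes how the additional attention layer is built, so the following is only an illustrative sketch of the general idea of input-dependent head importance, not the authors' method. It assumes that each head's output for a token is scored against that token's input vector with a scaled dot product, that the scores are normalised over heads with a softmax, and that the head outputs are then combined as a weighted sum. The function names `head_importance` and `combine_heads` are hypothetical, and the learned projections a real model would use are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def head_importance(token_input, head_outputs):
    """Assumed scoring rule: scaled dot product between the token's input
    vector and each head's output, normalised over heads."""
    d = len(token_input)
    scores = [dot(token_input, h) / math.sqrt(d) for h in head_outputs]
    return softmax(scores)


def combine_heads(head_outputs, weights):
    """Assumed combination rule: importance-weighted sum of head outputs
    (a standard Transformer concatenates them instead)."""
    d = len(head_outputs[0])
    return [sum(w * h[i] for w, h in zip(weights, head_outputs)) for i in range(d)]


# Toy data: one token, three heads, head dimension four.
random.seed(0)
d_head, n_heads = 4, 3
token_input = [random.gauss(0, 1) for _ in range(d_head)]
head_outputs = [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(n_heads)]

weights = head_importance(token_input, head_outputs)
combined = combine_heads(head_outputs, weights)
print("head importance:", weights)
```

Because the weights are recomputed for every token, a head can matter for one input and be nearly ignored for another, which is the sense in which such a score is dynamic. Under this scoring rule, heads that produce identical outputs receive identical weights.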

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Label Smoothing · Residual Connection · Adam · Byte Pair Encoding