Injecting Hierarchy with U-Net Transformers

David Donahue; Vladislav Lialin; Anna Rumshisky

arXiv:1910.10488·cs.LG·April 5, 2021

TL;DR

This paper introduces a hierarchical Transformer model inspired by the U-Net architecture. The model explicitly encodes hierarchical structure and outperforms the vanilla Transformer on chit-chat dialogue.

Contribution

It presents a hierarchical Transformer architecture that adds U-Net-inspired hierarchical processing to the standard Transformer, evaluated on chit-chat dialogue.

Findings

01. Outperforms the vanilla Transformer and some strong baselines on chit-chat dialogue

02. Suggests that explicit hierarchical processing can benefit Transformer models on NLP tasks

03. Provides empirical evidence, drawn from chit-chat dialogue, for the role of hierarchy in language modeling

Abstract

The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of natural language processing (NLP) tasks. However, all Transformer computations occur at the level of word representations, and it may therefore be argued that Transformer models do not explicitly attempt to learn hierarchical structure, which is widely assumed to be integral to language. In the present work, we introduce hierarchical processing into the Transformer model, taking inspiration from the U-Net architecture, popular in computer vision for its hierarchical view of natural images. We empirically demonstrate that the proposed architecture outperforms both the vanilla Transformer and some strong baselines in the domain of chit-chat dialogue.
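
The abstract does not spell out how the hierarchy is wired, so the following is only a minimal sketch of the general U-Net pattern applied to a token sequence, not the authors' implementation. It assumes pairwise max pooling for downsampling, repetition for upsampling, and a concatenated skip connection. `identity_layer` is a hypothetical placeholder for a Transformer layer, and every function name here is illustrative.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]


def identity_layer(seq: Sequence) -> Sequence:
    """Hypothetical stand-in for a Transformer layer (assumption: does nothing)."""
    return [list(vec) for vec in seq]


def max_pool_pairs(seq: Sequence) -> Sequence:
    """Halve the sequence length with an element-wise max over adjacent pairs."""
    pooled = []
    for i in range(0, len(seq), 2):
        window = seq[i:i + 2]  # the last window holds one vector if len is odd
        pooled.append([max(values) for values in zip(*window)])
    return pooled


def upsample_repeat(seq: Sequence, target_len: int) -> Sequence:
    """Return to the fine resolution by repeating each coarse vector twice."""
    repeated = [list(vec) for vec in seq for _ in range(2)]
    return repeated[:target_len]


def concat_skip(fine: Sequence, coarse_up: Sequence) -> Sequence:
    """Concatenated skip connection: join fine and coarse features per token."""
    return [f + c for f, c in zip(fine, coarse_up)]


def unet_pass(seq: Sequence) -> Sequence:
    """One down/up step of the assumed U-Net shape over a token sequence."""
    fine = identity_layer(seq)
    coarse = identity_layer(max_pool_pairs(fine))
    return concat_skip(fine, upsample_repeat(coarse, len(fine)))


tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
out = unet_pass(tokens)
```

Each output token carries its own word-level features followed by the features of the pooled span it belongs to, which is the sense in which a U-Net-style design mixes fine and coarse views. A real model would replace `identity_layer` with learned layers and would likely project the concatenated vector back to the model dimension.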

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Residual Connection · Byte Pair Encoding · Dense Connections