Injecting Hierarchy with U-Net Transformers
David Donahue, Vladislav Lialin, Anna Rumshisky

TL;DR
This paper introduces a hierarchical Transformer model inspired by the U-Net architecture. The model explicitly encodes hierarchical structure and outperforms the vanilla Transformer on chit-chat dialogue.
Contribution
The paper presents a hierarchical Transformer architecture that introduces U-Net-inspired hierarchical processing into a model whose computations otherwise occur at the level of word representations.
Findings
Outperforms the vanilla Transformer and some strong baselines on chit-chat dialogue
Suggests that hierarchical processing is effective for this NLP task
Offers empirical support for the view that hierarchical structure matters in language
Abstract
The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of natural language processing (NLP) tasks. However, all Transformer computations occur at the level of word representations and therefore, it may be argued that Transformer models do not explicitly attempt to learn hierarchical structure which is widely assumed to be integral to language. In the present work, we introduce hierarchical processing into the Transformer model, taking inspiration from the U-Net architecture, popular in computer vision for its hierarchical view of natural images. We empirically demonstrate that the proposed architecture outperforms both the vanilla Transformer and some strong baselines in the domain of chit-chat dialogue.
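To make the U-Net-style flow concrete, here is a minimal sketch of a contracting path followed by an expanding path over a token sequence. It is not the authors' implementation. The function names (`max_pool`, `upsample`, `concat_skip`, `unet_pass`), the pooling window of 2, the repetition-based upsampling, and the identity stand-in for a Transformer layer are all assumptions made for illustration. Only the use of max pooling and concatenated skip connections is taken from the methods listed for this paper.

```python
from typing import Callable, List

Vector = List[float]
Sequence = List[Vector]


def max_pool(seq: Sequence, window: int = 2) -> Sequence:
    """Shorten the sequence by taking an elementwise max over adjacent positions."""
    pooled = []
    for start in range(0, len(seq), window):
        group = seq[start:start + window]
        pooled.append([max(values) for values in zip(*group)])
    return pooled


def upsample(seq: Sequence, factor: int, length: int) -> Sequence:
    """Repeat each vector `factor` times, then truncate to the target length."""
    repeated = [list(vec) for vec in seq for _ in range(factor)]
    return repeated[:length]


def concat_skip(fine: Sequence, coarse: Sequence) -> Sequence:
    """Concatenated skip connection: join features position by position."""
    return [f + c for f, c in zip(fine, coarse)]


def unet_pass(seq: Sequence, depth: int = 2,
              layer: Callable[[Sequence], Sequence] = lambda s: s) -> Sequence:
    """Run a contracting path, then an expanding path with skip connections.

    `layer` is a hypothetical placeholder for a Transformer layer. With the
    identity default, the feature width grows at each concatenation; a real
    model would typically project back to a fixed width (an assumption here).
    """
    skips = []
    x = layer(seq)
    for _ in range(depth):           # contracting path: coarser and shorter
        skips.append(x)
        x = layer(max_pool(x))
    for skip in reversed(skips):     # expanding path: restore resolution
        x = upsample(x, 2, len(skip))
        x = layer(concat_skip(skip, x))
    return x


tokens = [[1.0], [3.0], [2.0], [5.0], [4.0]]
output = unet_pass(tokens, depth=2)
print(len(output), len(output[0]))
```

Each output position ends up carrying its own word-level feature alongside features summarizing progressively wider spans, which is the sense in which the hierarchy is made explicit in this sketch.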
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Residual Connection · Byte Pair Encoding · Dense Connections
