Forming Trees with Treeformers

Nilay Patel; Jeffrey Flanigan

arXiv:2207.06960·cs.CL·July 12, 2023

TL;DR

This paper introduces Treeformer, an encoder module inspired by the CKY algorithm, that incorporates hierarchical structure into Transformers, significantly improving compositional generalization and performance on various NLP tasks.

Contribution

Treeformer is a novel encoder module that explicitly models hierarchical structure within Transformers, enhancing their ability to handle compositional language tasks.

Findings

01 Improves compositional generalization in NLP tasks

02 Enhances performance in machine translation and summarization

03 Demonstrates the benefits of hierarchical structure in Transformers

Abstract

Human language is known to exhibit a nested, hierarchical structure, allowing us to form complex sentences out of smaller pieces. However, many state-of-the-art neural network models such as Transformers have no explicit hierarchical structure in their architecture -- that is, they don't have an inductive bias toward hierarchical structure. Additionally, Transformers are known to perform poorly on compositional generalization tasks which require such structures. In this paper, we introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm which learns a composition operator and pooling function to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements in compositional generalization as well as in downstream tasks…
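To make the CKY-style idea concrete, here is a minimal sketch of a chart encoder that builds a vector for every span bottom-up. The abstract does not give the form of the composition operator or the pooling function, so both are assumptions here: `compose` is a single tanh layer over the concatenated children, and `pool` is a softmax-weighted average over split points scored by a hypothetical vector `q`. The names `cky_encode`, `W`, `b`, and `q` are illustrative and not taken from the paper.

```python
import math
import random


def compose(left, right, W, b):
    """Assumed composition operator: tanh(W [left; right] + b)."""
    x = left + right  # list concatenation, length 2d
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def pool(cands, q):
    """Assumed pooling: softmax-weighted average of the candidate vectors."""
    scores = [sum(qi * ci for qi, ci in zip(q, c)) for c in cands]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    d = len(cands[0])
    return [sum(e / z * c[t] for e, c in zip(exps, cands)) for t in range(d)]


def cky_encode(tokens, W, b, q):
    """Fill a chart mapping (i, j) to an encoding of the span tokens[i:j]."""
    n = len(tokens)
    # Spans of length 1 are the token embeddings themselves.
    chart = {(i, i + 1): tokens[i] for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # One candidate per split point k, built from two shorter spans.
            cands = [compose(chart[(i, k)], chart[(k, j)], W, b)
                     for k in range(i + 1, j)]
            chart[(i, j)] = pool(cands, q)
    return chart


random.seed(0)
n, d = 5, 4  # toy sentence length and embedding size
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
W = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d
q = [random.gauss(0, 1) for _ in range(d)]

chart = cky_encode(tokens, W, b, q)
sentence_vec = chart[(0, n)]  # encoding of the whole sentence
```

The chart holds one vector for each of the n(n+1)/2 contiguous spans, so phrase-level and sentence-level encodings come from the same pass. This naive version does O(n^3) composition calls; a practical module would batch them and train `W`, `b`, and `q` by gradient descent.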

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam