R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Xiang Hu; Haitao Mi; Zujie Wen; Yafang Wang; Yi Su; Jing Zheng; Gerard de Melo

arXiv:2107.00967·cs.CL·March 4, 2022

TL;DR

This paper introduces R2D2, a recursive Transformer that explicitly encodes hierarchical language structure using differentiable CKY-style binary trees, and reports effective results on language modeling and unsupervised parsing.

Contribution

The paper proposes a recursive Transformer architecture built on differentiable CKY-style binary trees, together with an efficient pruned tree induction algorithm that encodes a sequence in a linear number of composition steps.

Findings

01 Effective on language modeling tasks

02 Effective on unsupervised parsing

03 Pruned tree induction needs only a linear number of composition steps

Abstract

Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.
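
To make the composition process in the abstract concrete, here is a minimal sketch of a soft CKY-style chart encoder in plain Python. It is an illustration under stated assumptions, not the authors' implementation: `compose` and `score` are hypothetical toy stand-ins for the Transformer-based composition function, and a plain softmax mixture over split points is swapped in for whatever differentiable selection mechanism the paper uses. The sketch also counts composition steps, which for the full (unpruned) chart come to (n^3 - n) / 6 for n tokens; that cubic cost is what a pruned induction algorithm has to avoid.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose(left, right):
    # Hypothetical toy composition: the paper uses a Transformer here.
    return [math.tanh(0.5 * (a + b)) for a, b in zip(left, right)]


def score(left, right):
    # Hypothetical toy split score: a dot product of the two children.
    return sum(a * b for a, b in zip(left, right))


def cky_encode(leaves):
    """Fill a CKY-style chart bottom-up over all spans (i, j).

    Each span's vector is a softmax-weighted mixture of the compositions
    obtained at every split point, so the whole chart stays differentiable.
    Returns the root vector and the number of composition steps performed.
    """
    n = len(leaves)
    dim = len(leaves[0])
    chart = {(i, i): list(v) for i, v in enumerate(leaves)}
    steps = 0
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length - 1              # span end (inclusive)
            candidates, scores = [], []
            for k in range(i, j):           # split between k and k + 1
                left, right = chart[(i, k)], chart[(k + 1, j)]
                candidates.append(compose(left, right))
                scores.append(score(left, right))
                steps += 1
            weights = softmax(scores)
            chart[(i, j)] = [
                sum(w * c[d] for w, c in zip(weights, candidates))
                for d in range(dim)
            ]
    return chart[(0, n - 1)], steps


if __name__ == "__main__":
    tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
    root, steps = cky_encode(tokens)
    print(len(root), steps)
```

With four two-dimensional token vectors the full chart performs 10 composition steps; with eight tokens it already performs 84, which illustrates why a linear-step variant matters for scaling.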

Code & Models

Repositories

alipay/StructuredLM_RTDT (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Adam · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing