FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
Bill Tuck Weng Pung, Alvin Chan

TL;DR
FASTTREES introduces a parallel, non-autoregressive neural module that induces latent tree structures for faster sequence encoding. It matches or outperforms ON-LSTM on four sequence modeling tasks and improves Transformer performance.
Contribution
It presents FASTTREES, a novel parallel tree induction method that improves sequence encoding speed and performance, and can be integrated into Transformer models for better results.
Findings
Achieves competitive or superior performance to ON-LSTM on four sequence tasks.
Enhances Transformer models, improving performance on three sequence transduction tasks.
Outperforms state-of-the-art models on logical inference (+4%) and mathematical language understanding (+8%).
Abstract
Inducing latent tree structures from sequential data is an emerging trend in the NLP research landscape today, largely popularized by recent methods such as Gumbel LSTM and Ordered Neurons (ON-LSTM). This paper proposes FASTTREES, a new general purpose neural module for fast sequence encoding. Unlike most previous works that consider recurrence to be necessary for tree induction, our work explores the notion of parallel tree induction, i.e., imbuing our model with hierarchical inductive biases in a parallelizable, non-autoregressive fashion. To this end, our proposed FASTTREES achieves competitive or superior performance to ON-LSTM on four well-established sequence modeling tasks, i.e., language modeling, logical inference, sentiment analysis and natural language inference. Moreover, we show that the FASTTREES module can be applied to enhance Transformer models, achieving performance…
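The abstract contrasts recurrent tree induction (ON-LSTM) with parallel, non-autoregressive tree induction, but this page does not give the FASTTREES equations. The sketch below is an illustration under stated assumptions and is not the authors' method. It uses ON-LSTM's `cumax` (cumulative softmax) master gates, which encode a soft "split point" across hidden dimensions. It computes these gates for every position at once from the inputs alone, which removes the dependence on the previous hidden state that makes ON-LSTM sequential. The function `parallel_master_gates`, the weight names and their shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cumax(x, axis=-1):
    # Cumulative softmax (as in ON-LSTM): non-decreasing values in [0, 1].
    return np.cumsum(softmax(x, axis=axis), axis=axis)

def parallel_master_gates(X, W_f, W_i, b_f, b_i):
    """Hypothetical sketch: ON-LSTM-style master gates for all positions at once.

    X: (seq_len, d_in); W_f, W_i: (d_in, d_hidden); b_f, b_i: (d_hidden,).
    No h_{t-1} is used, so every row is computed independently (in parallel).
    """
    master_forget = cumax(X @ W_f + b_f, axis=-1)       # rises from ~0 to 1 across hidden dims
    master_input = 1.0 - cumax(X @ W_i + b_i, axis=-1)  # falls from ~1 to 0 across hidden dims
    return master_forget, master_input

# Usage with random (assumed) weights.
rng = np.random.default_rng(0)
seq_len, d_in, d_hidden = 5, 8, 16
X = rng.standard_normal((seq_len, d_in))
W_f = rng.standard_normal((d_in, d_hidden))
W_i = rng.standard_normal((d_in, d_hidden))
b_f = np.zeros(d_hidden)
b_i = np.zeros(d_hidden)
f, i = parallel_master_gates(X, W_f, W_i, b_f, b_i)
print(f.shape, i.shape)  # (5, 16) (5, 16)
```

Because the gates depend only on `X`, the whole sequence reduces to a few batched matrix operations with no loop over time steps. That is the property the abstract refers to as non-autoregressive. How FASTTREES actually combines such gates with cell or Transformer states is not described on this page.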
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Sigmoid Activation · Label Smoothing · Softmax · Residual Connection · Layer Normalization · Adam
