On the Design Space Between Transformers and Recursive Neural Nets
Jishnu Ray Chowdhury, Cornelia Caragea

TL;DR
This paper examines the connection between Recursive Neural Networks (RvNNs) and Transformers through two recently proposed models, CRvNN and NDR. Both perform strongly on algorithmic tasks and act as bridges in the design space between the two architectures.
Contribution
It analyzes CRvNN and NDR as bridge models between RvNNs and Transformers, formalizes the tight connection between them, and discusses their limitations.
Findings
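CRvNN relaxes the discrete, structure-wise composition of traditional RvNNs and ends up with a Transformer-like structure.
NDR constrains the Transformer to induce a better structural inductive bias and ends up close to CRvNN.
Both models perform strongly on algorithmic tasks and generalization settings where simpler RvNNs and Transformers fail.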
Abstract
In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the recent development of two models: Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR). On one hand, CRvNN pushes the boundaries of traditional RvNN by relaxing its discrete structure-wise composition, ending up with a Transformer-like structure. On the other hand, NDR constrains the original Transformer to induce better structural inductive bias, ending up with a model that is close to CRvNN. Both models, CRvNN and NDR, show strong performance on algorithmic tasks and in generalization settings where simpler forms of RvNNs and Transformers fail. We explore these "bridge" models in the design space between RvNNs and Transformers, formalize their tight connections, discuss their limitations, and propose ideas…
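The following toy sketch makes the abstract's contrast between discrete and relaxed composition concrete. It is not CRvNN's or NDR's actual algorithm. The `compose` function (an elementwise tanh of the sum), the per-position `gates`, and the helper names `discrete_step` and `soft_step` are hypothetical choices made for illustration.

A hard RvNN step picks one adjacent pair and merges it, which shrinks the sequence. The relaxed step instead updates every position as a gated mix of "keep as is" and "compose with the right neighbor". This preserves the sequence length from step to step, which is one way to see how a relaxation can end up resembling a stack of Transformer layers.

```python
import math
from typing import List

Vector = List[float]


def compose(left: Vector, right: Vector) -> Vector:
    # Hypothetical composition function (not from the paper): elementwise tanh of the sum.
    return [math.tanh(a + b) for a, b in zip(left, right)]


def discrete_step(seq: List[Vector], k: int) -> List[Vector]:
    """Hard RvNN-style step: replace the adjacent pair (k, k + 1) with its composition.

    The output is one element shorter than the input.
    """
    return seq[:k] + [compose(seq[k], seq[k + 1])] + seq[k + 2:]


def soft_step(seq: List[Vector], gates: List[float]) -> List[Vector]:
    """Relaxed step (toy illustration): each position i with a right neighbor becomes
    gates[i] * compose(h_i, h_{i+1}) + (1 - gates[i]) * h_i.

    The last position has no right neighbor and is copied unchanged, so the
    output has the same length as the input.
    """
    out: List[Vector] = []
    for i, h in enumerate(seq):
        if i + 1 < len(seq):
            c = compose(h, seq[i + 1])
            g = gates[i]  # assumed to lie in [0, 1]
            out.append([g * ci + (1.0 - g) * hi for ci, hi in zip(c, h)])
        else:
            out.append(list(h))
    return out


if __name__ == "__main__":
    seq = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
    print(discrete_step(seq, 0))        # 2 vectors: the merged pair, then the last vector
    print(soft_step(seq, [1.0, 0.0]))   # 3 vectors: position 0 matches the merged pair
    print(soft_step(seq, [0.5, 0.5]))   # 3 vectors: fractional blends
```

With one-hot gates (`[1.0, 0.0]` above), position 0 of the relaxed step holds exactly the hard composition and the other positions are unchanged. The hard step additionally deletes position 1. With fractional gates, the output is a linear function of the gates, so it can be optimized by gradient descent instead of requiring a discrete choice of `k`.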
Taxonomy
Topics: Neural Networks and Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization · Dense Connections
