Terminating Differentiable Tree Experts
Jonathan Thomm, Michael Hersche, Giacomo Camposampiero, Aleksandar Terzić, Bernhard Schölkopf, Abbas Rahimi

TL;DR
This paper introduces Terminating Differentiable Tree Experts, a neuro-symbolic model that dynamically determines the number of computation steps, reducing parameter growth and enhancing flexibility in tree-based learning.
Contribution
It proposes a novel termination algorithm and a simplified architecture with a constant number of parameters, improving over previous Differentiable Tree Machine models.
Findings
The model learns to predict the number of computation steps without an oracle.
It maintains its learning capabilities while converging to the optimal number of steps.
The parameter count stays constant in the number of steps instead of growing linearly.
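The linear-versus-constant contrast can be illustrated with a small parameter count. This is a rough sketch, not the paper's accounting: the function names, the layer size, the number of experts, and the router size are all assumed values chosen for illustration.

```python
def per_step_params(num_steps: int, params_per_layer: int) -> int:
    """One distinct layer per computation step: total grows linearly with steps."""
    return num_steps * params_per_layer


def shared_expert_params(num_steps: int, num_experts: int,
                         params_per_layer: int, router_params: int) -> int:
    """A fixed pool of experts plus a router, reused at every step."""
    # num_steps is accepted only to show that it does not affect the total.
    return num_experts * params_per_layer + router_params


if __name__ == "__main__":
    # All sizes below are hypothetical, picked only to make the trend visible.
    for steps in (4, 16, 64):
        linear = per_step_params(steps, params_per_layer=1_000)
        constant = shared_expert_params(steps, num_experts=4,
                                        params_per_layer=1_000,
                                        router_params=100)
        print(f"steps={steps:3d}  per-step layers={linear:6d}  shared experts={constant:6d}")
```

In this toy setting the first column scales with the step count, while the second depends only on the size of the expert pool and the router.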
Abstract
We advance the recently proposed neuro-symbolic Differentiable Tree Machine, which learns tree operations using a combination of transformers and Tensor Product Representations. We investigate the architecture and propose two key components. We first remove a series of different transformer layers that are used in every step by introducing a mixture of experts. This results in a Differentiable Tree Experts model with a constant number of parameters for any arbitrary number of steps in the computation, compared to the previous method in the Differentiable Tree Machine with a linear growth. Given this flexibility in the number of steps, we additionally propose a new termination algorithm to provide the model the power to choose how many steps to make automatically. The resulting Terminating Differentiable Tree Experts model sluggishly learns to predict the number of steps without an…
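The two ideas in the abstract, reusing a fixed pool of experts at every step and letting the model decide when to stop, can be sketched as a single loop. The abstract does not spell out the termination rule, so the threshold-based halting check below is a generic stand-in and not the paper's algorithm. The names `run_with_termination`, `router`, `halt_score`, and the toy experts are all hypothetical, and plain lists of floats stand in for the tree representations.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_with_termination(
    state: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    router: Callable[[Vector], Sequence[float]],
    halt_score: Callable[[Vector], float],
    max_steps: int = 32,
    threshold: float = 0.5,
) -> Tuple[Vector, int]:
    """Apply the same experts repeatedly until the halting score crosses threshold.

    Assumption: halting is a simple threshold on a scalar score. This is an
    illustrative placeholder, not the termination algorithm from the paper.
    """
    steps = 0
    while steps < max_steps:
        if halt_score(state) >= threshold:
            break
        # The router produces one score per expert; softmax turns them into weights.
        weights = softmax(router(state))
        outputs = [expert(state) for expert in experts]
        # The new state is the weighted mixture of the expert outputs.
        state = [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(state))
        ]
        steps += 1
    return state, steps


if __name__ == "__main__":
    # Toy experts: one halves the state, one leaves it unchanged.
    toy_experts = [lambda v: [0.5 * x for x in v], lambda v: list(v)]
    uniform_router = lambda v: [0.0, 0.0]
    small_enough = lambda v: 1.0 if max(abs(x) for x in v) < 0.2 else 0.0
    final_state, used = run_with_termination([1.0], toy_experts, uniform_router, small_enough)
    print(final_state, used)
```

Because the experts and the router are the same objects at every iteration, raising `max_steps` adds no parameters; only the number of loop iterations changes.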
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications
