Routing Networks and the Challenges of Modular and Compositional Computation
Clemens Rosenbaum, Ignacio Cases, Matthew Riemer, Tim Klinger

TL;DR
This paper investigates routing networks, a method for modular and compositional computation, analyzing the challenges in training such models and how different design choices impact their performance across various tasks.
Contribution
It provides an empirical analysis of the challenges in training routing networks and explores how design decisions affect their effectiveness in compositional learning.
Findings
Routing decisions significantly influence model performance.
Regularization impacts the training stability of routing networks.
Design choices affect the ability to learn effective module compositions.
Abstract
Compositionality is a key strategy for addressing combinatorial complexity and the curse of dimensionality. Recent work has shown that compositional solutions can be learned and offer substantial gains across a variety of domains, including multi-task learning, language modeling, visual question answering, machine comprehension, and others. However, such models present unique challenges during training when both the module parameters and their composition must be learned jointly. In this paper, we identify several of these issues and analyze their underlying causes. Our discussion focuses on routing networks, a general approach to this problem, and examines empirically the interplay of these challenges and a variety of design decisions. In particular, we consider the effect of how the algorithm decides on module composition, how the algorithm updates the modules, and if the algorithm…
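As a rough illustration of what "deciding on module composition" means, the toy sketch below has a router pick one module per depth, then applies the chosen modules in sequence. Everything in it is an assumption made for illustration and is not taken from the paper: the three arithmetic modules, the fixed preference scores, and the names `greedy_router`, `epsilon_greedy_router`, and `route` are all hypothetical. A routing network as described in the abstract learns both the module parameters and the routing decisions jointly; here both are fixed by hand so that only the composition step is visible.

```python
import random

# Hypothetical toy modules (assumption): in a real routing network these
# would be trainable function blocks, not fixed arithmetic operations.
MODULES = {
    "double": lambda x: 2.0 * x,
    "increment": lambda x: x + 1.0,
    "negate": lambda x: -x,
}


def greedy_router(preferences, depth):
    """Return the name of the highest-scoring module at this depth."""
    scores = preferences[depth]
    return max(scores, key=scores.get)


def epsilon_greedy_router(preferences, depth, epsilon, rng):
    """With probability epsilon pick a random module, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(sorted(preferences[depth]))
    return greedy_router(preferences, depth)


def route(x, preferences, choose):
    """Apply one selected module per depth; return the output and the path taken."""
    path = []
    for depth in range(len(preferences)):
        name = choose(preferences, depth)
        x = MODULES[name](x)
        path.append(name)
    return x, path


# Hand-written preference scores (assumption): one dict per routing depth.
prefs = [
    {"double": 0.9, "increment": 0.1, "negate": 0.0},
    {"double": 0.2, "increment": 0.7, "negate": 0.1},
]

output, path = route(3.0, prefs, greedy_router)
# output is 7.0 and path is ["double", "increment"]
```

Swapping `greedy_router` for a closure around `epsilon_greedy_router` changes only how the composition is chosen, which is one of the design decisions the abstract says the paper examines; how the scores would be learned is deliberately left out of this sketch.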
Taxonomy
Topics: Multimodal Machine Learning Applications · Ferroelectric and Negative Capacitance Devices · Domain Adaptation and Few-Shot Learning
