DNCs Require More Planning Steps
Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster

TL;DR
This paper investigates how the number of planning steps, or "planning budget," affects the generalization and efficiency of Differentiable Neural Computers (DNCs) on complex algorithmic problems. It argues that a budget too small for the problem's computational complexity limits what the model can learn.
Contribution
The study introduces the concept of planning budget as a key factor influencing DNC performance and demonstrates its impact across multiple algorithmic tasks.
Findings
Planning budget significantly affects generalization to larger inputs.
Limited planning steps can hinder the effective use of external memory.
Adjusting planning steps improves stability and learning efficiency.
Abstract
Many recent works use machine learning models to solve various complex algorithmic problems. However, these models attempt to reach a solution without considering the problem's required computational complexity, which can be detrimental to their ability to solve it correctly. In this work we investigate the effect of computational time and memory on generalization of implicit algorithmic solvers. To do so, we focus on the Differentiable Neural Computer (DNC), a general problem solver that also lets us reason directly about its usage of time and memory. We argue that the number of planning steps the model is allowed to take, which we call "planning budget", is a constraint that can cause the model to generalize poorly and hurt its ability to fully utilize its external memory. We evaluate our method on Graph Shortest Path, Convex Hull, Graph MinCut and Associative Recall,…
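To make the notion of a planning budget concrete, here is a minimal sketch of how the number of planning steps could be scheduled between reading the input and emitting the answer. It is an illustration under stated assumptions, not the authors' implementation: the function names, the three phase labels, and the choice of what counts as "input size" are all hypothetical. The two budgets shown (a constant 10 steps, and a budget equal to the input size) follow the comparison described in the reviews on this page.

```python
from typing import Callable, List

# A budget maps an input size n to a number of planning steps.
Budget = Callable[[int], int]


def constant_budget(steps: int = 10) -> Budget:
    """Hypothetical helper: same number of planning steps for every input."""
    return lambda n: steps


def linear_budget(scale: int = 1) -> Budget:
    """Hypothetical helper: planning steps grow with the input size n."""
    return lambda n: scale * n


def phase_schedule(input_len: int, answer_len: int, budget: Budget) -> List[str]:
    """Label every time step of one episode.

    Assumed layout: the model first reads the problem description,
    then runs `budget(input_len)` planning steps with no new input,
    then emits its answer. The DNC itself is omitted on purpose.
    """
    planning_steps = budget(input_len)
    return (
        ["input"] * input_len
        + ["plan"] * planning_steps
        + ["answer"] * answer_len
    )


if __name__ == "__main__":
    for n in (5, 50):
        fixed = phase_schedule(n, 2, constant_budget(10))
        adaptive = phase_schedule(n, 2, linear_budget())
        print(n, fixed.count("plan"), adaptive.count("plan"))
```

With the constant budget, the planning phase stays at 10 steps no matter how large the input becomes, whereas the linear budget gives a 50-element input 50 planning steps. That gap is the constraint the abstract refers to.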
Peer Reviews
Decision: ICML 2024 Poster
The paper is, for the most part, clearly written and presents the idea in an easy-to-understand manner. The authors devote a lot of room to additional experiments that investigate how adaptive processing time improves extrapolation. They also present interesting results on how extrapolation to a larger memory, which appears to have been a long-standing issue in DNC extensions, can benefit from an adaptive processing time.
The contributions of the paper concern shortcomings of the DNC, which, at least to my knowledge, has not gained a strong foothold in learning general algorithms from data. While it's refreshing to see non-LLM submissions, I'm afraid that the paper at hand might not be of high interest to the community at the moment. The paper is built around two claims (adaptive planning budget during training leads to learning more general algorithms, and adaptive planning budget allows for memory usage that g
The paper is well-written and easy to follow. It is interesting to see that the DNC models learn algorithms that can indeed generalize beyond the graph sizes seen during training (both with and without adaptive planning time). The introduced scheme to allow DNCs to leverage larger memory modules at training time by adding a temperature rescaling to the softmax over memory slots is intuitive and a nice auxiliary contribution.
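The temperature rescaling mentioned above can be illustrated with a generic temperature softmax. This is only a sketch of the general effect, assuming attention weights over memory slots are produced by a softmax over similarity scores; how the paper chooses the temperature is not stated on this page, and the scores below are made up for illustration.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over memory-slot scores with a temperature.

    Lower temperatures sharpen the distribution; higher ones flatten it.
    """
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def peak_weight(num_slots: int, temperature: float = 1.0) -> float:
    """Weight on one matching slot (score 2.0) among non-matching slots (score 0.0)."""
    scores = [2.0] + [0.0] * (num_slots - 1)
    return softmax(scores, temperature)[0]


if __name__ == "__main__":
    print(peak_weight(10))         # small memory
    print(peak_weight(100))        # larger memory, same scores: weight is diluted
    print(peak_weight(100, 0.5))   # lower temperature restores some sharpness
```

With identical scores, adding slots dilutes the weight on the matching slot, and lowering the temperature counteracts that dilution. This is one plausible reason a rescaling helps when the memory grows.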
The main claim of the paper is that the DNC can only learn a generalizable algorithm because it is trained with a flexible planning budget. In other words, the paper claims that one cannot learn a generalizable algorithm with a DNC trained with fixed planning budget. To prove this, the paper compares a DNC trained with a fixed planning budget of 10 steps, to a DNC trained with a flexible budget, equivalent to the size of the input. In the experiments, both DNCs are trained on input graphs with u
1. The paper is quite clear and provides adequate background on DNCs. The empirical section is well-explained and the organization of the paper is smooth.
2. The ideas put forward by the authors are intuitive. The runtime (and memory) complexity of algorithms is often a factor of the input and thus it is not surprising that neural networks that can mimic Turing machines might require something similar.
I think that the ideas are very interesting, but I am afraid that the empirical evaluation might require more experiments before the authors can claim that adaptive planning budgets are needed. I list my questions here. 1) I appreciate the clarity of the plots in the paper, but I feel that the total number of domains is rather limited. The authors run their experiments on only two problems and the performance differential is significant on only one of the problems (Shortest Path). Q:
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Ferroelectric and Negative Capacitance Devices
Methods: Focus
