Efficient Learning of Discrete-Continuous Computation Graphs
David Friede, Mathias Niepert

TL;DR
This paper analyzes complex stochastic computation graphs with multiple sequential discrete components, identifies why they are hard to train (small gradients and local minima), and proposes two strategies, Gumbel noise scaling and dropout residual connections, to enable effective training and better generalization.
Contribution
It introduces two training strategies for complex discrete-continuous models that counteract small gradients and local minima, enabling the training of stochastic computation graphs with multiple sequential discrete components.
Findings
Proposed Gumbel noise scaling improves training stability.
Dropout residual connections enhance optimization of discrete-continuous models.
Complex models outperform simpler counterparts on benchmark datasets.
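The summary names dropout residual connections but does not describe how they work, so the following is only a minimal sketch of one plausible reading: a skip connection around a discrete component that is randomly dropped during training and removed at evaluation time. The function names (dropout_residual, one_hot_argmax), the drop_prob parameter, and the choice to drop the whole connection at once are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def one_hot_argmax(x):
    """Hypothetical discrete component: hard one-hot of the largest entry."""
    out = np.zeros_like(x)
    out[np.arange(x.shape[0]), x.argmax(axis=-1)] = 1.0
    return out

def dropout_residual(x, discrete_fn, rng, drop_prob=0.5, training=True):
    """Wrap a discrete component with a skip connection that is randomly dropped.

    Assumptions (not taken from the source): the skip path is kept or dropped
    as a whole per call, and it is removed entirely outside of training.
    """
    y = discrete_fn(x)
    if not training:
        return y                      # evaluation: discrete path only
    keep = rng.random() >= drop_prob  # True with probability 1 - drop_prob
    return y + x if keep else y

rng = np.random.default_rng(0)
x = np.array([[0.2, 1.5, -0.3]])
print(dropout_residual(x, one_hot_argmax, rng, drop_prob=0.5))
```

The intent of such a design would be to give gradients a continuous path around the discrete component early in training while still forcing the model to work without that path.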
Abstract
Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the…
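To make the stochastic softmax trick and the idea of Gumbel noise scaling concrete, here is a minimal NumPy sketch of a Gumbel-softmax relaxation with an explicit scale on the noise. The parameter names noise_scale and temperature are assumptions chosen for illustration; the truncated abstract does not give the exact parameterization or schedule, so this should be read as a generic sketch rather than the paper's method.

```python
import numpy as np

def sample_gumbel(shape, rng, scale=1.0):
    """Draw Gumbel(0, scale) noise via the inverse-CDF transform."""
    u = rng.uniform(low=1e-12, high=1.0, size=shape)  # avoid log(0)
    return -scale * np.log(-np.log(u))

def gumbel_softmax(logits, rng, temperature=1.0, noise_scale=1.0):
    """Relaxed categorical sample: softmax((logits + scaled Gumbel noise) / temperature)."""
    g = sample_gumbel(logits.shape, rng, noise_scale)
    z = (logits + g) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = np.array([[2.0, 0.5, -1.0]])
print(gumbel_softmax(logits, rng, temperature=0.5, noise_scale=2.0))
```

With noise_scale set to 0 the function reduces to a plain tempered softmax of the logits; larger values make the perturbation dominate the logits, so samples spread more evenly across categories.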
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Graph Neural Networks · Face and Expression Recognition
Methods: Dropout · Softmax
