DynamicGate-MLP: Conditional Computation via Learned Structural Dropout and Input-Dependent Gating for Functional Plasticity
Yong Il Choi

TL;DR
This paper introduces DynamicGate-MLP, a model that learns input-dependent gating to suppress unnecessary computation, combining regularization and conditional computation for efficient neural network inference.
Contribution
It proposes a novel framework that learns gates for each unit, enabling sample-dependent execution and efficient computation while maintaining regularization benefits.
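The idea of a learned gate per unit with sample-dependent execution can be illustrated with a small sketch. It assumes a single ReLU hidden layer where each unit j has its own gate, computed as a sigmoid of a linear function of the input (the names `gated_hidden_layer`, `Wg`, and `bg` are hypothetical, not taken from the paper). A unit whose gate probability falls below a threshold is skipped entirely, so different inputs execute different numbers of units.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a gate logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_hidden_layer(x, W, b, Wg, bg, threshold=0.5):
    """Hypothetical gated ReLU layer: only units with an open gate are computed.

    x: input vector; W, b: weights and biases of the hidden units;
    Wg, bg: weights and biases of an assumed linear per-unit gate.
    Returns (hidden activations, number of units actually executed).
    """
    hidden = []
    executed = 0
    for w_row, b_j, g_row, bg_j in zip(W, b, Wg, bg):
        # The gate logit depends on the input, so the execution path is per-sample.
        gate_logit = sum(g * xi for g, xi in zip(g_row, x)) + bg_j
        if sigmoid(gate_logit) >= threshold:
            # Open gate: compute this unit's dot product and ReLU.
            z = sum(w * xi for w, xi in zip(w_row, x)) + b_j
            hidden.append(max(0.0, z))
            executed += 1
        else:
            # Closed gate: skip the unit's computation and output zero.
            hidden.append(0.0)
    return hidden, executed


# Toy weights chosen only for illustration.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
Wg = [[4.0, -2.0], [-2.0, 4.0], [1.0, 1.0]]
bg = [0.0, 0.0, -3.0]

for x in ([1.0, 0.0], [0.0, 1.0], [2.0, 2.0]):
    h, n = gated_hidden_layer(x, W, b, Wg, bg)
    print(x, h, n)
# [1.0, 0.0] [1.0, 0.0, 0.0] 1
# [0.0, 1.0] [0.0, 1.0, 0.0] 1
# [2.0, 2.0] [2.0, 2.0, 4.0] 3
```

The first two inputs each execute one of three units, while the third executes all three. For this to save compute in practice, the gate computation must be cheap relative to the units it allows the model to skip.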
Findings
Reduces computation while maintaining accuracy on multiple datasets.
Controls the compute budget via penalties on gate activations.
Outperforms baseline MLPs and MoE variants on efficiency metrics.
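Controlling the compute budget through a penalty on gate activations can be sketched as follows. The exact penalty used in the paper is not stated in this summary; this sketch assumes the mean gate probability serves as a proxy for the expected fraction of executed units, and shows two common forms: an L1-style term that pushes gates toward closed, and a squared term that steers the expected fraction toward a target budget. The function names and the `lam` and `target` parameters are hypothetical.

```python
def expected_active_fraction(gate_probs):
    """Mean gate probability over all samples and units.

    gate_probs: list of per-sample lists of gate probabilities in [0, 1].
    Interpreted here as the expected fraction of units executed.
    """
    total = sum(sum(row) for row in gate_probs)
    count = sum(len(row) for row in gate_probs)
    return total / count


def gate_penalty(gate_probs, lam=0.1, target=None):
    """Hypothetical compute penalty added to the task loss.

    target=None: lam * fraction, which pushes all gates toward closed.
    target=t:    lam * (fraction - t)**2, which steers usage toward budget t.
    """
    frac = expected_active_fraction(gate_probs)
    if target is None:
        return lam * frac
    return lam * (frac - target) ** 2


probs = [[0.9, 0.1, 0.5, 0.5], [0.2, 0.8, 0.6, 0.2]]
print(expected_active_fraction(probs))       # about 0.475
print(gate_penalty(probs, lam=0.1))          # about 0.0475
print(gate_penalty(probs, lam=0.1, target=0.25))  # about 0.0050625
```

Raising `lam` trades accuracy for fewer active units; the target form avoids collapsing every gate to zero once the budget is reached.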
Abstract
Dropout is a widely used regularization technique that stochastically deactivates hidden units during training to mitigate overfitting. At inference, however, the full network is executed with dense computation, so dropout differs in both goal and mechanism from conditional computation, in which the operations executed depend on the input. This paper presents DynamicGate-MLP, a single framework that addresses both the regularization view and the conditional-computation view. Instead of applying a random mask, the model learns gates that decide whether each unit (or block) is used, suppressing unnecessary computation and implementing sample-dependent execution that concentrates computation on the parts each input needs. To this end, we define continuous gate probabilities and, at inference time, derive a discrete execution mask from them to select an execution path.…
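The contrast between a random dropout mask and a learned gate, and the step from continuous gate probabilities to a discrete inference mask, can be sketched in plain Python. The training-time behavior here is an assumption: units are scaled by their continuous gate probabilities so the gates stay differentiable, whereas the paper may instead sample masks with a straight-through estimator or a relaxed distribution; this summary does not say. The inference-time threshold of 0.5 is also assumed, and all function names are hypothetical.

```python
import math
import random


def dropout_mask(n_units, p_drop, rng):
    """Standard inverted dropout: random keep/drop, rescaled by 1 / (1 - p_drop)."""
    keep = 1.0 - p_drop
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(n_units)]


def gate_probabilities(gate_logits):
    """Continuous gate probabilities, one per unit (input-dependent in the model)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]


def apply_gates(hidden, probs, training, threshold=0.5):
    """Assumed scheme: soft scaling in training, a hard execution mask at inference.

    The mask is applied by multiplication for clarity; an efficient
    implementation would skip computing the masked units altogether.
    """
    if training:
        return [h * p for h, p in zip(hidden, probs)]
    mask = [1.0 if p >= threshold else 0.0 for p in probs]
    return [h * m for h, m in zip(hidden, mask)]


hidden = [1.0, 2.0, 3.0]
probs = gate_probabilities([2.0, -2.0, 0.5])  # roughly [0.881, 0.119, 0.622]

print(dropout_mask(3, 0.5, random.Random(0)))       # random, entries are 0.0 or 2.0
print(apply_gates(hidden, probs, training=True))    # roughly [0.881, 0.238, 1.867]
print(apply_gates(hidden, probs, training=False))   # [1.0, 0.0, 3.0]
```

Unlike the dropout mask, which ignores the input and is discarded at inference, the gate probabilities are learned and determine which units run at test time.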
