Improving the Robustness of Neural Multiplication Units with Reversible Stochasticity
Bhumika Mistry, Katayoun Farrahi, Jonathon Hare

TL;DR
This paper introduces stochastic Neural Multiplication Units (sNMUs) that enhance robustness and learning reliability in neural arithmetic tasks by mitigating biases and avoiding undesirable solutions.
Contribution
The paper proposes reversible stochasticity in NMUs to improve their robustness and ability to learn simple arithmetic tasks across varying training ranges.
Findings
sNMUs learn multiplication more reliably than standard NMUs across different training ranges
Stochasticity improves robustness and convergence to the true solution
Stochasticity has the potential to improve the learned representations of upstream networks on numerical and image tasks
Abstract
Multilayer Perceptrons struggle to learn certain simple arithmetic tasks. Specialist neural modules for arithmetic can outperform classical architectures with gains in extrapolation, interpretability and convergence speeds, but are highly sensitive to the training range. In this paper, we show that Neural Multiplication Units (NMUs) are unable to reliably learn tasks as simple as multiplying two inputs when given different training ranges. Causes of failure are linked to inductive and input biases which encourage convergence to solutions in undesirable optima. A solution, the stochastic NMU (sNMU), is proposed to apply reversible stochasticity, encouraging avoidance of such optima whilst converging to the true solution. Empirically, we show that stochasticity provides improved robustness with the potential to improve learned representations of upstream networks for numerical and image…
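A minimal sketch of what "reversible stochasticity" could look like is given below. It is an illustration under stated assumptions, not the authors' implementation. The `nmu` function uses the commonly cited NMU form, a product over inputs of `w_i * x_i + 1 - w_i`. The `snmu` function, the uniform noise range `[1, 5]`, and all variable names are assumptions, since this summary does not state the sNMU formula. The idea shown is that each input is scaled by random noise and the result is divided by the same unit applied to the noise alone, so the noise cancels only when the weights sit at a true discrete solution.

```python
import math
import random


def nmu(x, w):
    """One NMU output: product over inputs of (w_i * x_i + 1 - w_i)."""
    return math.prod(wi * xi + 1.0 - wi for xi, wi in zip(x, w))


def snmu(x, w, rng, low=1.0, high=5.0):
    """Hypothetical stochastic NMU (assumed form, not taken from the summary).

    Each input is multiplied by positive noise, then the noise is
    divided back out by applying the same unit to the noise alone.
    """
    noise = [rng.uniform(low, high) for _ in x]
    noisy_out = nmu([n * xi for n, xi in zip(noise, x)], w)
    return noisy_out / nmu(noise, w)


rng = random.Random(0)
x = [3.0, 4.0, 7.0]
w_true = [1.0, 1.0, 0.0]     # selects and multiplies the first two inputs
w_partial = [1.0, 0.5, 0.0]  # a non-discrete weight, i.e. not a true solution

exact = nmu(x, w_true)                     # 12.0
reversed_ok = snmu(x, w_true, rng)         # noise cancels, close to 12
samples = [snmu(x, w_partial, rng) for _ in range(5)]  # varies per draw
```

With weights at the true solution the noise cancels and the output matches the plain NMU up to floating-point error. With a partially converged weight the output changes from draw to draw, which is one way noise of this kind could discourage a network from settling in such optima.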
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks
