Learning Large-Scale Modular Addition with an Auxiliary Modulus
Hanato Kikuchi, Ryosuke Masuya, Kazuhiko Kawamoto, Hiroshi Kera

TL;DR
This paper introduces a covariate-shift-free training method for large-scale modular addition, enabling scalable and sample-efficient learning even with large inputs and small datasets.
Contribution
It proposes an auxiliary modulus technique that maintains input distribution consistency, improving learning performance over existing methods.
Findings
The method achieves high accuracy for large input lengths and large moduli.
It outperforms the sparsity-based approach (increasing zeros in training sequences) in low-data regimes.
Experiments demonstrate scalability and efficiency.
Abstract
Learning parity functions, and more generally modular addition, is a challenging machine learning task because the output is highly sensitive to the input. A recent study substantially scaled modular addition learning in both the number of summands and the modulus. Its key idea is to increase the number of zeros in training sequences, which reduces the effective number of summands and thus controls training difficulty; however, this induces covariate shift between the training and test input distributions. This study theoretically and empirically analyzes this side effect and proposes a covariate-shift-free method for modular addition. Specifically, we introduce an auxiliary modulus during training, which reduces wrap-around frequency and problem difficulty while preserving the same input distribution across training and testing. Experiments show strong scalability and sample efficiency: even for large input length , large…
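The contrast drawn in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the zeroing probability, and the choice of an auxiliary modulus equal to a multiple of the target modulus are all assumptions made for illustration, since the summary does not state the paper's exact construction. The sketch only shows two facts that follow from arithmetic: zeroing entries changes the input distribution, while changing the modulus used for the label leaves the inputs untouched and can only lower the number of wrap-arounds.

```python
import random

def sample_inputs(n, q, rng):
    """Draw n summands uniformly from {0, ..., q-1} (the test-time distribution)."""
    return [rng.randrange(q) for _ in range(n)]

def sparsify(xs, zero_prob, rng):
    """Hypothetical stand-in for the zero-increasing baseline:
    each summand is replaced by 0 with probability zero_prob.
    This changes the input distribution (covariate shift)."""
    return [0 if rng.random() < zero_prob else x for x in xs]

def label(xs, modulus):
    """Modular-addition target: sum of summands reduced by the given modulus."""
    return sum(xs) % modulus

def wraparounds(xs, modulus):
    """How many times the running sum wraps past the modulus."""
    return sum(xs) // modulus

if __name__ == "__main__":
    rng = random.Random(0)
    n, q = 16, 97          # illustrative sizes, not taken from the paper
    aux = 4 * q            # assumed auxiliary modulus (a multiple of q)

    xs = sample_inputs(n, q, rng)

    # Auxiliary modulus: inputs are unchanged, only the label's modulus differs.
    print("wrap-arounds mod q  :", wraparounds(xs, q))
    print("wrap-arounds mod aux:", wraparounds(xs, aux))

    # When aux is a multiple of q, the target label is recoverable from the
    # auxiliary label by a final reduction mod q.
    assert label(xs, aux) % q == label(xs, q)

    # Sparsified baseline: the inputs themselves are altered.
    sparse_xs = sparsify(xs, 0.5, rng)
    print("zeros in original  :", xs.count(0))
    print("zeros in sparsified:", sparse_xs.count(0))
```

Because `sum(xs) // aux` can never exceed `sum(xs) // q` when `aux >= q`, a larger modulus always yields at most as many wrap-arounds on the same inputs, which matches the abstract's description of reduced wrap-around frequency without a change in input distribution.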
