Learning Large-Scale Modular Addition with an Auxiliary Modulus

Hanato Kikuchi; Ryosuke Masuya; Kazuhiko Kawamoto; Hiroshi Kera

arXiv:2605.07648·cs.LG·May 11, 2026


TL;DR

This paper introduces a covariate-shift-free training method for large-scale modular addition, enabling scalable and sample-efficient learning even with large inputs and small datasets.

Contribution

It proposes training with an auxiliary modulus $Kq$, which reduces wrap-around frequency while keeping the training and test input distributions identical, and it analyzes the covariate shift induced by the earlier zero-increasing approach.

Findings

01. The method achieves high accuracy for large input lengths and large moduli.

02. It outperforms sparse methods in low-data regimes.

03. Experiments demonstrate strong scalability and sample efficiency.

Abstract

Learning parity functions, and more generally modular addition, is a challenging machine learning task because the output is highly sensitive to the input. A recent study substantially scaled modular addition learning in both the number of summands and the modulus. Its key idea is to increase the number of zeros in training sequences, which reduces the effective number of summands and thus controls training difficulty; however, this induces covariate shift between the training and test input distributions. This study analyzes this side effect theoretically and empirically and proposes a covariate-shift-free method for modular addition. Specifically, we introduce an auxiliary modulus $Kq$ during training, which reduces wrap-around frequency and problem difficulty while preserving the same input distribution across training and testing. Experiments show strong scalability and sample efficiency: even for large input length $N$, large…
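The effect of an auxiliary modulus on wrap-around can be illustrated with a small sketch. This is not the authors' implementation: the helper names `sample_inputs` and `targets`, the uniform sampling of summands, and the values of `n`, `q`, and `k` are all assumptions chosen for illustration. The sketch only relies on the arithmetic fact that $(s \bmod Kq) \bmod q = s \bmod q$, so the mod-$q$ answer is recoverable from a mod-$Kq$ target while the inputs themselves are left unchanged.

```python
import random

def sample_inputs(n, q, rng):
    """Draw n summands uniformly from {0, ..., q-1} (assumed input distribution)."""
    return [rng.randrange(q) for _ in range(n)]

def targets(xs, q, k):
    """Return targets and wrap-around counts under modulus q and auxiliary modulus k*q."""
    s = sum(xs)
    return {
        "mod_q": s % q,            # original task target
        "mod_kq": s % (k * q),     # hypothetical auxiliary target
        "wraps_q": s // q,         # number of times the sum wraps past q
        "wraps_kq": s // (k * q),  # number of times the sum wraps past k*q
    }

rng = random.Random(0)
n, q, k = 16, 97, 8  # illustrative values, not taken from the paper
xs = sample_inputs(n, q, rng)
t = targets(xs, q, k)

# The mod-q answer is recoverable from the auxiliary target.
assert t["mod_kq"] % q == t["mod_q"]
# The larger modulus can never wrap more often than the original one.
assert t["wraps_kq"] <= t["wraps_q"]
```

Because only the target changes, the summands seen in training are drawn exactly as at test time, which is the property the abstract describes as covariate-shift-free. How the auxiliary target is actually used in the training loss is not specified in the text above and is left out of this sketch.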
