IntSeqBERT: Learning Arithmetic Structure in OEIS via Modulo-Spectrum Embeddings
Kazuhisa Nakasho

TL;DR
IntSeqBERT is a Transformer-based model that encodes each term of an integer sequence with magnitude and modulo embeddings, improving prediction accuracy over a standard tokenised Transformer and exposing arithmetic structure in OEIS sequences.
Contribution
The paper introduces IntSeqBERT, a dual-stream Transformer encoder that models OEIS sequences by combining magnitude and modulo embeddings, and reports that it outperforms a standard tokenised Transformer.
Findings
Achieves 95.85% magnitude accuracy on the OEIS test set at the Large scale (91.5M parameters).
Improves next-term prediction 7.4-fold over the baseline.
Reveals a strong correlation between information gain and Euler's totient ratio.
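The summary does not define the totient ratio. A minimal sketch, assuming it means φ(m)/m for a modulus m, where φ is Euler's totient function (the count of integers in 1..m coprime to m). The function names below are illustrative and not taken from the paper.

```python
from math import gcd


def totient(m: int) -> int:
    """Euler's totient: how many k in 1..m satisfy gcd(k, m) == 1."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def totient_ratio(m: int) -> float:
    """Assumed definition of the 'totient ratio': phi(m) / m."""
    return totient(m) / m


if __name__ == "__main__":
    # Prime moduli have ratios close to 1; moduli with many small
    # prime factors (such as 6 or 12) have much lower ratios.
    for m in (2, 6, 7, 12):
        print(m, totient(m), totient_ratio(m))
```

Under this assumed definition the ratio depends only on the distinct prime factors of m, so 6 and 12 share the same value.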
Abstract
Integer sequences in the OEIS span values from single-digit constants to astronomical factorials and exponentials, making prediction challenging for standard tokenised models that cannot handle out-of-vocabulary values or exploit periodic arithmetic structure. We present IntSeqBERT, a dual-stream Transformer encoder for masked integer-sequence modelling on OEIS. Each sequence element is encoded along two complementary axes: a continuous log-scale magnitude embedding and sin/cos modulo embeddings for 100 residues (moduli --), fused via FiLM. Three prediction heads (magnitude regression, sign classification, and modulo prediction for 100 moduli) are trained jointly on 274,705 OEIS sequences. At the Large scale (91.5M parameters), IntSeqBERT achieves 95.85% magnitude accuracy and 50.38% Mean Modulo Accuracy (MMA) on the test set, outperforming a standard tokenised Transformer…
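The two encoding axes described in the abstract can be illustrated with a small NumPy sketch. It rests on several assumptions: the exact log transform, the moduli set (the toy tuple `(2, 3, 5)` stands in for the paper's 100 moduli, which this page does not list), and the way FiLM combines the streams are not specified here, so `magnitude_feature`, `modulo_features`, and `film` are hypothetical helpers and not the author's implementation.

```python
import math

import numpy as np


def magnitude_feature(n: int) -> float:
    """Log-scale magnitude of |n|. The +1 offset (an assumption) keeps n = 0 finite.

    math.log accepts arbitrarily large Python ints, so huge factorials work.
    """
    return math.log(abs(n) + 1)


def modulo_features(n: int, moduli) -> np.ndarray:
    """sin/cos of the residue angle 2*pi*(n mod m)/m for each modulus m.

    Returns an array of shape (len(moduli), 2).
    """
    m = np.asarray(moduli, dtype=np.float64)
    residues = np.array([n % int(k) for k in moduli], dtype=np.float64)
    angles = 2.0 * np.pi * residues / m
    return np.stack([np.sin(angles), np.cos(angles)], axis=-1)


def film(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Generic FiLM: feature-wise scale and shift, gamma * x + beta.

    Which stream produces gamma and beta in IntSeqBERT is not stated on this page.
    """
    return gamma * x + beta


if __name__ == "__main__":
    toy_moduli = (2, 3, 5)  # illustrative only
    print(magnitude_feature(math.factorial(200)))
    print(modulo_features(7, toy_moduli))
```

Because the residue features are computed with exact integer arithmetic before any float conversion, they stay well defined for terms far too large to fit in a fixed vocabulary or a 64-bit float.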
Taxonomy
Topics: Machine Learning in Materials Science · Time Series Analysis and Forecasting · Machine Learning in Healthcare
