Mitigating Position-Shift Failures in Text-Based Modular Arithmetic via Position Curriculum and Template Diversity
Nikolay Yudin

TL;DR
This paper addresses the robustness of character-level Transformers in modular arithmetic tasks, proposing a training strategy that enhances invariance to position shifts and template variations, thereby improving out-of-distribution performance.
Contribution
The authors introduce a novel training recipe combining boundary markers, position curriculum, template diversity, and consistency training to improve model robustness against position shift and template OOD.
Findings
The proposed training recipe yields significant robustness improvements under both position shift and template OOD.
Baseline models fail catastrophically under position shift and template OOD.
The proposed training method maintains high in-distribution accuracy while enhancing robustness.
Abstract
Building on insights from the grokking literature, we study character-level Transformers trained to compute modular addition from text, and focus on robustness under input-format variation rather than only in-distribution accuracy. We identify a previously under-emphasized failure mode: models that achieve high in-distribution accuracy can fail catastrophically when the same expression is shifted to different absolute character positions ("position shift") or presented under out-of-distribution natural-language templates. Using a disjoint-pair split over all ordered pairs for p=97, we show that a baseline model reaches strong in-distribution performance yet collapses under position shift and template OOD. We then introduce a simple training recipe that combines (i) explicit expression boundary markers, (ii) position curriculum that broadens the range of absolute positions seen during…
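The evaluation setup described in the abstract can be illustrated with a minimal sketch. Only the modulus p=97, the disjoint split over ordered pairs, boundary markers, and the idea of widening the range of absolute positions come from the abstract. Everything else here is an assumption made for illustration: the `a+b=` text format, the `<` and `>` marker characters, space padding as the way to shift position, the 50% train fraction, the linear curriculum schedule, and all function names.

```python
import random

P = 97  # modulus stated in the abstract


def disjoint_pair_split(p=P, train_frac=0.5, seed=0):
    """Split all ordered pairs (a, b) into disjoint train/test sets.

    train_frac is a hypothetical value; the abstract does not state it.
    """
    pairs = [(a, b) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * train_frac)
    return pairs[:cut], pairs[cut:]


def render(a, b, shift=0, markers=True, pad_char=" "):
    """Render an expression as text, optionally wrapped in boundary markers
    and moved right by `shift` characters (the "position shift").

    The marker characters and padding scheme are assumptions.
    """
    expr = f"{a}+{b}="
    if markers:
        expr = f"<{expr}>"
    return pad_char * shift + expr


def target(a, b, p=P):
    """Label for modular addition."""
    return (a + b) % p


def max_shift_at(step, total_steps, final_max=32):
    """Hypothetical linear position curriculum: the largest allowed shift
    grows from 0 to final_max over the course of training."""
    frac = min(1.0, step / total_steps)
    return int(final_max * frac)


if __name__ == "__main__":
    train, test = disjoint_pair_split()
    a, b = train[0]
    shift = random.Random(1).randint(0, max_shift_at(500, 1000))
    print(repr(render(a, b, shift=shift)), "->", target(a, b))
```

Under these assumptions, a position-shift test renders a held-out pair at a shift larger than any used in training and checks whether the prediction still matches `target(a, b)`.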
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques
