Beyond the Training Distribution: Mapping Generalization Boundaries in Neural Program Synthesis

Henrik Voigt; Michael Habeck; Joachim Giesen

arXiv:2604.27551·cs.LG·May 1, 2026

TL;DR

This paper introduces a controlled environment for evaluating neural program synthesis models, revealing their limitations in out-of-distribution generalization and emphasizing the importance of training diversity.

Contribution

It presents a novel, interpretable framework for assessing generalization boundaries in neural program synthesis using a controlled grammar-based environment.

Findings

01

Transformers struggle with syntactic extrapolation, showing a performance drop of over 30%.

02

Diverse sampling over semantic and syntactic spaces improves out-of-distribution generalization.

03

Scaling compute yields log-linear improvements, but these are not enough for robust generalization.

Abstract

Large-scale transformers achieve impressive results on program synthesis benchmarks, yet their true generalization capabilities remain obscured by data contamination and opaque training corpora. To rigorously assess whether models are truly generalizing or merely retrieving memorized templates, we introduce a strictly controlled program synthesis environment based on a domain-specific arithmetic grammar. By systematically enumerating and evaluating millions of unique programs, we construct interpretable syntactic and semantic metric spaces. This allows us to precisely map data distributions and sample train and test splits that isolate specific distributional shifts. Our experiments demonstrate that optimizing density generalization -- through diverse sampling over both semantic and syntactic spaces -- induces robust out-of-distribution generalization. Conversely, evaluating support…
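
To make the setup described in the abstract concrete, here is a minimal sketch of enumerating programs from a grammar, evaluating them, and measuring syntactic and semantic distances. Everything in it is an assumption made for illustration and is not taken from the paper: the toy grammar, the probe inputs, token-level edit distance as the syntactic metric, summed output difference as the semantic metric, and the depth-based split as one example of a distributional shift.

```python
from itertools import product

# Hypothetical toy grammar (not the paper's): E -> x | 1 | 2 | (E + E) | (E * E)
LEAVES = ["x", "1", "2"]
OPS = ["+", "*"]
INPUTS = [0, 1, 2, 3]  # assumed probe inputs used to define program semantics


def enumerate_programs(max_depth):
    """Return {depth: [expr, ...]}, where each expr has exactly that nesting depth."""
    by_depth = {0: list(LEAVES)}
    for d in range(1, max_depth + 1):
        prev = by_depth[d - 1]
        shallower = [e for k in range(d - 1) for e in by_depth[k]]
        exprs = []
        for op in OPS:
            # At least one child must have depth d-1 so the result has depth d.
            for left, right in product(prev, prev + shallower):
                exprs.append(f"({left} {op} {right})")
            for left, right in product(shallower, prev):
                exprs.append(f"({left} {op} {right})")
        by_depth[d] = exprs
    return by_depth


def semantics(expr):
    """Semantic signature: the program's outputs on the probe inputs."""
    return tuple(eval(expr, {"__builtins__": {}}, {"x": v}) for v in INPUTS)


def tokens(expr):
    return expr.replace("(", "( ").replace(")", " )").split()


def syntactic_distance(a, b):
    """Levenshtein distance over tokens (an assumed syntactic metric)."""
    s, t = tokens(a), tokens(b)
    prev = list(range(len(t) + 1))
    for i, si in enumerate(s, 1):
        cur = [i]
        for j, tj in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (si != tj)))
        prev = cur
    return prev[-1]


def semantic_distance(a, b):
    """Summed absolute output difference (an assumed semantic metric)."""
    return sum(abs(u - v) for u, v in zip(semantics(a), semantics(b)))


programs = enumerate_programs(2)
# Assumed split: train on shallow programs, test on strictly deeper ones.
train = programs[0] + programs[1]
test = programs[2]
print(len(train), len(test))
print(syntactic_distance("(x + 1)", "(1 + x)"), semantic_distance("(x + 1)", "(1 + x)"))
```

In this sketch the two metrics can disagree: `(x + 1)` and `(1 + x)` differ syntactically but compute the same function. That is why keeping separate syntactic and semantic spaces is useful when choosing which shift a train/test split should isolate.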
