Symmetry-Constrained Language-Guided Program Synthesis for Discovering Governing Equations from Noisy and Partial Observations
Mirza Samad Ahmed Baig, Syeda Anshrah Gillani

TL;DR
SymLang is a framework that combines symmetry constraints, language-guided program synthesis, and Bayesian model selection to discover governing equations from noisy, partial data across diverse physical systems.
Contribution
It introduces a unified approach integrating symmetry constraints, language models, and Bayesian methods for robust symbolic equation discovery from complex data.
Findings
Achieves 83.7% exact recovery rate under 10% noise
Reduces extrapolation error by 61% compared to baselines
Nearly eliminates conservation-law violations in discovered equations
Abstract
Discovering compact governing equations from experimental observations is one of the defining objectives of quantitative science, yet practical discovery pipelines routinely fail when measurements are noisy, relevant state variables are unobserved, or multiple symbolic structures explain the data equally well within statistical uncertainty. Here we introduce SymLang (Symmetry-constrained Language-guided equation discovery), a unified framework that brings together three previously separate ideas: (i) typed symmetry-constrained grammars that encode dimensional analysis, group-theoretic invariance, and parity constraints as hard production rules, eliminating on average 71.3% of candidate expression trees before any fitting; (ii) language-model-guided program synthesis in which a fine-tuned 7B-parameter proposer, conditioned on interpretable data descriptors, efficiently navigates the…
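The pruning idea in (i) can be illustrated with a toy example. The sketch below is not SymLang's grammar. It is a minimal illustration, assuming a hypothetical variable set (`m`, `x`, `v`, `k`), dimensions written as (mass, length, time) exponent tuples, and only the operators `+`, `*`, and `/`. It shows how a dimensional typing rule can reject an expression such as `m + x` at construction time, before any fitting takes place. It covers dimensional analysis only; group invariance and parity constraints are not modeled.

```python
from itertools import product

# Hypothetical variables with (mass, length, time) exponents; illustrative only.
VARS = {
    "m": (1, 0, 0),    # mass
    "x": (0, 1, 0),    # position
    "v": (0, 1, -1),   # velocity
    "k": (1, 0, -2),   # spring constant
}
ENERGY = (1, 2, -2)    # target dimension used in this example


def combine(op, a, b):
    """Return the dimension of `a op b`, or None if the typing rule forbids it."""
    if op == "+":
        # Addition is only well-typed when both operands share a dimension.
        return a if a == b else None
    if op == "*":
        return tuple(i + j for i, j in zip(a, b))
    if op == "/":
        return tuple(i - j for i, j in zip(a, b))
    raise ValueError(f"unknown operator: {op}")


def enumerate_typed(depth):
    """Map every well-typed expression of nesting depth <= `depth` to its dimension."""
    level = dict(VARS)
    for _ in range(depth):
        new = dict(level)
        for (ea, da), (eb, db), op in product(level.items(), level.items(), "+*/"):
            dim = combine(op, da, db)
            if dim is not None:          # ill-typed trees are never created
                new[f"({ea} {op} {eb})"] = dim
        level = new
    return level


def matching(depth, target):
    """Return the well-typed expressions whose dimension equals `target`."""
    return sorted(e for e, d in enumerate_typed(depth).items() if d == target)


if __name__ == "__main__":
    typed = enumerate_typed(1)
    untyped = len(VARS) + len(VARS) ** 2 * 3   # all depth-1 trees, ignoring types
    print(f"depth 1: {len(typed)} well-typed of {untyped} untyped candidates")
    print(f"depth 2: {len(matching(2, ENERGY))} candidates with energy dimension")
```

In this toy setting the typing rule removes only mismatched additions, so the reduction is modest and is not comparable to the figure reported in the abstract. The target-dimension filter in `matching` is what narrows the candidates to energy-like forms such as `((m * v) * v)`.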
Taxonomy
Topics: Machine Learning in Materials Science · Model Reduction and Neural Networks · Scientific Computing and Data Management
