EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
Nan Jiang, Ziyi Wang, and Yexiang Xue

TL;DR
EGG-SR introduces a novel framework that leverages symbolic equivalence via equality graphs to improve the efficiency and accuracy of symbolic regression methods, including MCTS, DRL, and LLMs, in scientific discovery tasks.
Contribution
It proposes EGG-SR, a unified approach integrating symbolic equivalence into various symbolic regression algorithms, reducing redundancy and accelerating learning.
Findings
Enhances symbolic regression models across benchmarks
Discovers more accurate expressions within the same time limits
Theoretically tightens regret bounds and reduces gradient variance
Abstract
Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, which is an important task in AI-driven scientific discovery. Yet the exponential growth of the expression search space renders the task computationally challenging. A promising yet underexplored direction for reducing the search space and accelerating training lies in *symbolic equivalence*: many expressions, although syntactically different, define the same function. Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning. We introduce EGG-SR, a unified framework that integrates symbolic equivalence into a class of modern symbolic regression methods, including Monte Carlo Tree Search (MCTS), Deep Reinforcement Learning…
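To make the notion of symbolic equivalence concrete, here is a minimal Python sketch. It is not taken from the paper: the identity log(a·b) = log(a) + log(b) for positive a and b, the tuple encoding of expressions, and the helper names `evaluate` and `same_function` are all assumptions chosen for illustration. Agreement at random sample points is only a numerical check, not a proof of equivalence.

```python
import math
import random

# Three syntactically different expression trees, written as nested tuples.
# Illustrative assumption: the identity log(a*b) = log(a) + log(b), a, b > 0.
E1 = ("log", ("mul", "a", "b"))
E2 = ("add", ("log", "a"), ("log", "b"))
E3 = ("add", ("log", "b"), ("log", "a"))


def evaluate(expr, env):
    """Evaluate a tuple-encoded expression under variable bindings `env`."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(arg, env) for arg in args]
    if op == "log":
        return math.log(vals[0])
    if op == "add":
        return vals[0] + vals[1]
    if op == "mul":
        return vals[0] * vals[1]
    raise ValueError(f"unknown operator: {op}")


def same_function(e1, e2, trials=100, seed=0):
    """Numerically compare two expressions at random positive inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        env = {"a": rng.uniform(0.1, 10.0), "b": rng.uniform(0.1, 10.0)}
        if not math.isclose(evaluate(e1, env), evaluate(e2, env),
                            rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True
```

A search procedure that compares expressions only by syntax would count `E1`, `E2`, and `E3` as three separate candidates, even though `same_function` reports that they agree on every sampled input.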
Peer Reviews
Decision: ICLR 2026 Poster
1. **Significance**: The paper focuses on an important and fundamental problem in symbolic regression (SR), and the introduced e-graph is a reasonable basis for equivalence-aware SR frameworks.
2. **Multiple Backends:** The paper applies e-graphs to several learning and search methods (MCTS, DRL, and LLM) to show that capturing expression equivalences benefits SR methods across the board.
1. **Novelty:** Though the e-graph is new in SR, similar ideas of using graph-based representations to capture expression equality have already been studied for SR. For example, Expression DAGs [1] include the same idea of sharing sub-expressions in DAGs for expression equality. The paper lacks a sufficient literature review on this topic to differentiate its novelty and contribution.
2. **Potentially Biased UCT Search**: The proposed EGG back-propagation in MCTS might lead to biased UCT select
**Novel and Important Problem Formulation**: The paper successfully identifies and tackles the fundamental issue of symbolic equivalence, a significant source of inefficiency in SR that has been largely overlooked. This focus is timely and valuable.
**High Methodological Innovation**: The primary strength is the proposal of a unified, plug-and-play framework (EGG-SR) rather than a single algorithm. Its applicability across diverse paradigms (MCTS, DRL, LLM) demonstrates remarkable generali
**Dependence on Manually Defined Rewrite Rules**: The effectiveness of EGG-SR currently relies on a pre-defined set of rewrite rules. The framework's power and generality could be further amplified by exploring methods to learn or automatically discover these rules from data, as noted by the authors in the conclusion.
**Writing**: There is a significant distance between Figure 1 and the corresponding explanatory text (Section 3.1). To improve the reader's flow, I suggest moving the figure cl
- **Originality**: While e-graphs and their application to SR are not entirely new (as noted in the related works), this paper's contribution is a novel and highly general framework. Applying the equivalence concept systematically to MCTS (search pruning), DRL (variance reduction), and LLMs (prompt enrichment) is a creative and powerful combination of ideas. The DRL variance reduction and LLM feedback mechanisms are particularly original.
- **Quality**: The paper is technically strong. The the
- The process of building the e-graph, "equality saturation", involves iteratively applying all rules. While the storage is efficient (Figure 4), the construction time could become a bottleneck for very complex expressions or a very large set of rewrite rules. Figure 5 shows that the overhead is negligible for one dataset, but the paper does not analyze how construction time scales with expression complexity or the number of rules.
- The claim for EGG-MCTS is that it "prunes redundant subtree explorat
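The scaling concern about equality saturation can be illustrated with a naive sketch of the "apply all rules until nothing new appears" loop. This is not the paper's implementation and not a real e-graph: it enumerates equivalent expressions explicitly in a Python set, whereas an e-graph shares sub-expressions to store them compactly. The three rewrite rules, the tuple encoding, and the names `rewrite_everywhere` and `saturate` are assumptions made for illustration.

```python
def rewrite_everywhere(expr, rule):
    """Yield each expression obtained by applying `rule` at one position."""
    rewritten = rule(expr)
    if rewritten is not None:
        yield rewritten
    if isinstance(expr, tuple):
        op, *args = expr
        for i, arg in enumerate(args):
            for new_arg in rewrite_everywhere(arg, rule):
                yield (op, *args[:i], new_arg, *args[i + 1:])


# Hypothetical rewrite rules; each returns a rewritten node or None.
def commute_add(e):
    if isinstance(e, tuple) and e[0] == "add":
        return ("add", e[2], e[1])
    return None


def commute_mul(e):
    if isinstance(e, tuple) and e[0] == "mul":
        return ("mul", e[2], e[1])
    return None


def log_of_product(e):
    # log(x * y) -> log(x) + log(y), assuming x, y > 0
    if (isinstance(e, tuple) and e[0] == "log"
            and isinstance(e[1], tuple) and e[1][0] == "mul"):
        return ("add", ("log", e[1][1]), ("log", e[1][2]))
    return None


def saturate(expr, rules, max_rounds=10):
    """Apply all rules repeatedly until no new expression appears."""
    seen = {expr}
    frontier = {expr}
    rounds = 0
    while frontier and rounds < max_rounds:
        new = set()
        for e in frontier:
            for rule in rules:
                for r in rewrite_everywhere(e, rule):
                    if r not in seen:
                        new.add(r)
        seen |= new
        frontier = new
        rounds += 1
    return seen, rounds


variants, rounds = saturate(
    ("log", ("mul", "a", "b")),
    [commute_add, commute_mul, log_of_product],
)
```

Each round in this sketch tries every rule at every position of every newly found expression, so its cost grows with both expression size and rule count. The `max_rounds` cap is one simple way to bound that cost at the price of possibly missing some equivalent forms.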
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Artificial Intelligence in Games
