Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
Zihan Yu, Jingtao Ding, Yong Li, Depeng Jin

TL;DR
This paper introduces a novel symbolic regression approach using MDLformer-guided search, which minimizes description length instead of prediction error, leading to more accurate formula recovery and better generalization.
Contribution
The paper proposes MDLformer for estimating description length and integrates it into a symbolic regression method, significantly improving formula recovery over existing methods.
Findings
Recovered around 50 formulas out of 133 problems, outperforming state-of-the-art by 43.92%.
Demonstrated strong generalization on 122 unseen black-box problems.
Achieved robust and scalable estimation of description length with a neural network.
Abstract
Symbolic regression, the task of discovering the formula that best fits the given data, is typically based on heuristic search. These methods usually update candidate formulas iteratively to obtain new ones with lower prediction errors. However, since formulas with similar function shapes may have completely different symbolic forms, the prediction error does not decrease monotonically as the search approaches the target formula, which causes the low recovery rate of existing methods. To solve this problem, we propose a novel search objective based on the minimum description length, which reflects the distance from the target and decreases monotonically as the search approaches the correct form of the target formula. To estimate the minimum description length of any input data, we design a neural network, MDLformer, which enables robust and scalable estimation through large-scale training.…
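The abstract's central claim, that prediction error need not fall as a candidate gets symbolically closer to the target, can be seen in a small toy example. The sketch below is illustrative only: the target formula 1 / (x + 5), the three-step candidate path, and the hand-assigned count of missing operators (used here as a stand-in for description length) are all assumptions made for the demonstration, not the paper's benchmark or MDLformer itself.

```python
# Toy illustration only: the target formula, the candidate path, and the
# "remaining operators" count are assumptions, not the paper's setup.
xs = [i / 100 for i in range(101)]          # sample inputs in [0, 1]
ys = [1.0 / (x + 5.0) for x in xs]          # hypothetical target: 1 / (x + 5)

# Candidates ordered by symbolic closeness to the target. The last field
# counts the operators still missing, a stand-in for the description
# length that a learned estimator would have to predict from data.
candidates = [
    ("x",           lambda x: x,               2),
    ("x + 5",       lambda x: x + 5.0,         1),
    ("1 / (x + 5)", lambda x: 1.0 / (x + 5.0), 0),
]

def mse(f):
    """Mean squared error of candidate f against the target values."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

errors = [mse(f) for _, f, _ in candidates]
remaining = [r for _, _, r in candidates]

for (name, _, r), err in zip(candidates, errors):
    print(f"{name:12s} mse={err:10.4f} remaining_ops={r}")
```

In this toy case the error rises when moving from `x` to `x + 5`, although `x + 5` is one step closer to the target in symbolic form, and only drops to zero at the final step. The count of missing operators falls at every step, which is the kind of signal a description-length objective is meant to provide.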
Peer Reviews
Decision: ICLR 2025 Poster
1. This paper is well-written, with the authors presenting the algorithm process and their perspectives clearly and comprehensively.
2. Figures 1 and 2 effectively illustrate the authors' viewpoints in a clear and intuitive way.
3. The proposed method demonstrates efficiency and achieves state-of-the-art performance compared to baseline methods.
1. This paper demonstrates limited innovation. The proposed method comprises three main parts: the first part provides a conventional description of neural network structure; the second generates a dataset, but the symbolic formula generation method follows that of Kamienny et al. (2022). The third part introduces two loss functions in the training process, with the first being the standard mean square error. Overall, the paper offers little originality.
2. Could larger datasets be used for expe
This paper has a very elegant approach to the problem of symbolic regression, taking a principled approach towards the search process. Despite the theoretically intractable nature of the objective function selected, in practice it seems that a transformer is able to learn it well enough to make this search strategy better than existing strategies relying on end-to-end behavior. The resulting framework is more robust to noise than existing frameworks and more generalizable to a variety of domains
Firstly, the datasets being evaluated on seem relatively small and thus potentially prone to being covered by the dataset. However, this seems to be a relatively standard convention that the related work follows, so I am not particularly concerned about this. Secondly, the example in Figure 6 seems to suggest that there are very few alternate formulas that could be selected, and yet the recovery rate of this algorithm on the Feynman dataset is below 50%. Is this example atypical or is the searc
- The overall idea of this paper is intuitive and reasonable. The traditional search objective, i.e., prediction error, cannot effectively capture the distance between the target and current formulas.
- The experimental results are promising. The proposed method significantly outperforms existing methods, demonstrating that a more sophisticated objective can help improve the recovery rate.
- The evaluation dataset is limited. Unlike existing studies, this paper aims to introduce a neural network to guide the search process. As a result, it is crucial to comprehensively assess whether this model can generalize well on other tasks. However, the two datasets used in this study are relatively small. For instance, the Strogatz dataset only includes 14 problems, which is insufficient to effectively capture the performance of each method. Therefore, incorporating additional datasets [1]
Taxonomy
Topics: Neural Networks and Applications · Text and Document Classification Technologies
