Symbolic Regression is NP-hard
Marco Virgolin, Solon P. Pissis

TL;DR
This paper proves that symbolic regression, which aims to find interpretable mathematical models from data, is NP-hard. Hence no polynomial-time algorithm can solve it exactly unless P = NP.
Contribution
The paper provides the first formal proof that symbolic regression is NP-hard. The computational difficulty that earlier works had only hinted at is thereby established as a theoretical result.
Findings
SR is NP-hard, so no exact polynomial-time algorithm for it exists unless P = NP.
Heuristic methods, such as greedy or genetic algorithms, are therefore likely to remain necessary in practice.
This result clarifies the theoretical limits of symbolic regression algorithms.
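One intuition for why exact search is costly is that the number of candidate expressions grows very quickly with their size. The sketch below counts distinct expression trees with exactly n nodes over a small, hypothetical primitive set (leaves {x, 1}, unary {sin}, binary {+, *}); the primitive set and the function name `count_trees` are illustrative assumptions, not taken from the paper. A large search space alone does not prove NP-hardness (the paper's proof is a formal reduction); this only shows why naive enumeration scales poorly.

```python
from functools import lru_cache

# Hypothetical primitive set (illustrative only):
# leaves {x, 1}, unary operators {sin}, binary operators {+, *}.
N_LEAVES = 2
N_UNARY = 1
N_BINARY = 2


@lru_cache(maxsize=None)
def count_trees(n: int) -> int:
    """Number of distinct expression trees with exactly n nodes."""
    if n <= 0:
        return 0
    if n == 1:
        return N_LEAVES
    # Root is a unary operator: its single child has n - 1 nodes.
    total = N_UNARY * count_trees(n - 1)
    # Root is a binary operator: split the remaining n - 1 nodes
    # between a left and a right subtree, each with at least one node.
    for left in range(1, n - 1):
        total += N_BINARY * count_trees(left) * count_trees(n - 1 - left)
    return total


for size in (1, 3, 5, 9, 15):
    print(size, count_trees(size))
```

Adding more operators or allowing numeric constants only makes the growth steeper.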
Abstract
Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This begs the question: Is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.
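To make "computing SR models exactly" concrete, here is a minimal brute-force sketch that enumerates every expression up to a size limit and returns the one with the lowest mean squared error. It assumes a toy dataset (y = x² + 1), a hypothetical primitive set (leaves {x, 1}, binary {+, -, *}) and no fitted constants. It is not the paper's formulation of SR or its proof technique. Its running time grows exponentially with `max_size`, which is the kind of cost that NP-hardness suggests cannot be avoided in general unless P = NP.

```python
import itertools
from functools import lru_cache

# Toy dataset (hypothetical): y = x^2 + 1 sampled at five points.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [x * x + 1.0 for x in XS]

# Hypothetical primitive set: leaves {x, 1} and binary operators {+, -, *}.
LEAVES = {"x": tuple(XS), "1": tuple(1.0 for _ in XS)}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@lru_cache(maxsize=None)
def expressions(size: int) -> tuple:
    """All (text, values) pairs for expression trees with exactly `size` nodes."""
    if size == 1:
        return tuple(LEAVES.items())
    out = []
    # With only binary operators, both children have an odd number of nodes.
    for left in range(1, size - 1, 2):
        right = size - 1 - left
        for (ls, lv), (rs, rv), (op, fn) in itertools.product(
            expressions(left), expressions(right), BINARY.items()
        ):
            vals = tuple(fn(a, b) for a, b in zip(lv, rv))
            out.append((f"({ls} {op} {rs})", vals))
    return tuple(out)


def mse(values) -> float:
    """Mean squared error of predicted values against the toy targets YS."""
    return sum((v - y) ** 2 for v, y in zip(values, YS)) / len(YS)


def exhaustive_sr(max_size: int):
    """Return (mse, size, text) of the best expression with at most max_size nodes.

    Sizes are visited in increasing order and only strict improvements are
    kept, so ties are broken in favour of the smallest expression.
    """
    best = None
    for size in range(1, max_size + 1, 2):
        for text, vals in expressions(size):
            err = mse(vals)
            if best is None or err < best[0]:
                best = (err, size, text)
    return best


print(exhaustive_sr(7))
```

On this toy data the search finds an exact fit with five nodes, but even this tiny grammar already has thousands of candidates at size 7.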
Taxonomy
Topics: Evolutionary Algorithms and Applications · Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research
