Symbolic Regression is NP-hard

Marco Virgolin; Solon P. Pissis

arXiv:2207.01018 · cs.NE · July 12, 2022 · 27 cites

TL;DR

This paper proves that symbolic regression (SR), the task of finding a mathematical expression that models given data, is NP-hard. Consequently, no exact polynomial-time algorithm for SR exists unless P = NP.

Contribution

The paper gives the first formal proof that symbolic regression is NP-hard, turning earlier informal hints about its hardness into a rigorous result.

Findings

01

SR is NP-hard, so no polynomial-time exact algorithm exists unless P = NP.

02

Because of this hardness, heuristic methods such as greedy or genetic algorithms are likely to remain necessary in practice.

03

This result clarifies the theoretical limits of symbolic regression algorithms.

Abstract

Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This begs the question: Is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.
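
To make the search problem described in the abstract concrete, here is a minimal Python sketch of exhaustive SR over a tiny, hypothetical expression grammar. The leaf set `LEAVES`, the operator set `OPS`, and all function names are illustrative assumptions, not taken from the paper. This is not the paper's proof or method, and NP-hardness is a worst-case statement that a toy like this cannot demonstrate; it only shows how quickly the candidate space grows with expression depth.

```python
import itertools

# Hypothetical toy grammar: one input variable, two constants, two operators.
LEAVES = ["x", "1", "2"]
OPS = ["+", "*"]


def expressions(depth):
    """Yield every fully parenthesized expression of tree depth <= depth."""
    yield from LEAVES
    if depth == 0:
        return
    # Every deeper expression combines two expressions of depth <= depth - 1.
    smaller = list(expressions(depth - 1))
    for op in OPS:
        for left, right in itertools.product(smaller, repeat=2):
            yield f"({left} {op} {right})"


def mse(expr, xs, ys):
    """Mean squared error of an expression string on the data (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        # eval is acceptable here only because the strings come from our own grammar.
        pred = eval(expr, {"__builtins__": {}}, {"x": x})
        total += (pred - y) ** 2
    return total / len(xs)


def brute_force_sr(xs, ys, depth):
    """Return an expression with the lowest error among all candidates."""
    return min(expressions(depth), key=lambda e: mse(e, xs, ys))


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [x * x + 1 for x in xs]  # data generated by x^2 + 1
    best = brute_force_sr(xs, ys, depth=2)
    print(best, mse(best, xs, ys))
    for d in range(3):
        print(d, sum(1 for _ in expressions(d)))
```

With this grammar the number of candidates is 3, 21, and 885 for depths 0, 1, and 2, and it more than squares with each additional level, which is one practical reason SR is usually approached with heuristics rather than enumeration.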

Taxonomy

Topics: Evolutionary Algorithms and Applications · Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research