Zobrist Hash-based Duplicate Detection in Symbolic Regression

Bogdan Burlacu

arXiv:2508.13859 · cs.NE · August 20, 2025

TL;DR

This paper introduces a Zobrist hash-based caching mechanism that avoids re-evaluating duplicate expressions in genetic programming for symbolic regression, yielding speedups of up to 34% without compromising search quality.

Contribution

The paper presents a novel Zobrist hash-based caching method integrated into GP for symbolic regression, reducing redundant evaluations and enhancing computational efficiency.

Findings

01 Up to 34% speedup in symbolic regression tasks.

02 No negative impact on search quality observed.

03 Effective caching reduces redundant evaluations in GP.
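To illustrate how a cache removes redundant evaluations, here is a minimal sketch of a fitness cache that counts hits and misses. It is not the paper's implementation: the class `EvaluationCache`, the token-list encoding of expressions, and the default key (the plain token tuple, used here as a stand-in for a Zobrist hash) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence


class EvaluationCache:
    """Hypothetical fitness cache; the key function stands in for a Zobrist hash."""

    def __init__(
        self,
        evaluate: Callable[[Sequence[str]], float],
        key: Callable[[Sequence[str]], Hashable] = tuple,
    ) -> None:
        self.evaluate = evaluate      # the expensive fitness function
        self.key = key                # maps an expression to a hashable key
        self.table: Dict[Hashable, float] = {}
        self.hits = 0
        self.misses = 0

    def fitness(self, expr: Sequence[str]) -> float:
        k = self.key(expr)
        if k in self.table:
            # Duplicate expression: reuse the stored fitness.
            self.hits += 1
            return self.table[k]
        # First visit: evaluate once and remember the result.
        self.misses += 1
        value = self.evaluate(expr)
        self.table[k] = value
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In a population where the same expression appears several times, only the first occurrence triggers `evaluate`; the hit rate then gives a rough measure of how much evaluation effort was saved.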

Abstract

Symbolic regression encompasses a family of search algorithms that aim to discover the best fitting function for a set of data without requiring an a priori specification of the model structure. The most successful and commonly used technique for symbolic regression is Genetic Programming (GP), an evolutionary search method that evolves a population of mathematical expressions through the mechanism of natural selection. In this work we analyze the efficiency of the evolutionary search in GP and show that many points in the search space are re-visited and re-evaluated multiple times by the algorithm, leading to wasted computational effort. We address this issue by introducing a caching mechanism based on the Zobrist hash, a type of hashing frequently used in abstract board games for the efficient construction and subsequent update of transposition tables. We implement our caching…
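The Zobrist idea mentioned in the abstract can be sketched as follows. In board games, each (square, piece) pair gets a random key and a position's hash is the XOR of the keys of its occupied squares. The sketch below carries this over to expressions by assuming a prefix-token encoding with one random 64-bit key per (position, symbol) pair. This encoding, the primitive set `SYMBOLS`, the bound `MAX_LEN`, and the function names are illustrative assumptions, not details taken from the paper.

```python
import random

SYMBOLS = ["+", "*", "x", "y", "1"]  # hypothetical primitive set
MAX_LEN = 16                          # hypothetical maximum expression length

rng = random.Random(42)  # fixed seed so the key table is reproducible
# One random 64-bit key per (position, symbol) pair.
ZOBRIST = [{s: rng.getrandbits(64) for s in SYMBOLS} for _ in range(MAX_LEN)]


def zobrist_hash(prefix):
    """XOR together the keys of every (position, symbol) in the expression."""
    h = 0
    for pos, sym in enumerate(prefix):
        h ^= ZOBRIST[pos][sym]
    return h


def update_hash(h, pos, old, new):
    """Incremental update after replacing one symbol: XOR out old, XOR in new."""
    return h ^ ZOBRIST[pos][old] ^ ZOBRIST[pos][new]
```

Because XOR is its own inverse, a point mutation changes the hash in constant time rather than requiring a full recomputation. For example, replacing the last token of `["+", "x", "*", "y", "1"]` with `"x"` via `update_hash(h, 4, "1", "x")` gives the same value as hashing the mutated expression from scratch. Two caveats apply to this sketch: distinct expressions can collide with small probability, and only syntactically identical expressions hash to the same value.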
