Distilling Wikipedia mathematical knowledge into neural network models
Joanne T. Kim, Mikel Landajuela, Brenden K. Petersen

TL;DR
This paper introduces a pipeline to extract and encode mathematical expressions from Wikipedia, creating a valuable resource for training machine learning models in symbolic mathematics, and demonstrates its effectiveness in symbolic regression tasks.
Contribution
It presents a novel method for distilling Wikipedia's mathematical content into symbolic encodings, enabling improved neural-guided symbolic regression.
Findings
Enhanced performance in symbolic regression tasks
Effective encoding of Wikipedia's mathematical expressions
A new resource for symbolic mathematics machine learning
Abstract
Machine learning applications to symbolic mathematics are becoming increasingly popular, yet the field lacks a centralized source of real-world symbolic expressions to be used as training data. In contrast, the field of natural language processing leverages resources like Wikipedia that provide enormous amounts of real-world textual data. Adopting the philosophy of "mathematics as language," we bridge this gap by introducing a pipeline for distilling mathematical expressions embedded in Wikipedia into symbolic encodings to be used in downstream machine learning tasks. We demonstrate that a model trained on this "corpus" of expressions can be used as a prior to improve the performance of neural-guided search for the task of symbolic regression.
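To make the idea of a "symbolic encoding" and a corpus-derived prior concrete, here is a minimal sketch in Python. It is not the authors' pipeline. It assumes each expression has already been converted from Wikipedia markup into a Python-style infix string, it encodes the expression as a prefix (pre-order) token sequence, and it substitutes a simple unigram token-frequency prior for the trained model described in the abstract. The token names (`add`, `mul`, `const`, and so on) and the function names `encode` and `token_log_prior` are hypothetical.

```python
import ast
import math
from collections import Counter

# Hypothetical token names for operators; not taken from the paper.
BINARY_OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul",
              ast.Div: "div", ast.Pow: "pow"}
UNARY_OPS = {ast.USub: "neg"}


def to_prefix(node):
    """Return the pre-order (prefix) token list for a parsed expression node."""
    if isinstance(node, ast.Expression):
        return to_prefix(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return ([BINARY_OPS[type(node.op)]]
                + to_prefix(node.left) + to_prefix(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return [UNARY_OPS[type(node.op)]] + to_prefix(node.operand)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        tokens = [node.func.id]  # function name, e.g. "sin"
        for arg in node.args:
            tokens += to_prefix(arg)
        return tokens
    if isinstance(node, ast.Name):
        return [node.id]  # variable name, e.g. "x"
    if isinstance(node, ast.Constant):
        return ["const"]  # assumption: all numeric literals share one token
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def encode(expr):
    """Encode an infix expression string as a prefix token sequence."""
    return to_prefix(ast.parse(expr, mode="eval"))


def token_log_prior(corpus):
    """Unigram log-probabilities over tokens; a stand-in for a trained model."""
    counts = Counter(tok for seq in corpus for tok in seq)
    total = sum(counts.values())
    return {tok: math.log(n / total) for tok, n in counts.items()}


corpus = [encode("x + y"), encode("x * x")]
prior = token_log_prior(corpus)
print(encode("sin(x)**2 + 3*x"))
```

In a neural-guided search, log-probabilities like these could be added to the search policy's logits at each step so that tokens common in the corpus are sampled more often. A sequence model conditioned on the partial expression would take the place of the unigram counts used here.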
Taxonomy
Topics: Evolutionary Algorithms and Applications · Model Reduction and Neural Networks · Artificial Intelligence in Games
