Data Augmentation for Mathematical Objects
Tereso del Rio, Matthew England

TL;DR
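This paper evaluates data augmentation for mathematical objects: swapping the variable names in already labelled polynomial problems yields new labelled instances, which raises the accuracy of ML models that select a variable ordering for cylindrical algebraic decomposition by 63% on average.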
Contribution
It introduces a simple augmentation technique for mathematical objects (renaming the variables of labelled problems) and examines how much of the resulting improvement comes from balancing the dataset and how much from increasing its size.
Findings
Data augmentation increased the accuracy of ML models by 63% on average.
Both data balancing and dataset size significantly improve performance.
Swapping variable names is a viable augmentation method for mathematical ML tasks.
Abstract
This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition to tackle these with. By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling when viewing the selection as a classification problem. We find this augmentation increases the accuracy of ML models by 63% on average. We study what part of this improvement is due to the balancing of the dataset and what is achieved thanks to further increasing the size of the dataset,…
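The variable-swapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the polynomial encoding (a dict from exponent tuples to coefficients), the label encoding (a tuple of variable indices giving the chosen ordering), the three-variable setting, and the function names `rename_problem`, `rename_label`, and `augment` are all assumptions made for illustration. The point it shows is that renaming the variables of a labelled problem with a permutation, and applying the same permutation to the label, gives a new labelled instance without any further labelling.

```python
from collections import Counter
from itertools import permutations

# Hypothetical encoding: a polynomial is {exponent_tuple: coefficient},
# a problem is a list of polynomials, and a label is a variable ordering
# written as a tuple of variable indices.

def rename_problem(problem, sigma):
    """Rename variable i to sigma[i] in every polynomial of the problem."""
    renamed = []
    for poly in problem:
        new_poly = {}
        for exps, coeff in poly.items():
            new_exps = [0] * len(exps)
            for i, e in enumerate(exps):
                new_exps[sigma[i]] = e  # exponent of x_i moves to x_sigma(i)
            new_poly[tuple(new_exps)] = coeff
        renamed.append(new_poly)
    return renamed

def rename_label(ordering, sigma):
    """Apply the same renaming to the labelled variable ordering."""
    return tuple(sigma[v] for v in ordering)

def augment(dataset, n_vars=3):
    """Generate one labelled instance per permutation of the variable names."""
    augmented = []
    for problem, ordering in dataset:
        for sigma in permutations(range(n_vars)):
            augmented.append(
                (rename_problem(problem, sigma), rename_label(ordering, sigma))
            )
    return augmented

# Toy, fully imbalanced dataset: both problems carry the same label.
dataset = [
    ([{(2, 1, 0): 1, (0, 0, 1): -3}], (0, 1, 2)),  # x0^2*x1 - 3*x2
    ([{(1, 0, 1): 2, (0, 3, 0): 1}], (0, 1, 2)),   # 2*x0*x2 + x1^3
]

augmented = augment(dataset)
label_counts = Counter(label for _, label in augmented)
```

In this toy setting every one of the six possible orderings ends up with the same number of instances, so applying all permutations both enlarges the dataset and balances the classes. Those are the two effects the abstract says the paper studies.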
Taxonomy
Topics: Polynomial and algebraic computation · Machine Learning and Data Classification · Machine Learning and Algorithms
