Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data
Michael Mayo, Eibe Frank

TL;DR
This paper introduces an approach for improving naive Bayes regression models: population-based algorithms generate optimized artificial training data, and models trained on this data generalize better than models trained on the real data.
Contribution
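A method for enhancing naive Bayes regression by evolving artificial surrogate training data. The novel twist on the usual training paradigm is that the real training data is used only indirectly, to guide the construction of the surrogate data, rather than to train the model directly.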
Findings
Artificial data improves naive Bayes regression accuracy
Population-based algorithms effectively generate surrogate data
Models trained on the artificial data generalize better than models trained on the real data
Abstract
Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm.
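The idea in the abstract can be illustrated with a minimal sketch: a population of small surrogate datasets is evolved, and each candidate is scored by training a naive Bayes regressor on it and measuring that model's error on the real training data. This is not the authors' method. The regressor is an assumed, simplified stand-in (the target is discretised into equal-width bins, each attribute gets one Gaussian per bin, and the prediction is the posterior-weighted bin centre), and the optimiser is an assumed elitist (mu + lambda) evolution strategy with Gaussian mutation. All function names (`fit_nb`, `predict`, `evolve_surrogate`, etc.), the surrogate size `m`, the bin count `k`, and the mutation scale `sigma` are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch: evolve a small surrogate training set so that a
# simplified naive Bayes regressor trained on it fits the REAL data well.
# Both the regressor and the optimiser are illustrative stand-ins.


def bin_index(t, edges):
    """Index of the equal-width target bin containing t (clamped to the ends)."""
    for b in range(len(edges) - 2):
        if t < edges[b + 1]:
            return b
    return len(edges) - 2


def fit_nb(X, y, edges, min_sd=1e-3):
    """Simplified naive Bayes for regression (assumed stand-in): discretise the
    target into bins and model each attribute with one Gaussian per bin."""
    k, d, n = len(edges) - 1, len(X[0]), len(X)
    model = []
    for b in range(k):
        rows = [x for x, t in zip(X, y) if bin_index(t, edges) == b]
        log_prior = math.log((len(rows) + 1) / (n + k))  # Laplace-smoothed prior
        src = rows if len(rows) >= 2 else X  # sparse bins fall back to all rows
        stats = []
        for j in range(d):
            col = [r[j] for r in src]
            mu = sum(col) / len(col)
            sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
            stats.append((mu, max(sd, min_sd)))
        model.append((log_prior, stats))
    centres = [(edges[b] + edges[b + 1]) / 2 for b in range(k)]
    return model, centres


def predict(fitted, x):
    """Posterior-weighted average of the bin centres."""
    model, centres = fitted
    scores = []
    for log_prior, stats in model:
        s = log_prior
        for v, (mu, sd) in zip(x, stats):
            # Gaussian log-density, dropping the constant shared by all bins.
            s += -math.log(sd) - 0.5 * ((v - mu) / sd) ** 2
        scores.append(s)
    top = max(scores)
    w = [math.exp(s - top) for s in scores]  # subtract max to avoid underflow
    return sum(wi * c for wi, c in zip(w, centres)) / sum(w)


def mse(fitted, X, y):
    return sum((predict(fitted, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def evolve_surrogate(X, y, m=20, k=5, pop=10, gens=30, sigma=0.1, seed=0):
    """Elitist (mu + lambda) evolution strategy over surrogate datasets of m rows.
    Fitness: MSE on the real training data of a model fit to the surrogate."""
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    edges = [lo + (hi - lo) * i / k for i in range(k + 1)]  # bins from real target range
    real_rows = [list(x) + [t] for x, t in zip(X, y)]

    def fitness(ind):
        return mse(fit_nb([r[:-1] for r in ind], [r[-1] for r in ind], edges), X, y)

    def mutate(ind):
        # Gaussian noise on every attribute and target value (assumes unit-scale data).
        return [[v + rng.gauss(0.0, sigma) for v in r] for r in ind]

    # Initialise each individual by resampling real rows with replacement.
    population = [[list(rng.choice(real_rows)) for _ in range(m)] for _ in range(pop)]
    scored = sorted(((fitness(p), p) for p in population), key=lambda s: s[0])
    history = [scored[0][0]]
    for _ in range(gens):
        children = [mutate(p) for _, p in scored]
        pool = scored + [(fitness(c), c) for c in children]
        scored = sorted(pool, key=lambda s: s[0])[:pop]  # parents compete with children
        history.append(scored[0][0])  # parents survive, so this never increases
    return scored[0][1], edges, history


# Usage on synthetic data: fitness uses the training split only; compare on a test split.
rng = random.Random(1)
data = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(100)]
target = [2 * a - b + rng.gauss(0, 0.05) for a, b in data]
X_train, y_train, X_test, y_test = data[:60], target[:60], data[60:], target[60:]

best, edges, history = evolve_surrogate(X_train, y_train)
real_model = fit_nb(X_train, y_train, edges)
surrogate_model = fit_nb([r[:-1] for r in best], [r[-1] for r in best], edges)
print(f"test MSE, trained on real data:      {mse(real_model, X_test, y_test):.4f}")
print(f"test MSE, trained on surrogate data: {mse(surrogate_model, X_test, y_test):.4f}")
```

Because the truncation step is elitist, the training-set error of the best surrogate never increases from one generation to the next. Whether the surrogate-trained model also beats the real-data model on held-out data depends on the data and settings, so the sketch only prints both test errors rather than presuming the paper's result.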
Taxonomy
Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Data Stream Mining Techniques
