Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data

Michael Mayo; Eibe Frank

arXiv:1707.04943 · cs.AI · November 29, 2018 · 2 cites

Open Access

TL;DR

This paper introduces a novel approach to improve naive Bayes regression models by generating optimized artificial training data through population-based algorithms, leading to better generalization performance.

Contribution

It presents a method for enhancing naive Bayes for regression by evolving artificial surrogate training data. The real training data is used only indirectly, to construct the artificial data, rather than directly for model training.

Findings

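01. Training on artificial surrogate data improves the generalisation performance of naive Bayes for regression.

02. Population-based optimisation algorithms can generate effective surrogate training data.

03. Models trained on the artificial data outperform those trained on the real data.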

Abstract

Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm.
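
The idea in the abstract can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the authors' implementation: the regressor is a simplified Gaussian stand-in for naive Bayes for regression, the optimiser is a small population of mutate-and-keep-if-better searchers (the abstract says only "population-based optimisation algorithms"), and the fitness is assumed to be the RMSE on the real training data of a model fitted to the artificial data. All function names and parameters (`n_points`, `pop_size`, `sigma`, and so on) are hypothetical.

```python
import numpy as np

def fit_gaussian_nb_regressor(X, y, eps=1e-6):
    """Toy stand-in model (an assumption): p(y) is Gaussian and each
    p(x_j | y) is Gaussian with mean a_j + b_j * y and variance s2_j."""
    mu_y, var_y = y.mean(), y.var() + eps
    mu_x = X.mean(axis=0)
    cov_xy = ((X - mu_x) * (y - mu_y)[:, None]).mean(axis=0)
    b = cov_xy / var_y
    a = mu_x - b * mu_y
    s2 = np.maximum(X.var(axis=0) - b**2 * var_y, eps)
    return mu_y, var_y, a, b, s2

def predict(model, X):
    """Posterior mean of y given x under the naive independence assumption."""
    mu_y, var_y, a, b, s2 = model
    precision = 1.0 / var_y + np.sum(b**2 / s2)
    weighted = mu_y / var_y + ((X - a) * (b / s2)).sum(axis=1)
    return weighted / precision

def rmse(model, X, y):
    return float(np.sqrt(np.mean((predict(model, X) - y) ** 2)))

def evolve_surrogate_data(X_real, y_real, n_points=10, pop_size=20,
                          generations=200, sigma=0.1, seed=0):
    """Evolve a small artificial data set; the real data is used only to
    score candidates, never to fit the returned model directly."""
    rng = np.random.default_rng(seed)
    d = X_real.shape[1]

    def fitness(individual):
        data = individual.reshape(n_points, d + 1)
        model = fit_gaussian_nb_regressor(data[:, :d], data[:, d])
        return rmse(model, X_real, y_real)

    # Each individual is a flattened artificial data set, seeded from real rows.
    real = np.column_stack([X_real, y_real])
    pop = np.stack([
        real[rng.choice(len(real), n_points, replace=False)].ravel()
        for _ in range(pop_size)
    ])
    scores = np.array([fitness(ind) for ind in pop])

    for _ in range(generations):
        # Gaussian mutation; a child replaces its parent only if it scores better.
        children = pop + rng.normal(0.0, sigma, pop.shape)
        child_scores = np.array([fitness(ind) for ind in children])
        improved = child_scores < scores
        pop[improved] = children[improved]
        scores[improved] = child_scores[improved]

    best = pop[np.argmin(scores)].reshape(n_points, d + 1)
    return best[:, :d], best[:, d], float(scores.min())
```

Calling `evolve_surrogate_data(X, y)` returns the artificial features, the artificial targets, and the best fitness found. Scoring candidates on the same real data used to guide the search risks overfitting, so a held-out split would be needed to assess generalisation as the abstract describes.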

Taxonomy

Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Data Stream Mining Techniques