Data Classification With Multiprocessing
Anuja Dixit, Shreya Byreddy, Guanqun Song, Ting Zhu

TL;DR
This paper explores parallel training of classification algorithms using multiprocessing to reduce execution time and improve accuracy through ensembling, demonstrating benefits across several ML models.
Contribution
It introduces a parallel hyperparameter tuning and ensembling approach using Python multiprocessing for classification tasks, enhancing efficiency and prediction reliability.
Findings
Parallel training reduces execution time.
Ensembling improves classification accuracy.
Multiprocessing increases reliability of predictions.
Abstract
Classification is one of the most important tasks in Machine Learning (ML), and with recent advancements in artificial intelligence (AI) it is important to find efficient ways to implement it. Generally, the choice of classification algorithm depends on the data it is dealing with, and the accuracy of the algorithm depends on the hyperparameters it is tuned with. One approach is to run the algorithm serially with different hyperparameters, check the accuracy of each run, and then select the parameters that give the highest accuracy to predict the final output. This paper proposes another approach in which the algorithm is trained in parallel with different hyperparameters to reduce the execution time. In the end, results from all the trained variations of the algorithm are ensembled to exploit the parallelism and improve the accuracy of prediction. Python multiprocessing is used to test this…
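The idea in the abstract (train one variant per hyperparameter setting in parallel, then ensemble the predictions) can be sketched with Python's standard `multiprocessing` module. This is only an illustrative sketch, not the authors' implementation: the k-nearest-neighbour classifier, the toy dataset, the choice of `k` values, and the helper names `knn_predict`, `run_variant`, and `majority_vote` are all assumptions introduced here, since the excerpt does not specify the models or data used.

```python
import multiprocessing as mp
from collections import Counter

# Hypothetical toy dataset (not from the paper): two well-separated 2-D classes.
TRAIN = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0),
         ((1.0, 0.9), 1), ((0.8, 1.0), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1)]
TEST_POINTS = [(0.1, 0.1), (0.9, 0.9), (0.25, 0.15), (0.75, 0.95)]


def knn_predict(k, train, points):
    """Label each point by majority vote among its k nearest training samples."""
    preds = []
    for p in points:
        # Rank training samples by squared Euclidean distance to p, keep the k closest.
        nearest = sorted(
            train,
            key=lambda s: (s[0][0] - p[0]) ** 2 + (s[0][1] - p[1]) ** 2,
        )[:k]
        preds.append(Counter(label for _, label in nearest).most_common(1)[0][0])
    return preds


def run_variant(k):
    """Worker for one hyperparameter setting; top-level so it can be pickled."""
    return knn_predict(k, TRAIN, TEST_POINTS)


def majority_vote(all_preds):
    """Ensemble step: per test point, take the most common label across variants."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*all_preds)]


if __name__ == "__main__":
    ks = [1, 3, 5]  # assumed hyperparameter grid, one worker process per value
    with mp.Pool(processes=len(ks)) as pool:
        all_preds = pool.map(run_variant, ks)  # variants are evaluated in parallel
    print(majority_vote(all_preds))
```

Majority voting is used here as one simple ensembling rule; the excerpt does not say which rule the paper applies. The `if __name__ == "__main__":` guard matters because some platforms start worker processes by re-importing the main module.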
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Artificial Intelligence in Healthcare
