Parallelization of Machine Learning Algorithms Respectively on Single Machine and Spark
Jiajun Shen

TL;DR
This paper explores the parallelization of classic machine learning algorithms on single machines and Spark, demonstrating significant improvements in runtime and efficiency for large data analysis.
Contribution
It presents a comparative study of traditional versus parallelized machine learning algorithms on a single machine and on the Spark platform, highlighting the performance gains from parallelization.
Findings
Parallelized algorithms outperform traditional ones in runtime.
Significant efficiency improvements are observed on the Spark platform.
Parallelization benefits are consistent across different algorithms.
Abstract
With the rapid development of big data technologies, extracting useful information from massive data sets has become an essential problem. However, analyzing large data sets with machine learning algorithms can be time-consuming and inefficient on a traditional single machine. To address this, this paper studies the parallelization of several classic machine learning algorithms on a single machine and on the big data platform Spark. We compare the runtime and efficiency of traditional machine learning algorithms with those of their parallelized counterparts on both the single machine and the Spark platform. The results show significant improvements in runtime and efficiency for the parallelized algorithms.
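The abstract does not name the algorithms studied or show how they were parallelized, so the following is only an illustrative sketch of the general data-parallel pattern on a single machine. It assumes k-means as a stand-in algorithm: one iteration is split into a map step over data partitions and a reduce step that merges the partial results. This is the same shape a Spark job would take with partitions in place of chunks. All function names (`nearest`, `partial_update`, `merge`, `kmeans_step`) and the `executor_cls` parameter are hypothetical and not taken from the paper.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partial_update(args):
    """Map step: per-cluster coordinate sums and counts for one partition."""
    chunk, centroids = args
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def merge(parts, centroids):
    """Reduce step: combine partition results into new centroids."""
    dim = len(centroids[0])
    new_centroids = []
    for j, old in enumerate(centroids):
        n = sum(counts[j] for _, counts in parts)
        if n == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
        else:
            new_centroids.append(
                tuple(sum(sums[j][d] for sums, _ in parts) / n for d in range(dim))
            )
    return new_centroids


def kmeans_step(points, centroids, workers=1, executor_cls=ProcessPoolExecutor):
    """One k-means iteration; serial when workers <= 1, data-parallel otherwise."""
    if workers <= 1:
        return merge([partial_update((points, centroids))], centroids)
    size = (len(points) + workers - 1) // workers  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with executor_cls(max_workers=workers) as pool:
        parts = list(pool.map(partial_update, [(c, centroids) for c in chunks]))
    return merge(parts, centroids)
```

Calling `kmeans_step(points, centroids)` gives the serial baseline, and `kmeans_step(points, centroids, workers=4)` gives the parallel version. Wrapping each call in `time.perf_counter()` measurements is one way to reproduce the kind of runtime comparison the abstract describes. Passing `executor_cls=ThreadPoolExecutor` is useful for quick correctness checks, although pure-Python threads will generally not show a speedup.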
Taxonomy
Topics: Machine Learning and Data Classification · Big Data Technologies and Applications · Data Mining Algorithms and Applications
