Document Classification Using Distributed Machine Learning
Galip Aydin, Ibrahim Riza Hallac

TL;DR
This paper explores the application of distributed machine learning technologies, including Hadoop, Spark, and Mahout, to improve the performance of Naive Bayes classification for Turkish news categorization.
Contribution
It demonstrates how Apache Big Data tools can be integrated with machine learning algorithms for effective document classification in a non-English language.
Findings
Naive Bayes achieves high success rates in Turkish news classification.
Distributed technologies significantly improve processing efficiency.
The approach is scalable for large datasets.
Abstract
In this paper, we investigate the performance and success rates of the Naïve Bayes classification algorithm for automatically classifying Turkish news into predetermined categories such as economy, life, and health. We use Apache big data technologies, namely Hadoop, HDFS, Spark, and Mahout, and apply these distributed technologies to machine learning.
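To make the classification step concrete, here is a minimal single-machine sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the authors' Mahout or Spark pipeline. The function names `train` and `classify`, the whitespace tokenizer, and the four toy headlines are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit counts from a list of (category, text) pairs."""
    doc_counts = Counter()              # number of documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for category, text in docs:
        tokens = text.lower().split()   # naive whitespace tokenizer (assumption)
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        # log prior: share of training documents in this category
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # add-one smoothing keeps unseen (category, word) pairs above zero
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical toy data, not taken from the paper's news corpus.
TOY_DOCS = [
    ("economy", "borsa dolar faiz enflasyon"),
    ("economy", "merkez bankasi faiz karari"),
    ("health", "hastane doktor tedavi ilac"),
    ("health", "grip asisi doktor tavsiyesi"),
]
model = train(TOY_DOCS)
print(classify(model, "dolar ve faiz"))
print(classify(model, "doktor ilac"))
```

The counting in `train` is a per-category sum over documents, which is the kind of aggregation that distributed frameworks can split across workers and merge afterward. How the paper partitions this work is not described in the abstract.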
