Classification of Scientific Papers With Big Data Technologies
Selen Gurbuz, Galip Aydin

TL;DR
This paper presents a cloud-based system utilizing the Naive Bayes algorithm and Apache Mahout to automatically classify Turkish scientific papers within a big data framework, demonstrating effective document categorization.
Contribution
It introduces a scalable, cloud-based classification system for Turkish scientific documents using distributed Naive Bayes and Apache Mahout, tailored for big data environments.
Findings
Efficient classification of Turkish scientific papers achieved
System demonstrates scalability on cloud infrastructure
Utilizes Apache Mahout for distributed processing
Abstract
Data sets too large to be processed by conventional data storage and analysis systems are referred to as Big Data. The term also refers to the new technologies developed to store, process, and analyze large amounts of data. Automatically retrieving information about the contents of large numbers of documents produced by different sources, identifying research fields and topics, extracting document abstracts, and discovering patterns are among the topics studied in the field of big data. In this study, the Naive Bayes classification algorithm is run on a data set of scientific articles to automatically determine the classes to which these documents belong. We have developed an efficient system that can analyze Turkish scientific documents using a distributed document classification algorithm running on cloud computing infrastructure. The Apache Mahout…
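To make the classification step in the abstract concrete, here is a minimal single-machine sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the authors' system and not Apache Mahout's distributed implementation. The function names `train` and `classify`, the whitespace tokenizer, and the tiny Turkish toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (label, text) pairs. Not data from the paper.
TOY_DOCS = [
    ("computer_science", "algoritma veri bulut"),
    ("computer_science", "veri algoritma model"),
    ("biology", "protein gen bakteri"),
    ("biology", "gen protein enzim"),
]


def train(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()  # assumption: naive whitespace tokenizer
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log-posterior score for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (class, word) pairs nonzero.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TOY_DOCS)
    print(classify(model, "bulut veri algoritma"))
    print(classify(model, "gen enzim"))
```

The training phase reduces to per-class counting, and counts of this kind can be computed independently on partitions of the data and then summed. That property is one reason Naive Bayes suits distributed processing. A real pipeline for Turkish text would also need proper tokenization, stemming, and stop-word handling, none of which this sketch attempts.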
