Classification of Scientific Papers With Big Data Technologies

Selen Gurbuz; Galip Aydin

arXiv:1802.05055·cs.DC·February 15, 2018

TL;DR

This paper presents a cloud-based system utilizing the Naive Bayes algorithm and Apache Mahout to automatically classify Turkish scientific papers within a big data framework, demonstrating effective document categorization.

Contribution

It introduces a scalable, cloud-based classification system for Turkish scientific documents using distributed Naive Bayes and Apache Mahout, tailored for big data environments.

Findings

01 Efficient classification of Turkish scientific papers achieved

02 System demonstrates scalability on cloud infrastructure

03 Utilizes Apache Mahout for distributed processing

Abstract

Data sets too large to be processed by conventional data storage and analysis systems are called Big Data. The term also refers to the new technologies developed to store, process, and analyze large amounts of data. Automatic retrieval of information about the contents of a large number of documents produced by different sources, identification of research fields and topics, extraction of document abstracts, and discovery of patterns are some of the topics that have been studied in the field of big data. In this study, the Naive Bayes classification algorithm is run on a data set of scientific articles to automatically determine the classes to which these documents belong. We have developed an efficient system that can analyze Turkish scientific documents with a distributed document classification algorithm run on Cloud Computing infrastructure. The Apache Mahout…
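To make the classification step concrete, here is a minimal single-machine sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the authors' system and not Apache Mahout's distributed implementation. The toy English documents, the class labels, the whitespace tokenizer, and the names `train`, `classify`, `TRAIN`, and `MODEL` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Count documents per class and tokens per class.

    docs: iterable of (label, text) pairs. Tokenization is a plain
    lowercase whitespace split (an assumption, not the paper's pipeline).
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for label, text in docs:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def classify(model, text):
    """Return the class with the highest log posterior for `text`."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # log prior: share of training documents in this class
        score = math.log(class_docs[label] / total_docs)
        class_total = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # add-one smoothing keeps unseen (class, token) pairs nonzero
            score += math.log(
                (word_counts[label][tok] + 1) / (class_total + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set; the paper uses Turkish scientific articles.
TRAIN = [
    ("physics", "quantum particle energy field"),
    ("physics", "energy momentum particle collision"),
    ("biology", "cell protein gene expression"),
    ("biology", "gene mutation cell membrane"),
]
MODEL = train(TRAIN)

print(classify(MODEL, "particle energy measurement"))  # physics
print(classify(MODEL, "gene cell"))                    # biology
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. The counting in `train` is a sum over documents, which is the kind of computation that can be split across machines, though how the paper's system distributes it is not shown here.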
