High-Performance Mining of COVID-19 Open Research Datasets for Text Classification and Insights in Cloud Computing Environments

Jie Zhao; Maria A. Rodriguez; Rajkumar Buyya

arXiv:2009.07399 · cs.DC · September 17, 2020 · 1 citation

Open Access

TL;DR

This paper presents a hybrid cloud system utilizing Aneka PaaS middleware to efficiently process and categorize COVID-19 research articles, significantly reducing processing time and enabling scalable scholarly data analysis.

Contribution

It introduces a novel hybrid cloud framework with parallel processing for large-scale COVID-19 literature analysis using machine learning.

Findings

01 Reduced processing time for large datasets

02 Achieved linear scalability in performance

03 Effective categorization of COVID-19 research articles

Abstract

The COVID-19 global pandemic is an unprecedented health crisis. Since the outbreak, researchers around the world have produced an extensive body of literature. For the research community and the general public to digest this material, it is crucial to analyse the text and provide insights in a timely manner, which requires considerable computational power. Cloud computing has been widely adopted in academia and industry in recent years. In particular, hybrid cloud is gaining popularity because of its two-fold benefits: utilising existing resources to save cost and using additional cloud service providers to gain access to extra computing resources on demand. In this paper, we developed a system utilising the Aneka PaaS middleware with parallel processing and multi-cloud capability to accelerate the ETL and article categorising process using machine learning technology on a hybrid…
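The abstract describes splitting the ETL and article-categorising work across parallel workers. The sketch below illustrates only that general pattern (chunk the corpus, process chunks concurrently, merge the results) and is not the authors' implementation. Everything in it is an assumption made for illustration: a local thread pool stands in for workers managed by the Aneka middleware, the names `CATEGORY_KEYWORDS`, `process_chunk`, and `run_pipeline` are hypothetical, and a simple keyword rule replaces the machine learning classifier, whose details the abstract does not give.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical categories and keywords, chosen for illustration only;
# they are not taken from the paper.
CATEGORY_KEYWORDS = {
    "transmission": {"transmission", "spread", "airborne"},
    "treatment": {"treatment", "drug", "therapy", "vaccine"},
    "diagnosis": {"diagnosis", "test", "pcr", "screening"},
}


def clean(text):
    """Transform step: lowercase the text and split it into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def categorise(tokens):
    """Placeholder classifier: the category with the most keyword hits,
    or 'other' when no keyword matches. A trained model would go here."""
    scores = {
        category: sum(1 for token in tokens if token in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def process_chunk(chunk):
    """Unit of work for one worker: clean and categorise each article."""
    return [(doc_id, categorise(clean(text))) for doc_id, text in chunk]


def split(items, n_chunks):
    """Split items into at most n_chunks contiguous chunks."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_pipeline(articles, workers=4):
    """Process (doc_id, text) pairs in parallel; return {doc_id: category}."""
    chunks = split(articles, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(process_chunk, chunks)
        return {doc_id: label for part in parts for doc_id, label in part}
```

Because each chunk is processed independently, the same `process_chunk` function could in principle be dispatched to remote nodes instead of local threads. For CPU-bound work in Python, a process pool or a distributed task framework would be the more realistic choice than threads.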

Taxonomy

Topics: Big Data and Business Intelligence · Cloud Computing and Resource Management · Data Stream Mining Techniques