High-Performance Mining of COVID-19 Open Research Datasets for Text Classification and Insights in Cloud Computing Environments
Jie Zhao, Maria A. Rodriguez, Rajkumar Buyya

TL;DR
This paper presents a hybrid cloud system utilizing Aneka PaaS middleware to efficiently process and categorize COVID-19 research articles, significantly reducing processing time and enabling scalable scholarly data analysis.
Contribution
It introduces a novel hybrid cloud framework with parallel processing for large-scale COVID-19 literature analysis using machine learning.
Findings
Reduced processing time for large datasets
Achieved linear scalability in performance
Effective categorization of COVID-19 research articles
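The page does not say which notion of "linear scalability" is meant. Under one common reading (strong scaling), it is stated in terms of speedup $S$ and parallel efficiency $E$ for $p$ workers, where $T(p)$ is the wall-clock time on $p$ workers. The numbers in the worked line are hypothetical and are not results from the paper.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}, \qquad
\text{linear scalability: } S(p) \approx p \iff E(p) \approx 1

% Hypothetical example: T(1) = 80\ \text{min},\ T(4) = 20\ \text{min}
S(4) = \frac{80}{20} = 4, \qquad E(4) = \frac{4}{4} = 1
```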
Abstract
The COVID-19 global pandemic is an unprecedented health crisis. Since the outbreak, researchers around the world have produced an extensive body of literature. For the research community and the general public to digest it, the text must be analysed and insights provided in a timely manner, which requires considerable computational power. Cloud computing has been widely adopted in academia and industry in recent years. In particular, hybrid cloud is gaining popularity because of its two-fold benefits: utilising existing resources to save cost, and using additional cloud service providers to gain access to extra computing resources on demand. In this paper, we developed a system utilising the Aneka PaaS middleware with parallel processing and multi-cloud capability to accelerate the ETL and article categorisation process using machine learning technology on a hybrid…
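The parallel ETL and categorisation idea in the abstract can be sketched as data parallelism: split the corpus into independent slices, run extract/transform plus classification on each slice, and merge the labels in order. This is a minimal sketch under loud assumptions. It does not use Aneka's API. Python's `concurrent.futures.ProcessPoolExecutor` stands in for Aneka distributing tasks to worker nodes. The keyword table is a hypothetical placeholder for the trained machine-learning classifier, which the abstract does not specify. The names `extract_transform`, `classify`, `process_chunk`, `split`, and `categorise_corpus` are all illustrative.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, TypeVar

T = TypeVar("T")

# Hypothetical keyword table standing in for a trained ML classifier;
# the categories and keywords are illustrative only.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "epidemiology": ["transmission", "incidence", "outbreak"],
    "treatment": ["drug", "therapy", "clinical trial"],
    "virology": ["genome", "protein", "spike"],
}


def extract_transform(raw: Dict[str, str]) -> str:
    """ETL step: join title and abstract, lowercase, and collapse whitespace."""
    text = f"{raw.get('title', '')} {raw.get('abstract', '')}"
    return " ".join(text.lower().split())


def classify(text: str) -> str:
    """Return the category with the most keyword hits, or 'other' if none match."""
    scores = {cat: sum(text.count(kw) for kw in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=lambda cat: scores[cat])
    return best if scores[best] > 0 else "other"


def process_chunk(chunk: List[Dict[str, str]]) -> List[str]:
    """One independent task: ETL plus classification for a slice of articles."""
    return [classify(extract_transform(article)) for article in chunk]


def split(items: List[T], n_chunks: int) -> List[List[T]]:
    """Partition items into at most n_chunks order-preserving slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def categorise_corpus(articles: List[Dict[str, str]], workers: int = 4) -> List[str]:
    """Run process_chunk on each slice in parallel; pool.map keeps input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return [label
                for labels in pool.map(process_chunk, split(articles, workers))
                for label in labels]


if __name__ == "__main__":
    # The guard matters: process pools may re-import this module in workers.
    corpus = [
        {"title": "Spike protein structure", "abstract": "We sequence the viral genome."},
        {"title": "Remdesivir therapy", "abstract": "A clinical trial of a candidate drug."},
        {"title": "Outbreak dynamics", "abstract": "Transmission and incidence in Wuhan."},
        {"title": "Remote teaching", "abstract": "Lessons for universities."},
    ]
    print(categorise_corpus(corpus, workers=2))
    # expected: ['virology', 'treatment', 'epidemiology', 'other']
```

Because each slice is processed without shared state, adding workers (or, in a hybrid cloud, renting more nodes) only changes how many slices run at once. This independence is what makes near-linear speedup plausible for this kind of workload.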
Taxonomy
Topics: Big Data and Business Intelligence · Cloud Computing and Resource Management · Data Stream Mining Techniques
