Collaborative Cloud Computing Framework for Health Data with Open Source Technologies
Fatemeh Rouzbeh, Ananth Grama, Paul Griffin, Mohammad Adibuzzaman

TL;DR
This paper presents a new open-source cloud computing framework tailored for health data that addresses performance, flexibility, scalability, and privacy compliance challenges in scientific research.
Contribution
It introduces a novel architecture that leverages open-source tools such as Hadoop, Kubernetes, and JupyterHub to analyze health data in a distributed environment.
Findings
The system successfully processed 69 million patient records.
The framework improved data manipulation and query performance.
The framework ensured HIPAA compliance in a scalable cloud setup.
Abstract
The proliferation of sensor technologies and advancements in data collection methods have enabled the accumulation of very large amounts of data. Increasingly, these datasets are considered for scientific research. However, designing a system architecture that achieves high performance in terms of parallelization, query processing time, and aggregation of heterogeneous data types (e.g., time series, images, and structured data) remains a major challenge, as does the difficulty of reproducing scientific research. This is particularly true for health sciences research, where systems must be i) easy to use, with the flexibility to manipulate data at the most granular level, ii) agnostic of programming language kernel, iii) scalable, and iv) compliant with the HIPAA privacy law. In this paper, we review the existing literature for such big data systems for scientific research in health…
