Scalable Architecture for Personalized Healthcare Service Recommendation using Big Data Lake
Sarathkumar Rangarajan, Huai Liu, Hua Wang, and Chuan-Long Wang

TL;DR
This paper introduces a scalable data lake architecture using Hadoop to efficiently integrate structured and unstructured healthcare data, enhancing personalized recommendations through improved clustering and analytics.
Contribution
It proposes a novel data lake architecture that reduces data ingestion time, unifies data storage, and improves healthcare analytics accuracy compared to traditional methods.
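To make the ingestion idea concrete, here is a minimal sketch of a data-lake "raw zone": files are copied unchanged in their native format and described in a small catalog, rather than being converted to relational tables first. It is an illustration under stated assumptions, not the paper's implementation. A local directory stands in for HDFS, and the function name `ingest_raw`, the `raw/<provider>/` layout, and the `catalog.jsonl` file are all hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def ingest_raw(source: Path, lake_root: Path, provider: str) -> dict:
    """Copy a file unchanged into the raw zone and append a catalog entry.

    Assumption: a local directory stands in for HDFS; layout and names are hypothetical.
    """
    raw_dir = lake_root / "raw" / provider
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    shutil.copy2(source, target)  # no parsing or schema mapping at ingestion time

    entry = {
        "provider": provider,
        "path": target.relative_to(lake_root).as_posix(),
        "format": source.suffix.lstrip(".") or "unknown",
        "bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    # The catalog is what later analytics jobs would consult to locate data.
    with (lake_root / "catalog.jsonl").open("a", encoding="utf-8") as catalog:
        catalog.write(json.dumps(entry) + "\n")
    return entry


# Demo with one structured and one unstructured file (both invented for illustration).
workdir = Path(tempfile.mkdtemp())
lake = workdir / "lake"

structured = workdir / "lab_results.csv"
structured.write_text("patient_id,glucose\n1,5.4\n", encoding="utf-8")

unstructured = workdir / "discharge_note.txt"
unstructured.write_text("Patient reports mild dizziness.", encoding="utf-8")

entries = [
    ingest_raw(structured, lake, provider="clinical_lab"),
    ingest_raw(unstructured, lake, provider="hospital"),
]
```

Because nothing is transformed on the way in, ingestion cost in this sketch is essentially a file copy plus a checksum. Interpreting each file is deferred to the analytics step that reads it.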
Findings
Reduces data ingestion time across multiple data sources
Improves patient clustering accuracy with data lake integration
Enhances personalized healthcare recommendations
Abstract
Personalized healthcare services use relational patient data and big data analytics to tailor medication recommendations. However, most healthcare data is unstructured, and converting it into relational form takes considerable time and effort. This study proposes a novel data lake architecture to reduce data ingestion time and improve the precision of healthcare analytics. It also removes data silos and enhances analytics by allowing connectivity to third-party data providers (such as clinical lab results, chemists, insurance companies, etc.). The data lake architecture uses the Hadoop Distributed File System (HDFS) to store both structured and unstructured data. This study uses the K-means clustering algorithm to find clusters of patients with similar health conditions. Subsequently, it employs a support vector machine to…
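The clustering step named in the abstract can be sketched as follows. This is a plain-Python version of standard K-means (Lloyd's algorithm), not the authors' code. The three patient features (age, BMI, systolic blood pressure), their normalised values, and the choice of `k=2` are invented for illustration. The abstract does not say which features or how many clusters the study used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Standard K-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_point(members)
    return assignments, centroids


# Hypothetical, already-normalised features per patient: (age, BMI, systolic BP).
patients = [
    (0.20, 0.30, 0.25), (0.25, 0.28, 0.30), (0.22, 0.35, 0.27),
    (0.80, 0.75, 0.85), (0.78, 0.80, 0.90), (0.85, 0.70, 0.88),
]

labels, centers = kmeans(patients, k=2)
```

In this toy data the first three and last three patients form two well-separated groups, so they end up in different clusters. Which group receives label 0 depends on the random initialisation.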
Taxonomy
Topics: Artificial Intelligence in Healthcare · Data Quality and Management · Privacy-Preserving Technologies in Data
