TL;DR
This paper introduces an interactive tool that combines scalable data processing, machine learning, and visualization to analyze large pharmacoepidemiology datasets efficiently, enabling new insights into drug and adverse reaction patterns.
Contribution
The authors developed an integrated, open-source platform supporting scalable data analysis, machine learning, and visualization for population-scale pharmacoepidemiology data.
Findings
Preprocessed 384 million prescriptions in 2 minutes
Trained models in seconds, visualized results in milliseconds
Demonstrated effective analysis of large prescription datasets
Abstract
Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports the fitting of models large enough to detect drug use and ADR patterns that are not detectable using traditional methods on smaller datasets. However, detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. To our knowledge, no existing pharmacoepidemiology tool supports all three requirements. We have therefore created a tool for interactive exploration of patterns in prescription datasets with millions of samples. We use Spark to preprocess the data for machine learning and for analyses using SQL queries. We have implemented models in Keras and the scikit-learn framework. The model results are visualized and interpreted using live Python coding in Jupyter. We apply our tool to explore a 384 million…
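The abstract mentions running SQL queries over preprocessed prescription data. The sketch below illustrates the general shape of such a query and is not the authors' code: it swaps Spark for Python's built-in sqlite3 module so that it runs without a cluster, and the table name, column names, drug codes, and toy rows are all hypothetical.

```python
import sqlite3

# Hypothetical toy rows: (patient_id, drug_code, dispensed_year).
# The real dataset's schema is not given in the abstract.
ROWS = [
    (1, "A10BA02", 2015),
    (1, "C10AA01", 2015),
    (2, "A10BA02", 2016),
    (3, "A10BA02", 2016),
    (3, "N02BE01", 2016),
    (3, "A10BA02", 2017),
]


def drug_usage_summary(rows):
    """Return (drug_code, n_prescriptions, n_patients) tuples, most prescribed first."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a Spark SQL table
    try:
        conn.execute(
            "CREATE TABLE prescriptions ("
            "patient_id INTEGER, drug_code TEXT, dispensed_year INTEGER)"
        )
        conn.executemany("INSERT INTO prescriptions VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT drug_code,
                   COUNT(*) AS n_prescriptions,
                   COUNT(DISTINCT patient_id) AS n_patients
            FROM prescriptions
            GROUP BY drug_code
            ORDER BY n_prescriptions DESC, drug_code ASC
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


summary = drug_usage_summary(ROWS)
for drug_code, n_prescriptions, n_patients in summary:
    print(drug_code, n_prescriptions, n_patients)
```

The same GROUP BY aggregation could in principle be issued against a Spark DataFrame registered as a temporary view. Per-drug counts of this kind might then serve as input features for downstream models.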
