A Data Colocation Grid Framework for Big Data Medical Image Processing - Backend Design
Shunxing Bao, Yuankai Huo, Prasanna Parvathaneni, Andrew J. Plassard, Camilo Bermudez, Yuang Yao, Ilwoo Lyu, Aniruddha Gokhale, Bennett A. Landman

TL;DR
This paper introduces a backend API and data management scheme for a medical image processing grid framework, improving performance, query speed, and analysis efficiency on heterogeneous clusters using Hadoop and HBase.
Contribution
It presents a novel backend API design, a dataset summary model, and an optimized HBase table scheme tailored for big data medical image processing in heterogeneous environments.
Findings
Load balancer improved wall-time by 1.5x.
Summary statistic model reduced processing time 8-fold.
HBase table scheme achieved 7-fold faster queries.
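To illustrate the summary-statistic finding, a population mean can be computed in a map/reduce style: each image is reduced to a small partial result, and the partials are merged. The following is a minimal sketch in plain Python, not the paper's Hadoop/HBase implementation; the function names `map_partial`, `reduce_partials`, and `population_mean` and the toy voxel lists are hypothetical.

```python
from functools import reduce

def map_partial(image):
    """Map step: reduce one image (a flat list of voxel values) to (sum, count)."""
    return (sum(image), len(image))

def reduce_partials(a, b):
    """Reduce step: merge two (sum, count) partials. The merge is associative,
    so partials can be combined in any grouping across workers."""
    return (a[0] + b[0], a[1] + b[1])

def population_mean(images):
    """Mean voxel value over all images, computed from per-image partials."""
    total, count = reduce(reduce_partials, map(map_partial, images), (0.0, 0))
    if count == 0:
        raise ValueError("no voxels to summarize")
    return total / count

# Toy stand-in for image data (assumption: real inputs would be image volumes).
images = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = population_mean(images)
```

Because only the small `(sum, count)` pairs move between steps, the map step can run next to where each image is stored, which is the idea behind colocating computation with data.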
Abstract
When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated across the variety of multi-level analyses in medical imaging. Our target criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population-based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL queries. In this paper, we present a backend application program interface design for Hadoop & HBase for…
