Pushing the limits for medical image reconstruction on recent standard multicore processors
Jan Treibig, Georg Hager, Hannes G. Hofmann, Joachim Hornegger, Gerhard Wellein

TL;DR
This paper optimizes the backprojection algorithm for medical CT reconstruction on Intel multicore processors, achieving high performance through low-level tuning, and compares results with GPU implementations to understand hardware limitations.
Contribution
It provides detailed low-level optimizations and performance analysis of CT backprojection on multiple CPU generations, highlighting SIMD and microarchitectural issues.
Findings
Achieved state-of-the-art CPU performance for backprojection
Identified SIMD instruction set limitations on current CPUs
Provided a comprehensive comparison with GPU implementations
Abstract
Volume reconstruction by backprojection is the computational bottleneck in many interventional clinical computed tomography (CT) applications. Today, vendors in this field are replacing special-purpose hardware accelerators with standard hardware such as multicore chips and GPGPUs. Medical imaging algorithms are on the verge of employing High Performance Computing (HPC) technology, and are therefore an interesting new candidate for optimization. This paper presents low-level optimizations for the backprojection algorithm, guided by a thorough performance analysis on four generations of Intel multicore processors (Harpertown, Westmere, Westmere EX, and Sandy Bridge). We choose the RabbitCT benchmark, a standardized test case well supported in industry, to ensure transparent and comparable results. Our aim is to provide not only the fastest possible implementation but also compare to performance…
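For readers unfamiliar with the kernel being optimized, the following is a minimal, unoptimized sketch of voxel-driven backprojection of the kind benchmarked by RabbitCT. It is not the authors' implementation. The function names (`bilinear`, `backproject`), the flat volume layout, the cubic volume with a single `origin` and `spacing`, and the zero-padding at image borders are all assumptions made for illustration. Each voxel is projected onto the detector with a 3x4 matrix, the projection image is sampled by bilinear interpolation, and the sample is accumulated with an inverse-square distance weight.

```python
import math


def bilinear(img, u, v):
    """Bilinearly interpolate img (a list of rows) at column u, row v.

    Pixels outside the image are treated as 0 (an assumption of this sketch).
    """
    height, width = len(img), len(img[0])
    iu, iv = math.floor(u), math.floor(v)
    fu, fv = u - iu, v - iv

    def px(row, col):
        if 0 <= row < height and 0 <= col < width:
            return img[row][col]
        return 0.0

    top = (1.0 - fu) * px(iv, iu) + fu * px(iv, iu + 1)
    bottom = (1.0 - fu) * px(iv + 1, iu) + fu * px(iv + 1, iu + 1)
    return (1.0 - fv) * top + fv * bottom


def backproject(volume, n, origin, spacing, A, img):
    """Accumulate one projection image into an n*n*n volume.

    volume  -- flat list of length n**3, x is the fastest-running index
    origin  -- world coordinate of voxel index 0 (same on all axes here)
    spacing -- voxel edge length in world units
    A       -- 3x4 projection matrix as a list of three rows
    Assumes the homogeneous coordinate w is nonzero for every voxel.
    """
    for z in range(n):
        wz = origin + z * spacing
        for y in range(n):
            wy = origin + y * spacing
            for x in range(n):
                wx = origin + x * spacing
                # Homogeneous detector coordinates of this voxel.
                uh = A[0][0] * wx + A[0][1] * wy + A[0][2] * wz + A[0][3]
                vh = A[1][0] * wx + A[1][1] * wy + A[1][2] * wz + A[1][3]
                w = A[2][0] * wx + A[2][1] * wy + A[2][2] * wz + A[2][3]
                inv_w = 1.0 / w
                sample = bilinear(img, uh * inv_w, vh * inv_w)
                # Inverse-square distance weighting of the sampled value.
                volume[(z * n + y) * n + x] += inv_w * inv_w * sample
```

The innermost loop combines a division, scattered image loads, and a short arithmetic chain per voxel, which is the kind of work where SIMD and microarchitectural details become relevant when tuning for a given CPU.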
