TL;DR
This paper introduces a highly optimized multi-node multi-GPU solver for large-scale diffeomorphic image registration. It solves a 256^3 problem in five seconds on a single GPU, reports a 70% speedup over the state of the art, and scales to 2048^3 images on 256 GPUs.
Contribution
It presents a new preconditioner for the reduced-space Gauss-Newton Hessian system and an optimized multi-node multi-GPU implementation with device-direct communication, and it compares the result against state-of-the-art CPU and GPU implementations.
Findings
Solved a 256^3 image registration problem in five seconds on a single NVIDIA Tesla V100 GPU.
Registered 2048^3 images using 64 nodes and 256 GPUs.
Achieved a 70% speedup over the state of the art on the single-GPU 256^3 problem.
Abstract
We present a Gauss-Newton-Krylov solver for large deformation diffeomorphic image registration. We extend the publicly available CLAIRE library to multi-node multi-GPU (graphics processing unit) systems and introduce novel algorithmic modifications that significantly improve performance. Our contributions comprise (i) a new preconditioner for the reduced-space Gauss-Newton Hessian system, (ii) a highly optimized multi-node multi-GPU implementation exploiting device direct communication for the main computational kernels (interpolation, high-order finite difference operators, and fast Fourier transform), and (iii) a comparison with state-of-the-art CPU and GPU implementations. We solve a 256^3-resolution image registration problem in five seconds on a single NVIDIA Tesla V100, with a performance speedup of 70% compared to the state of the art. In our largest run, we register…
