AcquisitionFocus: Joint Optimization of Acquisition Orientation and Cardiac Volume Reconstruction Using Deep Learning
Christian Weihsbach, Nora Vogt, Ziad Al-Haj Hemidi, Alexander Bigalke, Lasse Hansen, Julien Oster, Mattias P. Heinrich

TL;DR
This paper introduces a deep learning model that improves cardiac imaging by optimizing slice acquisition and reconstructing heart volumes accurately.
Contribution
The novelty lies in jointly optimizing acquisition orientation and volume reconstruction using deep learning for cardiac imaging.
Findings
The model achieves <13 mm HD95 errors in shape reconstruction.
Dice scores exceed 80%, showing high accuracy in multi-chamber reconstructions.
It performs well in both simulated and clinical cardiac MRI with various pathologies.
Abstract
In cardiac cine imaging, acquiring high-quality data is challenging and time-consuming due to the artifacts generated by the heart’s continuous movement. Volumetric, fully isotropic data acquisition with high temporal resolution is, to date, intractable due to MR physics constraints. To assess whole-heart movement under minimal acquisition time, we propose a deep learning model that reconstructs the volumetric shape of multiple cardiac chambers from a limited number of input slices while simultaneously optimizing the slice acquisition orientation for this task. We mimic the current clinical protocols for cardiac imaging and compare the shape reconstruction quality of standard clinical views and optimized views. In our experiments, we show that the jointly trained model achieves accurate high-resolution multi-chamber shape reconstruction with errors of <13 mm HD95 and Dice scores of…
Funding: German Federal Ministry of Education and Research (BMBF)
Taxonomy
Topics: Advanced MRI Techniques and Applications · Medical Imaging Techniques and Applications · Cardiac Imaging and Diagnostics
1. Introduction
Cardiac magnetic resonance (CMR) imaging typically follows a specific routine. Firstly, a low-resolution scout scan is acquired to localize the heart coarsely. Secondly, the scout scan is examined for manual imaging view-plane placement following dedicated protocol guidelines [1]. The scanner is then adjusted to capture the imaging planes of interest. Lastly, the acquired images are examined by clinical experts or automated post-processing software.
1.1. MR Physics Constraints and Timing
Examining images relies on sufficient image contrast, i.e., the signal-to-noise ratio (SNR). The SNR of an acquired image slice is constrained by the physical principle of MR as derived by Macovski [2]:

$$\mathrm{SNR} \propto f_{\mathrm{obj}}\,\omega_0\,V\,\sqrt{T} \qquad (1)$$

where $f_{\mathrm{obj}}$ is the influence of the examined object, $\omega_0$ is the resonant frequency, $V$ is the voxel volume, and $T$ is the acquisition time. Consequently, the SNR is affected by the imaging time and the spatiotemporal resolution of a scan. In CMR, the SNR is negatively impacted by cardiac and respiratory motion artifacts that increase with longer acquisition times [1]. Therefore, the acquisition time $T$ bounds the achievable image quality from both sides: too short a scan yields insufficient SNR, while too long a scan accumulates motion artifacts. Various sequences have thus been developed to improve the SNR and reduce the acquisition time. The SNR can be increased by combining images of the same cardiac phase when the acquisition is synchronized over multiple heart cycles [1]. This approach often requires breath-holding strategies that burden the patients [3]. In parallel imaging, the acquisition time is shortened by using multiple receiver coils that are read out in parallel [3,4,5]. From another point of view, $T$ is proportional to the number of acquired slices $N_s$ and the number of acquired K-space lines $N_k$, which can be captured at the rate of the repetition time TR [6]:

$$T \propto N_s\,N_k\,\mathrm{TR} \qquad (2)$$
Equation (2) states that acquiring more slices at a higher resolution (more K-space lines) takes longer. This has been addressed with compressed sensing where only a fraction of K-space lines are captured, accelerating the acquisition by a constant factor at the cost of introducing artifacts [7]. Nevertheless, applying these techniques for high temporal resolution cine imaging may be insufficient and remains a challenge [8].
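To make the scaling argument concrete, the following minimal sketch evaluates the relation described by Equation (2) and the commonly quoted proportionality of SNR to voxel volume and the square root of acquisition time. The function names, the proportionality constant of one, and all numeric values (slice counts, K-space lines, TR) are illustrative assumptions and are not taken from the paper.

```python
import math


def acquisition_time(n_slices: int, n_kspace_lines: int, tr_s: float,
                     acceleration: float = 1.0) -> float:
    """Scan time under T = n_slices * n_kspace_lines * TR.

    `acceleration` models a constant undersampling factor (e.g. compressed
    sensing); the proportionality constant is assumed to be 1.
    """
    return n_slices * n_kspace_lines * tr_s / acceleration


def relative_snr(voxel_volume_ratio: float, time_ratio: float) -> float:
    """SNR relative to a baseline, assuming SNR is proportional to V * sqrt(T)."""
    return voxel_volume_ratio * math.sqrt(time_ratio)


# Hypothetical protocol: 128 K-space lines per slice, TR = 3 ms.
full_stack = acquisition_time(n_slices=12, n_kspace_lines=128, tr_s=0.003)
two_slices = acquisition_time(n_slices=2, n_kspace_lines=128, tr_s=0.003)
print(f"12 slices: {full_stack:.3f} s, 2 slices: {two_slices:.3f} s")
print(f"speed-up from fewer slices: {full_stack / two_slices:.1f}x")
```

Reducing the slice count shortens the scan without touching the in-plane resolution or the per-slice acquisition time, which is why the per-slice SNR is not necessarily affected.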
In this study, we investigate a reduced number of imaging slices for faster acquisition without necessarily affecting the in-plane resolution or SNR; this reduction could additionally be combined with parallel imaging and/or compressed sensing. It is only applicable provided that the sparsely acquired slices are sufficiently descriptive for clinical examination. In the cardiac domain, such a sparse stack of slices is frequently acquired along the heart’s short axis to examine the left-ventricular properties that have been proven to contain valuable information for clinical experts [9]. Descriptive imaging planes are also crucial for automated deep learning techniques, which often achieve impressive results but ultimately rely on the data input.
We hypothesize that computer-assisted techniques can benefit from tailoring the slice selection to the automated post-processing task (see Figure 1). For demonstration, we build upon a recent work that explored the challenging task of reconstructing the full cardiac shape from a set of 2D echo views [10]. For MRI, we constrain the acquisition’s field of view to two sparse slices and learn the optimal slice view orientation for accurate shape reconstruction based on coarse localizer information. The definition and selection of optimal imaging planes [1,9,11] for this task may be different from human intuition, especially when deep learning methods are involved. Despite our study being linked to MRI acquisition and (shape) reconstruction, our method is unrelated to image reconstruction from K-space signals. It operates in the image domain after applying the inverse Fourier transform.
1.2. Shape Reconstruction and Imaging Plane Optimization
Volumetric shape reconstruction has been previously explored for various medical imaging modalities. In ultrasound imaging, there is an interest in reconstructing 3D volumes from 2D slice acquisitions of free-hand sweeps. In [12], this was solved by an LSTM model that combined sequential 2D imaging features with accelerometer parameters. Jokeit et al. [13] demonstrated that 3D bone shapes could be reconstructed from standard planar X-ray radiographs using a CycleGAN network. In a similar work, bone structures were reconstructed from sparse view segmentations using neural shape representations [14]. In the cardiac domain, left ventricle shapes were successfully reconstructed from sparse short-axis and long-axis image stacks using deformable mesh priors [15]. Stojanovski et al. [10] performed reconstruction of the full cardiac shape from multiple slices. To overcome the lack of paired slice and 3D target data, the authors simulated US intensity images for slices that were extracted from a 3D ground-truth mesh. Their approach uses an efficient variant of the Pix2Vox model presented in [16] and will be considered for performance comparison in Section 2.6.
Optimal imaging planes have been considered in [17], where an orthopedic scanning guide for diseases in 3D ultrasound applications was developed. The method relies on a two-stream classification pipeline to predict the probe movement direction and the presence of the desired target view. In the context of MRI, a target view classification network was proposed to determine the optimal MR image slice for detecting lumbar spinal stenosis [18]. The authors selected the optimal image slice from multiple given slices and evaluated the classification outcome for several network architectures and hyperparameters. Cardiac segmentation of the left ventricle and atrium with joint prediction of standard clinical view planes has been previously explored by Chen et al. [19], who aimed to translate findings from automated segmentations into clinical routine protocols. For optimal valvular heart disease assessment, 14 slice orientations were defined using a cardiac MRI reference scan [20]. Odille et al. [21] reconstructed the left ventricular shape by fitting a b-spline model to slice segmentations obtained from motion-corrected high-resolution intensity data. They compared pre-defined configurations of 3–6 sparse slices to evaluate the impact of view plane choices on the shape reconstruction quality. To the best of our knowledge, none of the previously proposed methods studied the joint optimization of view planes and volumetric reconstruction.
1.3. Contribution
While previous studies focused on detecting clinical standard imaging planes [15,18,20], we hypothesize that the slice view orientation should be optimized in a task-driven manner and propose the following contributions:
- In a challenging target scenario, we reconstruct the full cardiac shape of five structures from only two slices.
- We study the joint optimization of shape reconstruction and view-plane orientation to derive optimal sparse slice configurations.
- The optimized slice configurations lead to superior reconstruction quality compared to standard clinical imaging planes, which we demonstrate for synthetic and clinically acquired cardiac MRI data.
2. Materials and Methods
Our pipeline mimics the MRI acquisition process (see Figure 1): From a low-resolution scout scan, a coarse anatomical shape is generated by image segmentation. We analyze this coarse segmentation to identify standard clinical view planes and optimize the image plane slicing for cardiac shape reconstruction.
2.1. Extraction of Clinical Views
Experts follow a semi-automated routine to determine cardiac view planes [22]: Firstly, the left ventricle is localized in the scout scan, then pseudo-two-chamber (p2CH) and pseudo-four-chamber (p4CH) views are extracted. Based on these views, a stack of short-axis (SA) images is retrieved, which is a prerequisite to acquiring accurate 2CH and 4CH views. We extract the mentioned views from the coarse image segmentation by analyzing the inertial moments of the cardiac chamber shapes to construct orthonormal bases for an affine reorientation matrix. The inertia tensor $I$ of a shape is

$$I = \sum_{i,j,k} m_{ijk} \left( \lVert x_{ijk} \rVert^{2} E - x_{ijk}\, x_{ijk}^{\top} \right),$$

where $m$ is the shape’s (voxel) mass, $i, j, k$ are the spatial indices, $x$ is the distance vector from the point mass to a reference point [23], and $E$ is the 3 × 3 identity matrix. The resulting imaging planes are visualized in Figure 2.
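The principal-axis step can be sketched as follows. This is a minimal illustration, assuming unit mass per foreground voxel and the centre of mass as the reference point; the function names and the toy mask are hypothetical, and how the resulting axes are combined across chambers into 2CH, 4CH, and SA planes is not reproduced here.

```python
import numpy as np


def inertia_tensor(mask: np.ndarray) -> np.ndarray:
    """Inertia tensor of a binary 3D mask about its centre of mass.

    Each foreground voxel is treated as a point mass of 1 (assumption).
    """
    coords = np.argwhere(mask > 0).astype(float)   # (n, 3) voxel indices
    x = coords - coords.mean(axis=0)               # distance vectors to the centroid
    sq_norm = (x ** 2).sum(axis=1)                 # ||x||^2 per voxel
    # sum over voxels of (||x||^2 * E - x x^T)
    return sq_norm.sum() * np.eye(3) - x.T @ x


def principal_axes(mask: np.ndarray) -> np.ndarray:
    """Orthonormal basis (as columns), ordered by ascending moment of inertia."""
    _, eigvecs = np.linalg.eigh(inertia_tensor(mask))  # eigh sorts ascending
    return eigvecs


# Toy example: a bar elongated along the first array axis.
mask = np.zeros((32, 32, 32), dtype=bool)
mask[4:28, 14:18, 14:18] = True
axes = principal_axes(mask)
long_axis = axes[:, 0]  # smallest moment of inertia corresponds to the long axis
print(np.round(np.abs(long_axis), 3))
```

The eigenvectors form an orthonormal basis that can serve as the rotational part of a reorientation matrix; their sign is arbitrary, so a real pipeline would need an additional convention to fix the orientation.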
2.2. Slicing View Optimization
As described in Figure 3, we optimize for affine matrices that maximize the reconstruction accuracy. We first generate N affine matrices to define the slicing orientation. This work explores the extreme scenario of studying only N = 2 slice locations. Subsequently, we apply a reconstruction model to process the extracted slices. The deep learning architecture is laid out in more detail in Figure 4. To obtain optimizable slice orientations, we feed the segmentation of a (low-resolution) scout scan into an acquisition model. The model comprises two operators: an alignment operator optimally reorients the input to yield an oriented volume, and from this volume, a slicing operator C extracts a 2D slice S per affine matrix:
The formulation of the alignment operator is inspired by Jaderberg et al. [24] and uses a spatial transformer network to sample an oriented 2D plane from a 3D volume. The network consists of a CNN localization network with learnable parameters that maps the input volume to six rotational parameters and, for each of the three translational components, a set of offset parameters whose number is chosen relative to the target offset space (see Section 2.7). From the six rotational parameters, the rotational components of a 3D affine matrix are generated using the continual representation from [25]. The translational vector is formulated as:
The 3D affine matrix is then used to create a grid for the differentiable spatial transformer sampling layer. A slicing operator, C, extracts the center slice of the aligned volume. We want to stress that a separate set of affine matrices is predicted for every 3D input shape volume. This enables us to take any segmented input volume and find the correct slicing orientation for the subsequent scans using the same pre-trained model.
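The geometry of this step can be illustrated with a small NumPy sketch. It makes two loud assumptions: first, that the six rotational parameters are mapped to a rotation matrix by Gram-Schmidt orthogonalization of two 3-vectors (a common six-parameter rotation representation, assumed here to be what [25] refers to); second, that nearest-neighbour sampling stands in for the differentiable grid sampling of a spatial transformer. The function names are hypothetical, and the translation is passed in directly rather than derived from the paper's offset parameterization.

```python
import numpy as np


def rotation_from_6d(params: np.ndarray) -> np.ndarray:
    """Map six numbers to a 3x3 rotation matrix via Gram-Schmidt (assumed scheme)."""
    a1, a2 = params[:3], params[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2_orth = a2 - np.dot(b1, a2) * b1          # remove the component along b1
    b2 = a2_orth / np.linalg.norm(a2_orth)
    b3 = np.cross(b1, b2)                        # right-handed third axis
    return np.stack([b1, b2, b3], axis=1)        # columns are the basis vectors


def center_slice(volume: np.ndarray, rotation: np.ndarray,
                 translation: np.ndarray) -> np.ndarray:
    """Sample the central plane of the re-oriented volume.

    Assumes a cubic volume; uses nearest-neighbour lookup, so this sketch is
    NOT differentiable, unlike a spatial transformer sampling layer.
    """
    d = volume.shape[0]
    center = (d - 1) / 2.0
    u, v = np.meshgrid(np.arange(d) - center, np.arange(d) - center, indexing="ij")
    plane = np.stack([np.zeros_like(u), u, v], axis=-1)   # plane through the origin
    world = plane @ rotation.T + translation + center     # rotate, shift, re-centre
    idx = np.rint(world).astype(int)
    inside = np.all((idx >= 0) & (idx < d), axis=-1)
    pts = idx[inside]
    out = np.zeros((d, d), dtype=volume.dtype)
    out[inside] = volume[pts[:, 0], pts[:, 1], pts[:, 2]]
    return out


volume = np.arange(9 ** 3).reshape(9, 9, 9)
identity = rotation_from_6d(np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))
slice_2d = center_slice(volume, identity, np.zeros(3))
print(slice_2d.shape)
```

With the identity rotation and zero translation, the sampled plane coincides with the middle slice along the first axis; any other parameter set tilts or shifts the plane while the slicing operator itself stays fixed.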
2.3. Reconstruction Model
For a given set of N optimized 2D image slices S from the acquisition model, we aim to reconstruct the full volumetric cardiac shape :
Aiming for a mapping from 2D slices to a 3D volume, we configure the model to contain a 2D encoder and a 3D decoder branch, where the inverse of the predicted affine matrices is used at the skip connections and the bottleneck to re-embed the 2D slices in 3D space (see Figure 4 and Section 2.7).
2.4. Joint Optimization
Given the above models, we obtain N optimized slices, by jointly training the parameters of N acquisition models and one reconstruction model :
In a simplified setup, where the scout input and the reconstructed output have the same spatial resolution, an optimal reconstruction would reproduce the input volume exactly. This mapping could be fulfilled by learning an identity function, but this is prevented because the data pass through two bottlenecks that reduce information by extracting sparse slices and compressing the shape representation:
In our pipeline, the slice bottleneck is particularly interesting, as the reoriented slices reveal information about the importance of individual structures for the reconstruction. In an application-oriented setting, the scout scan has a lower spatial resolution than the reconstructed output. When the predicted affine matrix is passed to the MRI control panel, the optimized view can be captured in higher resolution to provide more detailed information for the reconstruction (see Figure 3).
2.5. Datasets
We performed initial experiments with synthetic cardiac MRI scans generated with XCAT [26] and MRXCAT 2.0 [27]. In this dataset with free-breathing protocol, each scan consists of 100 image frames with 1 spatial and 50 temporal resolution. The XCAT software provided ground-truth anatomical label maps, whereas texturized MRI simulations were derived from these maps using MRXCAT 2.0. The data were split into 24 training (male phantom) and 16 testing samples (female phantom). To show the effectiveness of our method, a percentage of the cardiac phase frames was excluded from the training set to reserve frames of the systolic phase for testing. In subsequent experiments, we used the MMWHS dataset [28] containing 20 labeled, static, nearly isotropic MRI volumes with the following structures: myocardium (MYO), left ventricle (LV), right ventricle (RV), left atrium (LA), and right atrium (RA). The dataset contains significant shape variations, including patients with cardiovascular diseases such as “cardiac function insufficiency, cardiac edema, hypertension […] arrhythmia, atrial flutter, atrial fibrillation, artery plaque, coronary atherosclerosis, aortic aneurysm, right ventricle hypertrophy [, and] dilated cardiomyopathy” [28]. The data were split into training and test data using 3-fold cross-validation.
2.6. Experimental Setup and Evaluation
Firstly, in Experiment I, we performed full cardiac shape reconstruction and compared the performance of our model to Pix2Vox (P2V, [16]) and a leaner variant Efficient Pix2Vox (EP2V, [10]), specifically designed for cardiac-slice-to-volume reconstruction (see Section 1.2). In this experiment, we simplified the multi-chamber reconstruction task to a binary shape reconstruction task to match the experimental setup of [10].
Secondly, in Experiment II, we extended the reconstruction task to multiple chambers and investigated the impact of simultaneous view-plane optimization on the reconstruction performance. We conducted an extensive ablation study transitioning from elementary to more elaborate scenarios. This transition involved replacing ground-truth annotations with automated segmentations as well as replacing high-resolution scout scans ( × × / ) with lower-resolution scout scans ( × × / )—a very coarse setting compared to the settings used in [29]. Note that these high-resolution scout scans are not available in clinical settings. Shape reconstruction was performed with just two high-resolution 2D views with × / in all scenarios, which can be acquired quickly and enables analysis with high temporal resolution.
Standard clinical views, such as 2CH and 4CH views (see Figure 2) were extracted from the scout input using the method described in Section 2.1. For the MMWHS dataset, we employed 3-fold cross-validation to address significant shape variations in the dataset. We assessed the reconstruction performance with the 95th percentile of the Hausdorff distance (HD95) and Dice score metrics.
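For readers unfamiliar with the two metrics, the sketch below computes a Dice score and an HD95 value for a pair of binary masks. It is a minimal NumPy illustration, not the evaluation code of the paper: HD95 has several variants in the literature, and this version (the maximum of the two directed 95th-percentile surface distances, with surfaces defined by 6-connectivity and isotropic `spacing`) is an assumption, as are all function names. Both masks are assumed to be non-empty.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice overlap of two binary masks (1.0 if both are empty)."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def surface_points(mask: np.ndarray) -> np.ndarray:
    """Coordinates of foreground voxels with at least one background 6-neighbour."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    crop = (slice(1, -1),) * mask.ndim
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[crop]
    return np.argwhere(mask & ~interior).astype(float)


def hd95(pred: np.ndarray, target: np.ndarray, spacing: float = 1.0) -> float:
    """95th-percentile Hausdorff distance between mask surfaces (brute force)."""
    a = surface_points(pred) * spacing
    b = surface_points(target) * spacing
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b, b_to_a = dists.min(axis=1), dists.min(axis=0)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


# Toy example: two cubes offset by two voxels along the first axis.
pred = np.zeros((20, 20, 20), dtype=bool)
target = np.zeros((20, 20, 20), dtype=bool)
pred[4:12, 4:12, 4:12] = True
target[6:14, 4:12, 4:12] = True
print(dice_score(pred, target), hd95(pred, target))
```

The brute-force pairwise distance matrix is fine for small toy masks; for 128 × 128 × 128 volumes, a distance-transform or KD-tree based implementation would be the more practical choice.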
2.7. Implementation Details
Our acquisition model is a convolutional neural network (CNN) consisting of layers with instance normalization, average pooling, and a final fully connected layer. The last layer maps the input features to six and 3 × values. The affine matrices are then constructed using the continual representation of [25] for rotational components and Equation (6) for translational components, restricting translational shifts to . The parameter count was chosen to be 40% of the spatial input volume length. In preliminary experiments, we attempted to predict the three translational components for every slice with three parameters but experienced instabilities. Mapping the parameters described in Equation (6) resulted in stable training and improved scores.
The one-hot encoded slice shape output is concatenated channel-wise (see Figure 4, center) and then fed to the reconstruction network. The reconstruction model is a U-Net based on [30], which we configure to consist of a 2D encoder and a 3D decoder by replacing the convolution and normalization layers while keeping the exact kernel sizes. To prevent the U-Net model from sharing information across slices in the encoder, we used grouped convolutions with independent groups per input slice.
The 2D features were re-embedded to the 3D space using a grid-sampling operator with the inverse affine matrices for every slice to enable the concatenation of 2D and 3D features at the skip connections. Every block of the reconstruction model (see Figure 4) comprises two (transpose) convolutional operations, followed by instance normalization and LeakyReLU nonlinearities. During joint training, we used the AdamW optimizer [31] for the reconstruction model and a batch size of . The acquisition models were optimized using AdamW and cosine annealing scheduling with warm restarts [32]. As a loss function, we employed a combination of Dice loss and cross-entropy [30]. We found that simultaneously optimizing both slices resulted in unstable training and, therefore, followed a two-stage approach. First, the slice output of the acquisition model was duplicated and stacked across the channel dimension while optimizing the parameters of the CNN. Then, the parameters of model were fixed, and only the parameters of were optimized. In both stages, the models were trained for 80 epochs. We always performed a final reconstruction network training from scratch, where the models , , and thus the input slices , were fixed. Rotation and scaling augmentation were applied to the input and output shapes to reduce the overfitting of the reconstruction model. For image segmentation, we utilized the U-Net model pipeline of [30], trained on 2D image slices with downsampling augmentation to ensure accurate segmentations for low-resolution and high-resolution inputs.
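The combined loss mentioned above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the two terms are summed with equal weight, the Dice term is the class-averaged soft Dice computed on softmax probabilities, and the function names are hypothetical. The exact weighting and reduction used in the paper are not specified in this section.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dice_ce_loss(logits: np.ndarray, target_onehot: np.ndarray,
                 eps: float = 1e-6) -> float:
    """Cross-entropy plus (1 - mean soft Dice); inputs have shape (C, D, H, W).

    Equal weighting of both terms is an assumption of this sketch.
    """
    probs = softmax(logits, axis=0)
    log_probs = np.log(np.clip(probs, eps, 1.0))
    ce = -(target_onehot * log_probs).sum(axis=0).mean()   # mean over voxels

    spatial = tuple(range(1, logits.ndim))
    inter = (probs * target_onehot).sum(axis=spatial)      # per-class overlap
    denom = probs.sum(axis=spatial) + target_onehot.sum(axis=spatial)
    soft_dice = (2.0 * inter + eps) / (denom + eps)
    return float(ce + (1.0 - soft_dice.mean()))


# Toy two-class target: first half of the volume is class 1.
labels = np.zeros((4, 4, 4), dtype=int)
labels[:2] = 1
onehot = np.stack([(labels == c).astype(float) for c in range(2)])
confident = dice_ce_loss(20.0 * (2.0 * onehot - 1.0), onehot)
uninformed = dice_ce_loss(np.zeros_like(onehot), onehot)
print(confident < uninformed)
```

The cross-entropy term gives per-voxel gradients, while the Dice term counteracts class imbalance between small structures and the background, which is the usual motivation for combining the two.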
3. Results
3.1. Experiment I
The evaluation of reconstruction model performance on the full cardiac shape is shown in Table 1 for the synthetic cine data and in Table 2 for the clinically acquired data. We observed lower Dice scores and higher HD95 errors for the MMWHS dataset, which contains largely varying pathological deformed shapes. Applied to the MRXCAT dataset, our model achieved the lowest HD95 errors in all scenarios and the best Dice score for the p2CH and p4CH slice view inputs. It thus outperformed P2V and EP2V in four of six scores. The P2V model [16] reached the best Dice score when reconstructing MRXCAT data from 2CH and SA views, whereas its efficient variant, EP2V [10], reached the best Dice value on 2CH and 4CH views (see Table 1). When applied to the MMWHS data, our model reached the highest performance in five of six scores, and was only outperformed by EP2V, which presented a lower HD95 error in the case of 2CH and SA view inputs (see Table 2).
3.2. Experiment II
We report the results of an extensive ablation study for multi-chamber shape reconstruction with our model on the synthetic MRXCAT dataset in Table 3 and the clinical MMWHS dataset in Table 4, respectively. We compared three ablation scenarios for every dataset, indicated by whitespace in the tables. The top group of values represents the first and most elementary scenario in which high-resolution scouts and ground-truth annotations were considered. The highest HD95 errors were observed for reconstructions based on the p2CH and the p4CH views typically extracted at the start of cardiac routine acquisitions ( and ).
The error was reduced to and for true 2CH and 4CH views (Figure 2). Reconstruction from 2CH + SA yielded errors of and . Randomly chosen views resulted in errors of and (RND, mean out of six runs). Optimizing the views reduced HD95 errors to a lowest of and ( and compared to true 2CH and 4CH views). An improvement could likewise be observed for the Dice scores, which improved to and % after optimization.
Figure 5 demonstrates that the highest scores were reached after the second stage of optimization (Section 2.7). In the second ablation scenario, reconstruction from realistic low-resolution scouts and ground-truth annotations was examined (see center groups of Table 3 and Table 4). We only considered the best-performing clinical 2CH + 4CH views from the first scenario for further comparison. For MRXCAT, HD95 error of 2CH + 4CH views was reduced to ( ) with optimization. While the MMWHS dataset demonstrated a comparable error reduction ( ), inferior Dice scores were observed. The last scenario added automated segmentation to the pipeline, resulting in the most application-oriented setting. For the MRXCAT data, HD95 errors increased compared to the ground-truth setting of scenario two, resulting in for 2CH + 4CH clinical views and for optimized views. This was not reflected by Dice scores, for which 2CH + 4CH clinical views outperformed the optimized views with % compared to % respectively. For the MMWHS data, the reconstruction error increased significantly to for 2CH + 4CH and for optimized views. We additionally report volumetric segmentation results for the coarse scout scans. Note that for acquiring the scout scans, 32 captured slices instead of one slice are needed at a lower in-plane resolution ( per x-, y-axis), increasing acquisition time and making it unsuitable for a direct comparison; hence, the values are enclosed in brackets.
The slicing reorientation obtained for the runs of Table 3 and Table 4 (OPT + OPT) is depicted in Figure 6. Notably, the first view was reoriented from the coronal view to an equivalent of the clinical 4CH view in the first 20 epochs, indicating that the 4CH view contains the most information for reconstruction.
Training and inference were performed on a single NVIDIA TITAN RTX 24 GB graphics card. Each stage of optimization took ∼29 . Inference took 677 for the entire pipeline to reconstruct volumes of 128 × 128 × 128 from two 128 × 128 slices. Each acquisition model contained parameters, the segmentation model contained parameters, and the reconstruction model contained parameters.
4. Discussion
We presented a novel approach to enhance the volumetric reconstruction of cardiac structures from sparse slice acquisitions using joint view-plane location and orientation optimization to overcome scan-time limitations for high-resolution 3D shape reconstructions. We tested our approach on a synthetic, dynamic cine dataset (MRXCAT) and a static dataset (MMWHS) that included significant shape variation caused by pathological deformations.
In the binary cardiac shape reconstruction experiment, our reconstruction model outperformed two related methods with lower HD95 error in five of six scenarios and higher Dice performance in four of six scenarios. Improving on the related methods, we then performed multi-chamber reconstruction and joint optimization of the input views. In an extensive ablation study, we showed that the joint optimization of slicing views could consistently reduce HD95 reconstruction errors across all six of the ablation scenarios we performed (MRXCAT: , , , MMWHS: , and , ), whereas two scenarios demonstrated a drop in Dice scores.
For the MRXCAT dataset, a promisingly low HD95 error was achieved for multi-chamber reconstruction after view optimization, despite the fact that only a subset of cardiac phases was seen during optimization. This indicates that the reconstruction model learns a generalized shape representation. Visualizing the views of an entire test batch using the heatmap overlay (Figure 6), it is noticeable that views are reoriented consistently to yield optimal reconstruction properties (also refer to Figure 5). For the MMWHS dataset, slice optimization reduced HD95 errors in all scenarios. A significant performance drop was observed when slice segmentation was integrated into the pipeline. Here, the slice view segmentation model limits the capability of reconstructing the 3D shape successfully. Pre-training the segmentation model is challenging, as MMWHS data have a large shape variability and varying contrasts. Moreover, the segmentation model must generalize to arbitrarily oriented 2D slice views that are not constrained to axial, coronal, and sagittal view planes. Training the segmentation model on a larger dataset using the identified optimized slice orientations and spatiotemporal data is expected to further enhance the model’s robustness.
5. Conclusions
We showed that five cardiac structures could be reconstructed with <13 mm HD95 and >80% Dice from only two optimized views when ground-truth label maps were used as input. In future work, we plan to investigate the quantification of possible reconstruction errors to assess the applicability of our method in clinical settings. Moreover, the reconstruction from more than two image planes and the determination of the optimal tradeoff between the reconstruction accuracy and the time needed to acquire the slices remain to be explored. The proposed image plane optimization could furthermore be applied to other target tasks, such as pathology classification. Summarizing our approach, we would like to motivate the medical deep learning community to investigate the integration of (slicing) acquisition parameters into their pipelines to improve computer-assisted analysis further.
References
- 1. Ismail T.F., Strugnell W., Coletti C., Božić-Iven M., Weingärtner S., Hammernik K., Correia T., Küstner T. Cardiac MR: From theory to practice. Front. Cardiovasc. Med. 2022, 9, 1–37. doi:10.3389/fcvm.2022.826283. PMID: 35310962. PMCID: PMC8927633.
- 2. Macovski A. Noise in MRI. Magn. Reson. Med. 1996, 36, 494–497. doi:10.1002/mrm.1910360327. PMID: 8875425.
- 3. Ridgway J.P. Cardiovascular magnetic resonance physics for clinicians: Part I. J. Cardiovasc. Magn. Reson. 2010, 12, 1–28. doi:10.1186/1532-429X-12-71. PMID: 21118531. PMCID: PMC3016368.
- 4. Pruessmann K.P., Weiger M., Scheidegger M.B., Boesiger P. SENSE: Sensitivity encoding for fast MRI. Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 1999, 42, 952–962. doi:10.1002/(SICI)1522-2594(199911)42:5<952::AID-MRM16>3.0.CO;2-S. PMID: 10542355.
- 5. Griswold M.A., Jakob P.M., Heidemann R.M., Nittka M., Jellus V., Wang J., Kiefer B., Haase A. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 2002, 47, 1202–1210. doi:10.1002/mrm.10171. PMID: 12111967.
- 6. Balaban R.S., Peters D.C. Basic principles of cardiovascular magnetic resonance. In Cardiovascular Magnetic Resonance; Elsevier: Amsterdam, The Netherlands, 2019; pp. 1–14.
- 7. Lustig M., Donoho D., Pauly J.M. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 2007, 58, 1182–1195. doi:10.1002/mrm.21391. PMID: 17969013.
- 8. Raman S.V., Markl M., Patel A.R., Bryant J., Allen B.D., Plein S., Seiberlich N. 30-minute CMR for common clinical indications: a Society for Cardiovascular Magnetic Resonance white paper. J. Cardiovasc. Magn. Reson. 2022, 24, 13. doi:10.1186/s12968-022-00844-6. PMID: 35232470. PMCID: PMC8886348.
