Integrating Spatial Configuration into Heatmap Regression Based CNNs for Landmark Localization

Christian Payer; Darko Štern; Horst Bischof; Martin Urschler

arXiv:1908.00748·eess.IV·August 5, 2019

TL;DR

This paper introduces a novel CNN architecture called SpatialConfiguration-Net (SCN) that improves landmark localization accuracy in medical images by integrating spatial configuration information. It is especially effective when only limited training data is available.

Contribution

The paper presents a new CNN design that splits the localization task into simpler sub-problems, improving robustness and accuracy in landmark detection with small datasets.

Findings

01 SCN achieves a lower landmark localization error than the related methods it is compared with.

02 Incorporating the spatial configuration of landmarks improves robustness to ambiguities.

03 SCN remains effective on size-limited datasets.

Abstract

In many medical image analysis applications, often only a limited amount of training data is available, which makes training of convolutional neural networks (CNNs) challenging. In this work on anatomical landmark localization, we propose a CNN architecture that learns to split the localization task into two simpler sub-problems, reducing the need for large training datasets. Our fully convolutional SpatialConfiguration-Net (SCN) dedicates one component to locally accurate but ambiguous candidate predictions, while the other component improves robustness to ambiguities by incorporating the spatial configuration of landmarks. In our experimental evaluation, we show that the proposed SCN outperforms related methods in terms of landmark localization error on size-limited datasets.

Equations

h_{i}(\vec{x})=h^{\text{LA}}_{i}(\vec{x})\odot h^{\text{SC}}_{i}(\vec{x}).

Full text

Medical Imaging with Deep Learning (MIDL) 2019 – Extended Abstract Track

Christian Payer (1,2), Darko Štern (2), Horst Bischof (1), Martin Urschler (2,1)

(1) Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria

(2) Ludwig Boltzmann Institute for Clinical Forensic Imaging, Graz, Austria

keywords:

anatomical landmarks, localization, heatmap regression, spatial configuration

1 Introduction

Localization of anatomical landmarks is an important step in medical image analysis, e.g., in segmentation (Beichel et al., 2005) or registration (Johnson and Christensen, 2002). Unfortunately, locally similar structures make landmark localization ambiguous. To deal with this ambiguity, machine learning based approaches often combine local landmark predictions with explicit handcrafted graphical models that restrict predictions to feasible spatial configurations. The landmark localization problem is thus simplified by separating it into two successive steps: the first step is dedicated to locally accurate but potentially ambiguous predictions, while in the second step graphical models (Cootes et al., 1995; Felzenszwalb and Huttenlocher, 2005) eliminate the ambiguities.

Recent advances in computer vision and medical imaging have mainly been driven by convolutional neural networks (CNNs) due to their superior capability to automatically learn important image features (LeCun et al., 2015). Unfortunately, CNNs typically need large amounts of training data. Especially in medical imaging, this requirement is hard to fulfill due to ethical and financial concerns as well as time-consuming expert annotations.

In this work, we show that the amount of required training data can be reduced with our proposed two-component SpatialConfiguration-Net (SCN), which follows the idea of handcrafted graphical models to split landmark localization into two successive steps. This extended abstract gives a short overview of the key concepts of our journal paper (Payer et al., 2019); we refer the reader to the full paper for more detailed descriptions and more extensive evaluations on a variety of datasets.

2 Method

Our method for landmark localization is based on regressing heatmap images (Tompson et al., 2014), which encode the pseudo-probability of a landmark being located at a certain pixel position. With $N$ being the total number of landmarks, we define the target heatmap image of a landmark $L_{i}$, $i\in\{1,\dots,N\}$, as the $d$-dimensional Gaussian function ${g_{i}(\vec{x}):\mathbb{R}^{d}\rightarrow\mathbb{R}}$ centered at the target landmark's ground-truth coordinate ${\vec{\overset{\ast}{x}}_{i}\in\mathbb{R}^{d}}$.
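As an illustration of such a target heatmap, the following minimal sketch builds a $d$-dimensional Gaussian on a pixel grid with NumPy. The function name `gaussian_heatmap`, the peak value of 1, and the value of `sigma` are assumptions made for this example; the extended abstract does not specify the scaling or width of $g_{i}$.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """Return a d-dimensional Gaussian heatmap with peak value 1 at `center`.

    `shape` is the image size per axis, `center` the landmark coordinate in
    pixels. `sigma` and the unit peak height are illustrative assumptions.
    """
    # One coordinate grid per image axis, each with the full image shape.
    grids = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    # Squared Euclidean distance of every pixel to the landmark coordinate.
    sq_dist = sum((g - c) ** 2 for g, c in zip(grids, center))
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Target heatmap for one landmark at (row 20, column 30) in a 64 x 48 image.
target = gaussian_heatmap((64, 48), center=(20.0, 30.0), sigma=2.0)
```

Because the grids are built from `shape` alone, the same function also covers volumetric ($d=3$) data by passing three sizes and a three-component center.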

The network is set up to regress $N$ heatmaps simultaneously by minimizing the differences between predicted heatmaps $h_{i}(\vec{x})$ and the corresponding target heatmaps $g_{i}(\vec{x})$ in an end-to-end manner (Ronneberger et al., 2015; Shelhamer et al., 2017). During network inference, we obtain the predicted coordinate $\vec{\hat{x}}_{i}\in\mathbb{R}^{d}$ of each landmark $L_{i}$ by taking the coordinate at which the heatmap has its highest value.
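The two steps described here, measuring the difference between predicted and target heatmaps and reading off the coordinate of the maximum, can be sketched as follows. The squared-difference loss is an assumption (the text only speaks of "minimizing the differences"), and both function names are hypothetical.

```python
import numpy as np

def heatmap_l2_loss(predicted, target):
    """Sum of squared differences over all N heatmaps and all pixels.

    An assumed choice of loss; the text does not name the exact function.
    """
    return float(np.sum((predicted - target) ** 2))

def heatmaps_to_coordinates(heatmaps):
    """Return an (N, d) array with the argmax coordinate of each heatmap.

    `heatmaps` has shape (N, size_1, ..., size_d).
    """
    n = heatmaps.shape[0]
    spatial_shape = heatmaps.shape[1:]
    # Flatten the spatial axes, take the argmax, then map back to d indices.
    flat_argmax = heatmaps.reshape(n, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat_argmax, spatial_shape), axis=1)

# Two toy 5 x 6 heatmaps with a single non-zero response each.
heatmaps = np.zeros((2, 5, 6))
heatmaps[0, 1, 4] = 1.0
heatmaps[1, 3, 2] = 0.7
coords = heatmaps_to_coordinates(heatmaps)
```

Taking the argmax yields integer pixel coordinates; any sub-pixel refinement would be an additional step that is not described in this text.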

2.1 SpatialConfiguration-Net

The fundamental concept of the SpatialConfiguration-Net (SCN) is the interaction between its two components (see Fig. 1). The first component takes the image as input to generate locally accurate but potentially ambiguous local appearance heatmaps $h^{\text{LA}}_{i}(\vec{x})$. Motivated by handcrafted graphical models for eliminating these potential ambiguities, the second component takes the predicted candidate heatmaps $h^{\text{LA}}_{i}(\vec{x})$ as input to generate inaccurate but unambiguous spatial configuration heatmaps $h^{\text{SC}}_{i}(\vec{x})$.

For $N$ landmarks, the set of predicted heatmaps $\mathbb{H}=\{h_{i}(\vec{x})\;|\;i=1\dots N\}$ is obtained by element-wise multiplication $\odot$ of the corresponding heatmap outputs $h^{\text{LA}}_{i}(\vec{x})$ and $h^{\text{SC}}_{i}(\vec{x})$ of the two components:

$$h_{i}(\vec{x})=h^{\text{LA}}_{i}(\vec{x})\odot h^{\text{SC}}_{i}(\vec{x}).$$

This multiplication is crucial for the SCN, as it forces both of its components to generate a response at the location of the target landmark $\vec{\overset{\ast}{x}}_{i}$, i.e., both $h^{\text{LA}}_{i}(\vec{x})$ and $h^{\text{SC}}_{i}(\vec{x})$ deliver responses for $\vec{x}$ close to $\vec{\overset{\ast}{x}}_{i}$, while at all other locations one component may respond as long as the other one does not.
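A small one-dimensional toy example may help to see why the product suppresses ambiguous responses. The heatmaps below are hand-made Gaussians, not network outputs, and all positions, widths, and amplitudes are assumptions chosen for illustration: the local appearance heatmap has a sharp response at the true landmark (position 10) and an even stronger sharp response at a look-alike structure (position 40), while the spatial configuration heatmap is broad and slightly off-center (position 12) but has a single mode.

```python
import numpy as np

def gauss_1d(x, center, sigma):
    """Unnormalized 1D Gaussian with peak value 1 at `center`."""
    return np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

x = np.arange(64, dtype=float)

# Locally accurate but ambiguous: two sharp peaks, the wrong one is stronger.
h_la = 0.8 * gauss_1d(x, 10.0, 1.5) + 1.0 * gauss_1d(x, 40.0, 1.5)

# Inaccurate but unambiguous: one broad peak near the true landmark.
h_sc = gauss_1d(x, 12.0, 8.0)

# Element-wise multiplication of the two component outputs.
h = h_la * h_sc

la_peak = int(h_la.argmax())       # maximum of the local appearance heatmap
combined_peak = int(h.argmax())    # maximum of the combined heatmap
```

In this toy setting the local appearance heatmap alone peaks at the look-alike structure, whereas the product keeps the sharp response at the true landmark because the broad spatial configuration heatmap is close to zero around position 40.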

3 Experiments and Results

We evaluate our proposed SCN on a dataset of 895 radiographs of left hands with 37 annotated characteristic landmarks on fingertips and bone joints. We compare our SCN to state-of-the-art random regression forests (Ebner et al., 2014; Lindner et al., 2015; Štern et al., 2016; Urschler et al., 2018), our previous CNN-based method (Payer et al., 2016), and our implementation of a localization U-Net for heatmap regression. Results of the image-specific point-to-point errors for three-fold cross validation of the 895 radiographs are shown in Fig. 2. When using all training images, our SCN outperforms all other compared methods. Additionally, when drastically reducing the number of training images to 100, 50, and 10, respectively, our SCN greatly outperforms the localization U-Net. This confirms that splitting the localization task into predicting accurate but potentially ambiguous local appearance heatmaps and inaccurate but unambiguous spatial configuration heatmaps is especially useful when dealing with only limited amounts of training data.
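For readers unfamiliar with the error measure, the sketch below computes point-to-point errors as Euclidean distances between predicted and ground-truth landmark coordinates. It assumes that the image-specific error is the mean of these distances over all landmarks of one image; the function name, the `spacing` parameter, and the toy coordinates are hypothetical and not taken from the evaluation.

```python
import numpy as np

def point_to_point_errors(predicted, groundtruth, spacing=1.0):
    """Euclidean distance per landmark between (N, d) coordinate arrays.

    `spacing` converts pixel units to physical units (assumed to be 1 here).
    """
    diff = (np.asarray(predicted, dtype=float)
            - np.asarray(groundtruth, dtype=float)) * spacing
    return np.linalg.norm(diff, axis=-1)

# Toy example with two landmarks in one 2D image.
pred = np.array([[10.0, 12.0], [30.0, 44.0]])
gt = np.array([[13.0, 16.0], [30.0, 40.0]])

errors = point_to_point_errors(pred, gt)   # one distance per landmark
image_error = float(errors.mean())         # assumed image-specific error
```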

4 Conclusion

In conclusion, we have shown how to combine local appearance and spatial configuration information in a single end-to-end trained network for landmark localization. Our generic architecture achieves state-of-the-art results in terms of localization error, even when only limited amounts of training images are available. We are currently looking into extending our SCN to occluded structures and multi-object localization, and into adapting it for semantic segmentation problems (see Payer et al. (2018) for preliminary results), where structural constraints may be used in a similar manner.

Bibliography

1. Beichel et al. (2005). Reinhard Beichel, Horst Bischof, Franz Leberl, and Milan Sonka. Robust Active Appearance Models and Their Application to Medical Image Analysis. IEEE Trans. Med. Imaging, 24(9):1151–1169, Sep. 2005. doi:10.1109/TMI.2005.853237.
2. Cootes et al. (1995). Tim F. Cootes, Christopher J. Taylor, David H. Cooper, and Jim Graham. Active Shape Models-Their Training and Application. Comput. Vis. Image Underst., 61(1):38–59, Jan. 1995. doi:10.1006/cviu.1995.1004.
3. Ebner et al. (2014). Thomas Ebner, Darko Štern, René Donner, Horst Bischof, and Martin Urschler. Towards Automatic Bone Age Estimation from MRI: Localization of 3D Anatomical Landmarks. In Proc. Med. Image Comput. Comput. Interv., pages 421–428. Springer, 2014. doi:10.1007/978-3-319-10470-6_53.
4. Felzenszwalb and Huttenlocher (2005). Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Pictorial Structures for Object Recognition. Int. J. Comput. Vis., 61(1):55–79, 2005. doi:10.1023/B:VISI.0000042934.15159.49.
5. Johnson and Christensen (2002). Hans J. Johnson and Gary E. Christensen. Consistent Landmark and Intensity-Based Image Registration. IEEE Trans. Med. Imaging, 21(5):450–461, 2002. doi:10.1109/TMI.2002.1009381.
6. LeCun et al. (2015). Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep Learning. Nature, 521(7553):436–444, 2015. doi:10.1038/nature14539.
7. Lindner et al. (2015). Claudia Lindner, Paul A. Bromiley, Mircea C. Ionita, and Tim F. Cootes. Robust and Accurate Shape Model Matching Using Random Forest Regression-Voting. IEEE Trans. Pattern Anal. Mach. Intell., 37(9):1862–1874, Sep. 2015. doi:10.1109/TPAMI.2014.2382106.
8. Payer et al. (2016). Christian Payer, Darko Štern, Horst Bischof, and Martin Urschler. Regressing Heatmaps for Multiple Landmark Localization Using CNNs. In Proc. Med. Image Comput. Comput. Interv., pages 230–238. Springer, 2016. doi:10.1007/978-3-319-46723-8_27.