Probabilistic Point Cloud Reconstructions for Vertebral Shape Analysis
Anjany Sekuboyina, Markus Rempfler, Alexander Valentinitsch, Maximilian Loeffler, Jan S. Kirschke, and Bjoern H. Menze

TL;DR
This paper introduces a probabilistic auto-encoding network for point clouds that captures shape signatures and detects vertebral fractures as anomalies, achieving over 75% AUC without supervision or intensity features.
Contribution
It presents a novel auto-encoder architecture with a specialized loss and regularization for probabilistic shape analysis of vertebrae, enabling unsupervised fracture detection.
Findings
Achieved >75% AUC in vertebral fracture detection
Effectively models data variance on unstructured point clouds
Detects fractures as anomalies without supervision
Abstract
We propose an auto-encoding network architecture for point clouds (PC) capable of extracting shape signatures without supervision. Building on this, we (i) design a loss function capable of modelling data variance on PCs which are unstructured, and (ii) regularise the latent space as in a variational auto-encoder, both of which increase the auto-encoders' descriptive capacity while making them probabilistic. Evaluating the reconstruction quality of our architectures, we employ them for detecting vertebral fractures without any supervision. By learning to efficiently reconstruct only healthy vertebrae, fractures are detected as anomalous reconstructions. Evaluating on a dataset containing 1500 vertebrae, we achieve area-under-ROC curve of 75%, without using intensity-based features.
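Table 1: Fracture detection performance (in %) of the supervised baselines (PN, PNbal) and of the auto-encoders, using either the reconstruction error or the reconstruction log-likelihood as the anomaly measure.
| Measure | PN | PNbal | AE (recon. error) | VAE (recon. error) | σ-AE (recon. error) | σ-VAE (recon. error) | σ-AE (recon. log-likelihood) | σ-VAE (recon. log-likelihood) |
|---|---|---|---|---|---|---|---|---|
| Precision | 100 ± 0.0 | 68.6 ± 3.4 | 57.6 ± 4.1 | 61.1 ± 1.9 | 67.1 ± 6.5 | 68.4 ± 3.3 | 62.3 ± 4.3 | 61.6 ± 1.4 |
| Recall | 13.9 ± 3.1 | 57.6 ± 7.5 | 85.0 ± 9.8 | 79.0 ± 3.6 | 74.3 ± 4.0 | 71.7 ± 4.1 | 72.7 ± 6.1 | 79.7 ± 2.5 |
| F1 | 24.7 ± 4.7 | 62.5 ± 5.8 | 68.0 ± 0.9 | 68.5 ± 1.7 | 67.5 ± 5.1 | 69.6 ± 1.2 | 66.7 ± 1.3 | 69.5 ± 0.6 |
| AUC | n.a. | n.a. | 70.8 ± 2.2 | 74.8 ± 3.0 | 75.9 ± 2.0 | 75.9 ± 1.5 | 70.2 ± 2.2 | 73.8 ± 2.0 |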
Accepted at Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2019.
Anjany Sekuboyina1,2, Markus Rempfler3, Alexander Valentinitsch2, Maximilian Loeffler2, Jan S. Kirschke2,*, Bjoern H. Menze1,*
1 Department of Informatics, Technical University of Munich, Germany
2 Department of Neuroradiology, Klinikum rechts der Isar, Germany
3 Friedrich Miescher Institute for Biomedical Research, Switzerland
* Joint supervising authors.
1 Introduction
The numerous algorithms proposed for segmenting organs, tissues, the spine, etc. make it possible to analyse their anatomical shapes, eventually contributing towards population studies [6], disease characterisation [10], survival analysis [7], etc. Employing convolutional neural networks (CNN) for this task involves processing voxelised data, owing to its Euclidean nature. Such a voluminous representation, however, is inefficient, especially when the masks are binary and the shape information resides in the surface profile. Alternatively, surface meshes (a collection of vertices, edges, and faces) or active contours could be used. Since the data is no longer Euclidean, a conventional CNN is unusable. Graph convolutional networks (GCN) [3] were thus developed by redefining the notions of ‘neighbourhood’ and ‘convolution’ for meshes and graphs. However, if the number of nodes is high, GCNs (especially spectral ones) become bulky. Moreover, each mesh is treated as a domain, making mesh registration a prerequisite.
An alternative surface representation is a set of 3D points in space, referred to as a point cloud (PC). A PC represents the surface with just a set of vertices, thus avoiding both the cubic complexity of voxel-based representations and the high-dimensional, sparse adjacency matrix of meshes. However, despite their representational effectiveness, PCs are permutation invariant and do not describe data on a structured grid, preventing the use of standard convolutions. To this end, we work with an architecture capable of processing PCs (point-net, [9]) and design a network capable of reconstructing PCs, thereby extracting shape signatures in an unsupervised manner.
1.0.1 Uncertainty and latent space modelling
Unlike supervised learning on PCs [5], we set out to obtain shape signatures from PCs without supervision, building towards a relatively less explored topic of auto-encoding point clouds. This involves mapping the PC to a latent vector and reconstructing it back. Since the PCs are unordered, PC-specific reconstruction losses replace traditional ones [4, 12]. Extending auto-encoders (AE) based on such a loss, we propose to improve its representational capacity by regularising the latent space to make it compact and by modelling the variance that exists in a PC population. We claim that this results in learning improved shape signatures, validating the claim by employing the extracted features for unsupervised vertebral fracture detection.
1.0.2 Vertebral fracture detection
There exists an inherent shape variation in vertebral shapes within the spine of a single patient (e.g. cervical–thoracic–lumbar) along with a natural variation in a vertebra’s shape in a population (e.g. L1 across patients, cf. Fig. 1). Additionally, osteoporotic fractures start without significant shape change and progress into a vertebral collapse. Hence, fracture detection in vertebrae is non-trivial. Added to this, limited availability of fractured vertebrae makes the learning of supervised classifiers non-trivial. In literature, several classification systems exist mainly based on vertebral height measurement [2] or analysing sub-regions of the spine in sagittal slices [11]. However, an explicit shape-based approach seems absent. Evaluating the representational ability of the proposed AE architectures, we seek to analyse vertebral shapes and eventually detect vertebral fractures using the extracted latent shape features.
1.0.3 Our contribution
Summarising the contributions of this work: (1) We build on existing point-net-based architectures to propose a point-cloud auto-encoder (AE). (2) Reinforcing this architecture, we incorporate latent space modelling and a more challenging uncertainty quantification. (3) We present a comprehensive analysis of the reconstruction capabilities of our AEs by investigating their utility in detecting vertebral fractures. We work with an in-house, clinical dataset (1500 vertebrae), achieving an area-under-curve (AUC) of 75% in detecting fractures, even without employing texture or intensity-based features.
2 Methodology
We present this section in two stages: First, we introduce the notation used in this work and describe a point-net-based architecture capable of efficiently auto-encoding point clouds. Second, we build on this architecture to model the natural variance in vertebrae while regularising the latent space.
2.1 Auto-encoding point clouds
Given accurate voxel-wise segmentation of a vertebra, a point cloud (PC) can be extracted as a set of $N$ points represented by $X = \{p_i\}_{i=1}^{N}$, where $p_i$ represents a point by its 3D coordinate $(x, y, z)$. Additionally, $p_i$ could also represent other point-specific features such as the normal, radius of curvature, etc. So, each vertebra is represented by a PC of dimension $N \times 3$ (in this work, $N$ vertices and 3 coordinates, with the vertices randomly subsampled from a higher-resolution mesh). Recall the lack of a regular coordinate space associated with the point cloud and that any permutation of these points represents the same point cloud. This requires a dedicated variant of deep networks for processing PCs.
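The random subsampling mentioned above can be sketched as follows. This is only an illustration: the function name `subsample_vertices`, the toy lattice standing in for a mesh, and the target size of 128 points are hypothetical, and the paper does not state how the sampling is implemented.

```python
import random

def subsample_vertices(vertices, n_points, rng):
    # Draw n_points distinct vertices uniformly at random (without replacement).
    if n_points > len(vertices):
        raise ValueError("mesh has fewer vertices than requested")
    return rng.sample(vertices, n_points)

# Toy stand-in for a high-resolution mesh: 1000 vertices on a 10 x 10 x 10 lattice.
mesh_vertices = [(float(i), float(j), float(k))
                 for i in range(10) for j in range(10) for k in range(10)]

# Hypothetical target size; the resulting list is an (n_points x 3) point cloud.
cloud = subsample_vertices(mesh_vertices, 128, random.Random(0))
```

Because the order of the sampled vertices carries no meaning, any downstream network has to treat `cloud` as a set rather than a sequence.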
Architecture. An AE consists of an encoder mapping the PC to the latent vector and a decoder reconstructing the PC back from this latent vector, i.e. $X \rightarrow z \rightarrow \hat{X}$. As the encoder, we employ a variant of the point-net architecture [9]. The latent vector, $z$, respects the permutation invariance of the PC and represents its shape signature. As a decoder, taking cues from [4], we construct a combination of up-convolutional and dense branches taking $z$ as input and predicting $\hat{X}$, the reconstructed $X$. The convolutional path, owing to its neighbourhood processing, models the ‘average’ regions, while the dense path reconstructs the finer structures. This combination of the point-net and the decoder forms our point cloud auto-encoding (AE) architecture, as illustrated in Fig. 2.
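The permutation invariance of the latent vector can be illustrated with a toy symmetric aggregation. The sketch below is not the network of this paper: it assumes a hypothetical per-point feature map `point_feature` (a stand-in for a shared per-point network) followed by an element-wise maximum over all points, the kind of symmetric pooling that point-net [9] relies on.

```python
import random

def point_feature(p):
    # Hypothetical per-point feature map (stand-in for a shared per-point MLP).
    x, y, z = p
    return (x + y, y * z, x - z, x * x + y * y + z * z)

def shape_signature(points):
    # Element-wise max over all points: a symmetric function, so the
    # result does not depend on the order in which points are listed.
    feats = [point_feature(p) for p in points]
    return tuple(max(column) for column in zip(*feats))

rng = random.Random(0)
cloud = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
         for _ in range(64)]
shuffled = cloud[:]
rng.shuffle(shuffled)

signature = shape_signature(cloud)
signature_shuffled = shape_signature(shuffled)
```

Both calls return the same tuple, since the maximum is taken over the same set of per-point features regardless of their order.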
Loss. Reconstructing point clouds requires comparing the predicted PC with the actual PC to back-propagate the loss during training. However, owing to the unordered nature of PCs, the usual regression losses cannot be employed. Two prominent candidates for such a task are the Chamfer distance and the Earth Mover (EM) distance [4]. We observed that minimising the EM distance ignores the natural variation in shapes (e.g. the processes of the vertebrae) and reconstructs only a mean representation (e.g. the vertebral body), as validated in [4]. Since we intend to model the natural variance in the data, using the EM distance is undesirable in our case. We thus employ the Chamfer distance computed as:
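$$d_{ch}(X, \hat{X}) = \sum_{p \in X} \min_{\hat{p} \in \hat{X}} \|p - \hat{p}\|_2^2 + \sum_{\hat{p} \in \hat{X}} \min_{p \in X} \|p - \hat{p}\|_2^2 \tag{1}$$
In essence, $d_{ch}$ is the distance between a point in $X$ and its nearest neighbour in $\hat{X}$ and vice versa.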
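The Chamfer distance described above can be sketched in a few lines. This is a minimal, brute-force illustration that assumes squared Euclidean distances and unnormalised sums over both directions; the helper names `sq_dist` and `chamfer_distance` are hypothetical, and a practical implementation would use a tensor library so that the loss is differentiable.

```python
def sq_dist(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer_distance(X, X_hat):
    # Each input point is matched to its nearest reconstructed point ...
    forward = sum(min(sq_dist(p, q) for q in X_hat) for p in X)
    # ... and each reconstructed point to its nearest input point.
    backward = sum(min(sq_dist(p, q) for p in X) for q in X_hat)
    return forward + backward

X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
X_perm = [X[2], X[0], X[1]]                   # same cloud, different order
X_shift = [(x + 0.1, y, z) for x, y, z in X]  # cloud shifted along x

d_perm = chamfer_distance(X, X_perm)
d_shift = chamfer_distance(X, X_shift)
```

Reordering the points leaves the distance at zero, which is the property that makes this loss usable on unordered PCs, whereas the shifted cloud incurs a small positive distance.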
2.2 Probabilistic reconstruction
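From a generative modelling perspective, an AE can be seen to predict the parameters of a Gaussian distribution imposed on $X$, i.e. $p_\Theta(X) = \mathcal{N}(X \mid \hat{X}, \Sigma)$, parameterised by the weights of the AE denoted by $\Theta$. Determining the distribution parameters, viz. optimising for the AE weights, now involves maximising the log-likelihood of $X$, resulting in:
$$\Theta^* = \arg\max_{\Theta} \log p_\Theta(X) = \arg\min_{\Theta} \; (X - \hat{X})^\top \Sigma^{-1} (X - \hat{X}) + \log|\Sigma| \tag{2}$$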
This perspective towards auto-encoding enables us to extend the AE to encompass the data variance ($\Sigma$) while modelling the latent space, as described in the following sections. It is important to note that the difference $X - \hat{X}$ is not well defined for point clouds, requiring us to opt for alternatives.
Assuming $\Sigma = I$, implying an independence among the elements of $X$ and an element-wise unit variance, results in the familiar mean squared error (MSE), $\|X - \hat{X}\|_2^2$. Based on the parallels between MSE and the Chamfer distance (Eq. 1), we design σ-AE and σ-VAE, as illustrated in Fig. 3.
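To make the step from the Gaussian likelihood to the MSE explicit, the following worked derivation writes out the negative log-likelihood of a Gaussian with mean $\hat{X}$ and covariance $\Sigma$ and then substitutes $\Sigma = I$. The symbol $d$, denoting the number of elements in $X$, is introduced here for illustration only.

```latex
\begin{aligned}
-\log \mathcal{N}(X \mid \hat{X}, \Sigma)
  &= \tfrac{1}{2}\,(X-\hat{X})^{\top}\Sigma^{-1}(X-\hat{X})
   + \tfrac{1}{2}\log|\Sigma| + \tfrac{d}{2}\log 2\pi \\
\Sigma = I \;\Rightarrow\;
-\log \mathcal{N}(X \mid \hat{X}, I)
  &= \tfrac{1}{2}\,\lVert X-\hat{X}\rVert_2^2 + \tfrac{d}{2}\log 2\pi
\end{aligned}
```

Since the last term does not depend on the network weights, minimising the negative log-likelihood under unit covariance is the same as minimising the squared error.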
2.2.1 σ-AE.
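The assumption of unit covariance, as in AE, is inherently restrictive. However, modelling an unconstrained covariance matrix is infeasible due to its quadratic complexity. A practical compromise is the independence assumption. Thus, representing the covariance as $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots)$, where $\sigma_i^2$ denotes the variance corresponding to $x_i$, the $i$-th element of $X$, Eq. (2) morphs into a loss function as:
$$\Theta^* = \arg\min_{\Theta} \sum_{i} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_i^2} + \log \sigma_i^2 \right) \tag{3}$$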
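This optimisation models the aleatoric uncertainty [8]. Eq. 3 is an attenuated MSE, where a high variance associated with a point down-weighs its contribution to the loss. However, due to the lack of a reference grid in the point cloud space, the notion of uncertainty being associated with a data point (e.g. pixel, spatial location, etc.) is absent. We propose to associate the notion of variance with every predicted point, $\hat{p} \in \hat{X}$. This results in the variance-modelling Chamfer distance:
$$d_{\sigma\text{-}ch}(X, \hat{X}) = \sum_{p \in X} \min_{\hat{p} \in \hat{X}} \frac{\|p - \hat{p}\|_2^2}{\sigma_{\hat{p}}^2} + \sum_{\hat{p} \in \hat{X}} \min_{p \in X} \frac{\|p - \hat{p}\|_2^2}{\sigma_{\hat{p}}^2} + \sum_{\hat{p} \in \hat{X}} \log \sigma_{\hat{p}}^2 \tag{4}$$
Observe the slight abuse of notation in Eq. 4, wherein the variance at a predicted point, $\sigma_{\hat{p}}^2$, actually represents the variances of the coordinate elements of $\hat{p}$, i.e. $(\sigma_x^2, \sigma_y^2, \sigma_z^2)$. The current notation is chosen to avoid clutter.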
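One possible reading of the variance-modelling Chamfer distance is sketched below. It assumes that each squared nearest-neighbour distance is divided by the variance predicted at the reconstructed point and that a log-variance penalty discourages arbitrarily large variances. For brevity it further assumes a single scalar variance per predicted point instead of one per coordinate; the names `sigma_chamfer` and `var_hat` are hypothetical.

```python
import math

def sq_dist(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sigma_chamfer(X, X_hat, var_hat):
    # var_hat[j] is the variance predicted for X_hat[j] (assumed scalar here).
    # Input -> reconstruction: each input point takes the smallest
    # variance-attenuated distance over all predicted points.
    forward = sum(
        min(sq_dist(p, q) / v for q, v in zip(X_hat, var_hat)) for p in X
    )
    # Reconstruction -> input: distance to the nearest input point,
    # attenuated by the predicted point's own variance.
    backward = sum(
        min(sq_dist(p, q) for p in X) / v for q, v in zip(X_hat, var_hat)
    )
    # Log-variance penalty: without it, inflating every variance would
    # drive the attenuated distances to zero.
    penalty = sum(math.log(v) for v in var_hat)
    return forward + backward + penalty

X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
X_hat = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]

loss_unit = sigma_chamfer(X, X_hat, [1.0, 1.0])       # reduces to plain Chamfer
loss_uncertain = sigma_chamfer(X, X_hat, [1.0, 4.0])  # second point is uncertain
```

With unit variances the penalty vanishes and the plain Chamfer distance is recovered; raising the variance of the poorly reconstructed point shrinks its data term but is charged through the logarithmic penalty.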
2.2.2 Variational and σ-Variational AE.
An alternative approach for modelling $p(X)$ involves modelling its dependency over a latent variable $z$, which is distributed according to a known prior $p(z)$. A variational auto-encoder (VAE) operates on these principles and involves maximising a lower bound on the log-evidence (referred to as ELBO) of the data, described as below:
$$\log p(X) \geq \mathbb{E}_{q_\phi(z \mid X)}\big[\log p_\theta(X \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid X) \,\|\, p(z)\big) \tag{5}$$
where $q_\phi(z \mid X)$ is the approximate posterior of $z$ learnt by the encoder and parameterised by $\phi$, $p_\theta(X \mid z)$ is the data likelihood modelled by the decoder and parameterised by $\theta$, and $p(z)$ is the prior on $z$.
Maximising the ELBO is equivalent to maximising the log-likelihood of $X$ while minimising the Kullback-Leibler divergence between the approximate posterior and the prior. We represent the combination as $\mathcal{L} = \mathcal{L}_{recon} + \lambda \mathcal{L}_{KL}$, where $\mathcal{L}_{recon}$ is the reconstruction loss seen in earlier sections and $\lambda$ is a scaling factor weighing the contribution of the two losses appropriately. Standard practice assigns Gaussian distributions to $q_\phi(z \mid X)$ and $p(z)$ (cf. Fig. 3). Thus, $\mathcal{L}_{KL}$ models the latent space to follow a Gaussian distribution in line with the prior. Incorporating this into the point cloud domain, with the variance-modelling Chamfer distance of Eq. 4 (denoted $d_{\sigma\text{-}ch}$) as the reconstruction loss, results in an objective function for a PC-based VAE (or σ-VAE) as $\mathcal{L} = d_{\sigma\text{-}ch} + \lambda \mathcal{L}_{KL}$. Thus, σ-VAE acts as an AE capable of modelling the data variance while regularising the latent space. The prior on the latent space also imparts point cloud generation capabilities to σ-VAE.
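The latent-space regulariser can be sketched under the common assumption of a diagonal Gaussian posterior and a standard normal prior, for which the KL divergence has a closed form. The text above only states that both distributions are Gaussian, so the standard normal prior, the log-variance parameterisation, and the function names are assumptions made for illustration.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

def sample_latent(mu, log_var, rng):
    # Reparameterisation: z = mu + sigma * eps with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def vae_objective(recon_loss, mu, log_var, lam):
    # Weighted sum of a reconstruction loss and the KL regulariser.
    return recon_loss + lam * kl_to_standard_normal(mu, log_var)

z = sample_latent([0.0, 0.0], [0.0, 0.0], random.Random(0))
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0], [0.0])
total = vae_objective(2.0, [1.0], [0.0], lam=0.1)
```

The KL term is zero when the posterior coincides with the prior and grows as the encoding drifts away from it, which is what keeps the latent space compact.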
2.3 Detecting fractures as anomalies
Examining the descriptive ability of our AE architectures in auto-encoding PCs, we utilise them for detecting vertebral fractures. Assuming the AE is trained only on ‘normal’ patterns, a fracture can be detected as an ‘anomaly’ based on its ‘position’ in latent space. We inspect two measures for this purpose:
1. Reconstruction error or Chamfer distance: AEs trained on healthy samples fail to accurately reconstruct anomalous ones, resulting in a high $d_{ch}$.
2. Reconstruction probability or likelihood [1]: The expected likelihood $\mathbb{E}\big[p_{\Theta}(X)\big]$ of an input can be computed for the σ architectures (cf. Eq. 2). For any input PC, $X$, it is computed by evaluating $p_\Theta(X)$ with the predicted mean and variances. We expect fractured vertebrae to be less likely than healthy ones.
Intuitively, relying on the reconstruction error or likelihood for detecting anomalies requires the learnt ‘healthy’ latent space to be representative. Both σ-AE and the VAE work towards this objective. In σ-AE, the predictive variance down-weighs the loss due to highly uncertain points in the PC. This suppresses the interference due to natural variation in the vertebral PCs. On the other hand, VAE acts directly on the latent space by modelling the encoding uncertainty ($q_\phi(z \mid X)$). The σ-VAE encompasses both these features.
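A reconstruction log-likelihood for an unordered PC can be sketched as follows. The correspondence rule is an assumption, not something stated in the text: each input point is scored under an isotropic Gaussian centred at its nearest predicted point, using that point's predicted variance, and the scores are averaged. The function names are hypothetical.

```python
import math

def gaussian_log_pdf(p, mean, var):
    # Log-density of an isotropic Gaussian with scalar variance `var`.
    d = len(p)
    sq = sum((a - b) ** 2 for a, b in zip(p, mean))
    return -0.5 * (sq / var + d * math.log(2.0 * math.pi * var))

def recon_log_likelihood(X, X_hat, var_hat):
    total = 0.0
    for p in X:
        # Assumed correspondence: nearest predicted point (Euclidean).
        j = min(range(len(X_hat)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, X_hat[k])))
        total += gaussian_log_pdf(p, X_hat[j], var_hat[j])
    return total / len(X)  # average log-likelihood per input point

X_hat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
var_hat = [0.01, 0.01]
X_close = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0)]  # well reconstructed
X_far = [(0.3, 0.0, 0.0), (1.0, 0.4, 0.0)]      # poorly reconstructed

ll_close = recon_log_likelihood(X_close, X_hat, var_hat)
ll_far = recon_log_likelihood(X_far, X_hat, var_hat)
```

A cloud that deviates from the predicted means receives a lower log-likelihood, which is the behaviour expected for fractured vertebrae under a model trained on healthy ones.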
2.3.1 Inference.
A given vertebral PC is reconstructed and the reconstruction error and (or) likelihood are computed. This vertebra is said to be fractured if the reconstruction error is greater than a threshold, or its likelihood is lower than a threshold. Both thresholds are determined on the validation set.
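The inference rule can be sketched as below for the reconstruction-error case. The text only says that the thresholds are determined on the validation set; choosing the threshold that maximises the validation F1 score is an assumption, and the validation errors and labels are made-up toy values.

```python
def f1_score(labels, preds):
    # Counts with "fractured" as the positive class.
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pick_threshold(val_errors, val_labels):
    # Try every observed error as a candidate; keep the first one that
    # attains the best validation F1 (assumed selection criterion).
    best_t, best_f1 = None, -1.0
    for t in sorted(set(val_errors)):
        preds = [e > t for e in val_errors]
        score = f1_score(val_labels, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t

def is_fractured(error, threshold):
    # A vertebra is flagged when its reconstruction error exceeds the threshold.
    return error > threshold

# Toy validation set: reconstruction errors and fracture labels.
val_errors = [0.10, 0.12, 0.15, 0.11, 0.40, 0.35]
val_labels = [False, False, False, False, True, True]
threshold = pick_threshold(val_errors, val_labels)
```

The likelihood-based rule is analogous, with the comparison reversed so that vertebrae whose likelihood falls below the threshold are flagged.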
3 Experiments & Discussion
We present this section in two parts: first, we explore the auto-encoding, variance modelling, and generative capabilities of our AE networks. Second, we deploy these architectures to detect vertebral fractures without supervision.
Data preparation: We evaluate our architecture on an in-house dataset with accurate voxel-level segmentations converted into PCs. The dataset consists of 1525 healthy and 155 fractured vertebrae, denoted as () vertebrae. Since we intend to learn the distribution of healthy vertebrae, we do not use any fractured vertebrae during training. The validation and test sets consist of () and () vertebrae, respectively. For the supervised baselines, the training set needs to contain fractured vertebrae. Thus, the validation and test sets were altered to contain () and () vertebrae.
Training: The architecture of the encoder and the decoder is similar across all architectures (cf. Fig. 3) except for the layers predicting variance. PCs are augmented online by perturbing the points with Gaussian noise and random rotations (). Finally, the PCs are median-centred to the origin and normalised to have the same surface area. The networks are trained until convergence using an Adam optimiser with an initial learning rate of . Specific to the VAE, we use KL-annealing by increasing $\lambda$ from 0 to 0.1.
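Parts of this training setup can be sketched as follows: median-centring, Gaussian jitter, and a KL-annealing schedule ending at 0.1. The noise standard deviation, the linear shape of the schedule, and the number of warm-up steps are hypothetical, since the text does not specify them; random rotations and surface-area normalisation are left out of this sketch.

```python
import random
import statistics

def median_centre(points):
    # Shift the cloud so that the per-coordinate median sits at the origin.
    med = [statistics.median(column) for column in zip(*points)]
    return [tuple(a - m for a, m in zip(p, med)) for p in points]

def jitter(points, std, rng):
    # Online augmentation: perturb every coordinate with Gaussian noise.
    return [tuple(a + rng.gauss(0.0, std) for a in p) for p in points]

def kl_weight(step, warmup_steps, lam_max=0.1):
    # Assumed linear ramp of the KL weight from 0 to lam_max.
    return lam_max * min(1.0, step / warmup_steps)

cloud = [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0), (5.0, 9.0, 7.0)]
centred = median_centre(cloud)
augmented = jitter(centred, std=0.01, rng=random.Random(0))  # hypothetical std
schedule = [kl_weight(s, warmup_steps=100) for s in (0, 50, 100, 1000)]
```

Starting with a small KL weight lets the network first learn to reconstruct before the latent space is pulled towards the prior.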
3.0.1 Qualitative evaluation of AE architectures.
We investigate if meaningful shape features can be learnt without supervision. To validate this, in Fig. 4a we plot a t-SNE embedding of the test-set latent vectors learnt by a naive AE and a σ-VAE trained only on healthy vertebrae. Observe the clusters formed based on the vertebral index and the transition between the indices. This corresponds to the natural variation of vertebral shapes in a human spine. By indicating the fractured vertebrae in the embedding, we highlight their degree of similarity with their healthy counterparts. Also, observe that the embedding is more regularised, resembling a Gaussian, in the case of σ-VAE, indicating the continuity of the learnt latent space. Fig. 4b shows the predictive variance modelled by the σ-VAE. The posterior elements of a vertebra vary the most across the population. Observe this being captured by the variance in the vertebral process regions. Lastly, illustrating σ-VAE’s generative capabilities, Fig. 4c shows vertebral PC samples generated by sampling the latent vector, $z \sim p(z)$.
3.0.2 Vertebral fracture detection.
Evaluating the reconstruction quality of our AE architectures, we employ them to detect fractures as anomalies. As baselines, we choose two supervised approaches: (1) point-net (PN), the encoding part in our AE architectures, cast as a binary classifier, and (2) the same point-net trained with median frequency balancing of the classes (referred to as PNbal) to accentuate the loss from the minority fractured class. We report their performance in Table 1, over 3-fold cross-validation while retaining the ratio of healthy to fractured vertebrae in the data splits. Frequency balancing improves the F1 score significantly, albeit not to the level of the proposed anomaly detection schemes.
Reconstruction for fracture detection: When detecting fractures based on the reconstruction error ($d_{ch}$), we observe that a naive AE already outperforms the supervised classifiers (cf. Table 1). On top of this, we see that latent space modelling and variance modelling individually offer an improvement in F1 scores while increasing the AUC, indicating a stable detection of fractures. The performance of σ-AE and σ-VAE is similar, indicating the role of loss attenuation. However, the advantage of explicitly regularising the latent space for σ-VAE can be seen in likelihood-based anomaly detection, where σ-VAE outperforms σ-AE. Fig. 8 compares reconstructions of a healthy and a fractured vertebra of the same vertebral level. Note the high reconstruction error and the low log-likelihood spatially corresponding to the deformity due to the fracture.
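The AUC used to compare the detectors can be computed directly from anomaly scores without fixing a threshold. The sketch below uses the pairwise (rank-based) definition: the fraction of fractured-healthy pairs in which the fractured vertebra receives the higher score, with ties counted as half. The scores and labels are made-up toy values, not results from this work.

```python
def roc_auc(scores, labels):
    # Scores of the positive (fractured) and negative (healthy) samples.
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # Count pairs where the fractured sample is ranked above the healthy one.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy reconstruction errors; True marks a fractured vertebra.
scores = [0.10, 0.40, 0.35, 0.80]
labels = [False, False, True, True]
auc = roc_auc(scores, labels)
```

When the log-likelihood is used as the measure, its negative can serve as the anomaly score so that higher values still indicate a fracture.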
4 Conclusions
We presented point-cloud-based auto-encoding architectures for extracting descriptive shape features. Improving their descriptive capacity, we incorporated variance and latent-space modelling capabilities using specially defined, PC-specific losses. The former captures the natural variance in the data, while the latter regularises the latent space to be continuous. Deploying these networks for the task of unsupervised fracture detection, we achieved an AUC of 76% without using any intensity or textural features. Future work will combine the extracted shape signatures with textural features, e.g. bone density and trabecular texture of vertebrae, to perform fracture-grade classification.
Acknowledgements. This work is supported by the European Research Council (ERC) under the European Union’s ‘Horizon 2020’ research & innovation programme (GA637164–iBack–ERC–2014–STG). The Quadro P5000 used for this work was donated by NVIDIA Corporation.
Appendix
As supplementary content, we present: (1) a detailed description of the complete point-cloud auto-encoder, including the encoder architecture adapted from point-net [1] (cf. Fig. 6), (2) additional illustrations of point-wise data uncertainty modelled by the proposed σ-VAE (cf. Fig. 7), and (3) further qualitative results comparing probabilistic reconstructions of healthy and anomalous or fractured vertebrae, along with point-wise Chamfer distance and log-probability between the input and its reconstruction (cf. Fig. 8).
References
- [1] Qi, C.R., et al.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: CVPR (2017)
- [2] Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: NIPS (2017)
References of the main paper
- [1] Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using reconstruction probability. Technical report, SNU Data Mining Center, 2015.
- [2] Thomas Baum et al. Automatic detection of osteoporotic vertebral fractures in routine thoracic and abdominal MDCT. Eur Radiology, 2014.
- [3] M. M. Bronstein et al. Geometric deep learning: Going beyond euclidean data. IEEE Signal Processing Magazine, 2017.
- [4] Haoqiang Fan et al. A point set generation network for 3d object reconstruction from a single image. In CVPR, 2017.
- [5] Benjamín Gutiérrez-Becker et al. Deep multi-structural shape analysis: Application to neuroanatomy. In MICCAI. Springer, 2018.
- [6] Madhura Ingalhalikar et al. Sex differences in the structural connectome of the human brain. Proceedings of the National Academy of Sciences, 2014.
- [7] F. Isensee et al. Brain tumor segmentation and radiomics survival prediction: Contribution to the brats 2017 challenge. arXiv e-prints, 2018.
- [8] Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In NIPS, 2017.
