# Spatial Transformer for 3D Point Clouds

**Authors:** Jiayun Wang, Rudrasis Chakraborty, Stella X. Yu

arXiv: 1906.10887 · 2021-05-13

## TL;DR

This paper introduces end-to-end trainable spatial transformers for 3D point clouds. They learn a different transformation of the input points at each network layer, so that local neighborhoods adapt from layer to layer and accuracy improves on classification, detection, and segmentation tasks.

## Contribution

The paper proposes linear (affine) and non-linear (projective and deformable) spatial transformers that learn which local neighborhoods to use at each layer, improving 3D point cloud processing performance.
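To make the idea concrete, here is a minimal NumPy sketch of how an affine transformation of the point coordinates changes which points count as neighbors. It is an illustration under stated assumptions, not the authors' implementation: the helper names `knn_indices` and `affine_transform` are hypothetical, neighborhoods are assumed to be k-nearest-neighbor sets under Euclidean distance, and the matrix `A` is hand-picked here, whereas in the paper the transformation parameters are learned end to end.

```python
import numpy as np


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest other points."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Exclude each point from its own neighborhood.
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]


def affine_transform(points, A, b):
    """Apply x -> A x + b to an (N, 3) array of points."""
    return points @ A.T + b


# Three points: point 0 is closer to point 1 (along x) than to point 2 (along y).
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])

# Hypothetical transformation parameters (hand-picked, not learned):
# stretching the x axis pushes point 1 away from point 0.
A = np.diag([5.0, 1.0, 1.0])
b = np.zeros(3)

neighbors_before = knn_indices(points, k=1)
neighbors_after = knn_indices(affine_transform(points, A, b), k=1)
```

In this toy case the nearest neighbor of point 0 switches from point 1 to point 2 after the transformation, which is the kind of neighborhood change a learned transformer could exploit at a given layer.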

## Key findings

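- On ShapeNet part segmentation, accuracy improves for all categories, with an 8% gain on earphones and rockets in particular.
- Outperforms the state of the art on other point cloud tasks: classification, detection, and semantic segmentation.
- Visualizations show that the transformers learn features more efficiently by dynamically altering local neighborhoods according to the geometry and semantics of 3D shapes.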

## Abstract

Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. Existing methods adopt the same individual point neighborhoods throughout the network layers, defined by the same metric on the fixed input point coordinates. This common practice is easy to implement but not necessarily optimal. Ideally, local neighborhoods should be different at different layers, as more latent information is extracted at deeper layers. We propose a novel end-to-end approach to learn different non-rigid transformations of the input point cloud so that optimal local neighborhoods can be adopted at each layer. We propose both linear (affine) and non-linear (projective and deformable) spatial transformers for 3D point clouds. With spatial transformers on the ShapeNet part segmentation dataset, the network achieves higher accuracy for all categories, with 8% gain on earphones and rockets in particular. Our method also outperforms the state-of-the-art on other point cloud tasks such as classification, detection, and semantic segmentation. Visualizations show that spatial transformers can learn features more efficiently by dynamically altering local neighborhoods according to the geometry and semantics of 3D shapes in spite of their within-category variations. Our code is publicly available at https://github.com/samaonline/spatial-transformer-for-3d-point-clouds.
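The abstract also names projective transformers among the non-linear variants. The sketch below shows one standard way to apply a projective map to 3D points, using homogeneous coordinates and a 4x4 matrix. This formulation is an assumption for illustration; the paper's exact parameterization may differ. The function name `projective_transform` and the matrix `H` are hypothetical, and `H` is hand-picked rather than learned.

```python
import numpy as np


def projective_transform(points, H):
    """Apply a 4x4 projective map to (N, 3) points via homogeneous coordinates."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])  # (N, 4): append w = 1
    mapped = homogeneous @ H.T               # (N, 4)
    w = mapped[:, 3:4]                       # assumed non-zero for all points
    return mapped[:, :3] / w                 # divide back to 3D coordinates


points = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 1.0],
                   [1.0, 0.0, 3.0]])

# Hypothetical matrix: the last row sets w = z, so every coordinate is
# divided by the point's depth, which no affine map can express.
H = np.eye(4)
H[3] = [0.0, 0.0, 1.0, 0.0]

warped = projective_transform(points, H)
```

Before the warp, point 1 is at distance 1 from point 0 and distance 2 from point 2. After it, point 2 lands at `(1/3, 0, 1)`, only 2/3 away from point 1, so the nearest neighbor of point 1 changes.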

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.10887/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1906.10887/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1906.10887/full.md

---
Source: https://tomesphere.com/paper/1906.10887