Numerical Coordinate Regression with Convolutional Neural Networks
Aiden Nibali, Zhen He, Stuart Morgan, Luke Prendergast

TL;DR
This paper introduces the differentiable spatial to numerical transform (DSNT), a fully differentiable layer with no trainable parameters that improves coordinate regression accuracy and spatial generalization, outperforming heatmap matching on pose estimation.
Contribution
The paper proposes DSNT, a differentiable output layer for coordinate regression in CNNs that adds no trainable parameters and offers a better trade-off between inference speed and prediction accuracy than existing heatmap matching methods.
Findings
DSNT outperforms heatmap matching in pose estimation accuracy.
DSNT maintains good spatial generalization with low-resolution heatmaps.
DSNT can be dropped in as the output layer of a wide range of existing fully convolutional architectures.
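The low-resolution finding can be illustrated with a one-dimensional toy comparison. The sketch below is not taken from the paper. It assumes an 8-pixel heatmap sampled from a Gaussian around a hypothetical true location, and it compares two ways of decoding a coordinate: taking the center of the hottest pixel (argmax decoding, as commonly paired with heatmap matching) and taking the probability-weighted mean of pixel centers (the expectation-style decoding that a DSNT-like layer relies on). The function names, grid size, and Gaussian width are all illustrative assumptions.

```python
import math

def pixel_centers(n):
    """Centers of n pixels mapped into (-1, 1)."""
    return [(2 * i - (n + 1)) / n for i in range(1, n + 1)]

def gaussian_heatmap(true_pos, n, sigma):
    """Hypothetical 1D heatmap: a Gaussian sampled at pixel centers, normalized to sum to 1."""
    weights = [math.exp(-((c - true_pos) ** 2) / (2 * sigma ** 2))
               for c in pixel_centers(n)]
    total = sum(weights)
    return [w / total for w in weights]

def decode_argmax(heatmap):
    """Argmax decoding: return the center of the hottest pixel (quantized to the grid)."""
    centers = pixel_centers(len(heatmap))
    best = max(range(len(heatmap)), key=lambda i: heatmap[i])
    return centers[best]

def decode_expectation(heatmap):
    """Expectation decoding: probability-weighted mean of the pixel centers."""
    centers = pixel_centers(len(heatmap))
    return sum(p * c for p, c in zip(heatmap, centers))

true_pos = 0.2                      # assumed ground truth, deliberately off the pixel grid
heatmap = gaussian_heatmap(true_pos, n=8, sigma=0.25)

argmax_est = decode_argmax(heatmap)
expect_est = decode_expectation(heatmap)
print("argmax error:     ", abs(argmax_est - true_pos))
print("expectation error:", abs(expect_est - true_pos))
```

In this toy setting the argmax estimate can only land on one of eight pixel centers, so its error is bounded below by the grid spacing, while the weighted mean can fall between pixels. This is a sketch of the quantization effect only, not a reproduction of the paper's experiments.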
Abstract
We study deep learning approaches to inferring numerical coordinates for points of interest in an input image. Existing convolutional neural network-based solutions to this problem either take a heatmap matching approach or regress to coordinates with a fully connected output layer. Neither of these approaches is ideal, since the former is not entirely differentiable, and the latter lacks inherent spatial generalization. We propose our differentiable spatial to numerical transform (DSNT) to fill this gap. The DSNT layer adds no trainable parameters, is fully differentiable, and exhibits good spatial generalization. Unlike heatmap matching, DSNT works well with low heatmap resolutions, so it can be dropped in as an output layer for a wide range of existing fully convolutional architectures. Consequently, DSNT offers a better trade-off between inference speed and prediction accuracy…
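The abstract describes the layer only by its properties, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the layer normalizes a single-channel heatmap with a softmax and then returns the expected pixel-center coordinate, with coordinates scaled into (-1, 1). Both steps are smooth arithmetic with no learned weights, which is consistent with the "fully differentiable" and "no trainable parameters" claims. Plain Python lists are used to keep the example dependency-free; a real model would use tensor operations in a deep learning framework. All function names are hypothetical.

```python
import math

def softmax2d(logits):
    """Normalize a 2D grid of raw scores into a probability map that sums to 1."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

def pixel_centers(n):
    """Centers of n pixels mapped into (-1, 1): (2i - (n + 1)) / n for i = 1..n."""
    return [(2 * i - (n + 1)) / n for i in range(1, n + 1)]

def dsnt_sketch(logits):
    """Return (x, y): the expected pixel-center coordinate under the normalized heatmap."""
    heatmap = softmax2d(logits)
    h, w = len(heatmap), len(heatmap[0])
    xs, ys = pixel_centers(w), pixel_centers(h)
    x = sum(heatmap[r][c] * xs[c] for r in range(h) for c in range(w))
    y = sum(heatmap[r][c] * ys[r] for r in range(h) for c in range(w))
    return x, y

# Toy input: a 5x5 grid of scores with one strong activation at row 1, column 3.
logits = [[0.0] * 5 for _ in range(5)]
logits[1][3] = 20.0
x, y = dsnt_sketch(logits)
print(x, y)  # approximately (0.4, -0.4), the center of that pixel
```

Because the output is a weighted sum over the whole heatmap, a loss on (x, y) sends gradients to every heatmap location, and the same code works unchanged for any heatmap height and width.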
Taxonomy
Topics: Human Pose and Action Recognition · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Heatmap · Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block
