Spatio-thermal depth correction of RGB-D sensors based on Gaussian Processes in real-time
Christoph Heindl, Thomas P\"onitz, Gernot St\"ubl, Andreas, Pichler, Josef Scharinger

TL;DR
This paper introduces a real-time method using Gaussian Processes to correct RGB-D sensor depth inaccuracies caused by spatial and thermal effects, enhancing their reliability for computer vision and robotics applications.
Contribution
It presents a novel Gaussian Process-based calibration method that jointly models spatial and thermal influences for RGB-D sensors, enabling real-time dense depth correction.
Findings
Achieves accurate depth correction considering thermal effects
Operates in real-time leveraging GPU acceleration
Provides publicly available dataset and source code
Abstract
Commodity RGB-D sensors capture color images along with dense pixel-wise depth information in real-time. Typical RGB-D sensors are provided with a factory calibration and exhibit erratic depth readings due to coarse calibration values, ageing and thermal influence effects. This limits their applicability in computer vision and robotics. We propose a novel method to accurately calibrate depth considering spatial and thermal influences jointly. Our work is based on Gaussian Process Regression in a four dimensional Cartesian and thermal domain. We propose to leverage modern GPUs for dense depth map correction in real-time. For reproducibility we make our dataset and source code publicly available.
| (mm) | (mm) | (mm) | |
|---|---|---|---|
| RMSE before correction | 5.7 | 3.7 | 16.0 |
| RMSE after correction | 0.7 | 0.5 | 2.2 |
| Time (s) | FPS (1/s) | |
|---|---|---|
| CPU | 20 | 0.05 |
| GPU naïve | 0.4 | 2.5 |
| GPU optimized | 0.14 | 7.1 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIndustrial Vision Systems and Defect Detection · Image Processing Techniques and Applications · Advanced Vision and Imaging
MethodsGaussian Process
Spatio-thermal depth correction of RGB-D sensors based on Gaussian Processes in real-time
Christoph Heindl
Profactor GmbH, Im Stadtgut A2, 4407 Steyr-Gleink, Austria
Thomas Pönitz
Profactor GmbH, Im Stadtgut A2, 4407 Steyr-Gleink, Austria
Gernot Stübl
Profactor GmbH, Im Stadtgut A2, 4407 Steyr-Gleink, Austria
Andreas Pichler
Profactor GmbH, Im Stadtgut A2, 4407 Steyr-Gleink, Austria
Josef Scharinger
JKU Department of Computational Perception, Altenbergerstr. 69, 4040 Linz, Austria
Abstract
Commodity RGB-D sensors capture color images along with dense pixel-wise depth information in real-time. Typical RGB-D sensors are provided with a factory calibration and exhibit erratic depth readings due to coarse calibration values, ageing and thermal influence effects. This limits their applicability in computer vision and robotics. We propose a novel method to accurately calibrate depth considering spatial and thermal influences jointly. Our work is based on Gaussian Process Regression in a four dimensional Cartesian and thermal domain. We propose to leverage modern GPUs for dense depth map correction in real-time. For reproducibility we make our dataset and source code publicly available.
keywords:
RGB-D, Thermal, Depth Calibration, Gaussian Process Regression, GPU
1 INTRODUCTION
Since their introduction in 2010 active structured-light RGB-D cameras have been widely applied to computer vision and robotic applications [1]. Newcombe et al.[2] successfully employed RGB-D cameras in real-time 3D surface reconstructions. Later this work was extended by Heindl et al.[3] to generate photorealistic reconstructions by aligning depth and color streams. Combining features from depth and color images robustified object recognition[4] and led to a transition from stereo vision to RGB-D cameras as the main imaging device in indoor robotics[5].
RGB-D cameras provide simultaneous streams of color and dense depth information. Color images are typically provided by standard RGB cameras, while depth readings are generated by a separate time-of-flight or active structured infrared light sensor. Both devices are usually assembled together in a compact housing. A default factory calibration encompassing intrinsic and extrinsic camera parameters is stored within the device memory. The quality of calibration is often impractical for computer vision applications with high accuracy demands for the following reasons: the assumed pinhole camera model together with radial and tangential distortion coefficients does not capture the true underlying nonlinear deformations found in depth maps well. Additionally, strong thermal dependencies on depth readings undermine the effectiveness of a single static mono/stereo calibration.
In this paper we propose a novel calibration method that corrects depth readings based on a nonlinear model that spans the spatial and thermal domain. We frame depth correction as a probabilistic regression problem and show that Gaussian Process Regression (GPR) is well suited for capturing the underlying systematic errors. For dense real-time correction we propose to leverage the fact that GPR prediction is mostly linear algebra and is therefore well suited to be parallelized on modern GPU architectures. Python code alongside with a captured dataset is made publicly available111https://github.com/cheind/rgbd-correction.
2 RELATED WORK
The availability of modern commodity RGB-D sensors raised the interest in methods for accurate calibration thereof. Typical approaches [6, 7] use a calibration target together with intrinsic and extrinsic parameter estimation methods, such as the one proposed by Zhang [8]. Since one usually has no access to the internals of the depth generation process, these attempts are insufficient for accurate depth correction, because they merely correct for lens effects. In contrast, the work of Smisek et al. [9] assumes planarity in the observed data. They propose to correct depth by computing a pixel-wise mean residual depth image over all calibration poses. Zhang et al. [10] exploit co-planarity in the structure of the calibration target and propose to model the observed depth as a linear function of the true depth. In the work of Canessa et al. [11] a second order polynomial per pixel is proposed to compensate for depth artefacts. The closest work compared to ours, in terms of applied methods, is by Amamra et. al [12], who use a Gaussian Process to predict absolute depth from spatial locations. More recently online depth calibration methods try to compensate depth errors using a visual SLAM system[13].
Several research groups documented a strong thermal influence on the depth generation process of RGB-D sensors. Mankoff et al. [14] note a severe depth shift due to internal or external thermal changes. Fiedler et al. [15] document a nonlinear distortion effects during thermal changes. They also propose practical rules of thumb on reducing accuracy errors caused by thermal conditions.
To our knowledge we are the first to propose a practical, real-time, pixel-wise depth correction method for RGB-D cameras that considers spatial and thermal aspects jointly.
3 DEPTH CORRECTION
Throughout this work we use lower-case non-bold characters to denote scalars, bold-faced lower-case characters represent column-vectors and upper-case bold characters for matrices. denotes the -th element of , the -th row and -th column of . Functions will be written in calligraphic letters .
3.1 Dense depth maps
RGB-D cameras provide depth readings in dense image form . Depth values can be reprojected into three-dimensional Cartesian space through a function given by
[TABLE]
where is the associated camera intrinsic matrix and is a point in homogeneous image space. Typically one is interested in alignment of depth and RGB images. This can be accomplished by transforming the entire depth map into the RGB camera space as follows
[TABLE]
where and are parameters of the rigid extrinsic transformation matrix, is the intrinsic camera matrix of the target sensor. Extrinsic camera parameters are either obtained by stereo camera calibration or are directly provided by the device itself. We do not expect intrinsic nor extrinsic matrices to be very accurate for our approach to work.
3.2 Gaussian Process Regression
A Gaussian Process (GP)[16, 17] is a statistical model used in supervised machine learning for regression and classification tasks. In regression, one searches for a unknown function that best approximates tuples of feature vectors , and target scalar observations . The term is the usual i.i.d. Gaussian observation noise. In a probabilistic view, the expectation of the posterior predictive distribution is used to infer unknown values from feature vectors . The posterior predictive takes on the following general form
[TABLE]
where and . Typically, integrating over all possible functions is intractable. Hence, Gaussian Processes make the assumption that training and predictions are jointly Gaussian
[TABLE]
where is the mean function and is a positive-definite function. Then, from Gaussian conditioning rules [18] one gets the posterior predictive in closed form by conditioning on , and
[TABLE]
The covariance function encodes a distance or similarity measure between input samples. For two samples , , is given by , with being a positive definite kernel function. Multiple kernel functions are available[19]. Choosing one is highly problem specific. In this work we focus on the smooth radial basis function (RBF) a.k.a. the squared exponential kernel which, in the multidimensional input case, is given by
[TABLE]
where is the signal variance regularizing the allowed deviation from the mean, is an input domain coordinate scaling matrix, and is the expected noise level. We restrict our considerations to diagonal versions of . The effects of varying these parameters are illustrated in [17].
3.3 Approach for Spatio-Thermal Correction
Our approach for correcting depth maps is based on estimating a per-pixel depth correction offset. For a given depth map and temperature reading we compute a corrected depth map by
[TABLE]
where is given by a Gaussian Process Regression . The input to the Gaussian Process is given by reprojecting the pixel coordinate to Cartesian space using pixel-wise (potentially distorted) depth information and stacking the result with to create a four-dimensional input feature vector
[TABLE]
The corresponding regression target at training time is given by the pixel-wise difference between a ground-truth depth map and observed depth map , where
[TABLE]
Regressing relative correction offsets leads to predictions that vary smoothly accross the input domain. This justifies our usage of the squared exponential kernel given in Equation 6, which has a strong prior on smooth functions. Note, this is in contrast to the approach taken in Amamra et al.[12], who regress absolute depth using a similar kernel. However, in our experience real world depth maps rarely exhibit smooth behaviour due to strong depth discontinuities along object boundaries.
At training time we sample feature vectors and regression values in the spatio-thermal domain using a regular spaced grid. At test time we compute for every four dimensional query point the optimal prediction and confidence according to Equation 5. In turn is used to construct the corrected depth map according to Equation 7.
4 EVALUATION
This section covers the experiments which have been performed to prove the applicability of the approach.
4.1 Hardware Setup
Figure 1 is a schematic overview of the hardware set-up used for data acquisition. As RGB-D sensor an Orbbec Astra Mini S222https://orbbec3d.com/astra-mini/ is used, since it is a typical commodity sensor without special housing. The sensor looks towards a chessboard calibration pattern and is mounted on an electronic linear axis to adjust the distance. Directly connected to the heat sink of the sensor is a Peltier element which is used for heating and cooling. Due to the fact that all temperature-critical elements of the sensor are connected to the heat pipe, it is assumed that thermostabilization of the heat pipe directly controls the temperature of the main camera components, e.g image sensors and projector.
The sensor of the temperature control loop is directly thermally coupled to the heat pipe, which is of special importance. In a future application, no control loop is needed and the sensor is the only additional hardware which is required to apply the proposed correction approach. Since the housing of an Orbbec Astra Mini S serves as heat pipe, there is no need to modify any sensor hardware in an application.
The evaluation itself is run on a workstation with an Intel i7-7700 @ 3.60 GHz, 16 GB RAM, and a NVIDIA GeForce GTX1080 graphics card with 6 GB memory.
4.2 Spatio-thermal RGB-D dataset
Using the hardware setup described above we captured the following data: for each temperature level, ranging from 10 to 35 *∘*C in steps 1 *∘*C, we moved the axis to six different positions spaced 10 cm apart. The covered distance between sensor and calibration target ranges from 0.4 to 1 meter, which matches the operating specifications of the sensor. At each calibration position we captured 50 RGB and 50 depth frames. Depth data was captured aligned with color data as described in Section 3.1. From each color frame an artificial dense depth map was computed from known object geometry and planarity assumptions.
4.3 Implementation Details
For numerical stability it is advised to avoid performing the direct matrix inversion of the covariance block matrix in Equation 5[16]. A robust alternative is to pre-factor the positive-definite covariance matrix using Cholesky decomposition , and then use and for subsequent calculations.
For our parallel GPU implementation we leverage the TensorFlow[20] framework to obtain optimized tensor-products on GPUs. Computing the posterior predictive from Equation 5 decomposes nicely in matrix products. Additionally, the kernel formulation is already partly vectorized, see Equation 6. Also note, that and can be pre-computed on the CPU for a given training data set. The GPU then reads these values as constant matrices into GPU memory.
We experimented with different versions of computing the kernel function. We found that the type of functions used had a heavy impact on execution speed. The best performance improvement compared to running on CPU was found to be one that makes explicit use of the following equivalence
[TABLE]
Additionally preloading data onto the GPU, while computing correction for previous batches, helps significantly in reducing GPU stalling due to I/O waits. For details consult our source code.
5 RESULTS
In order to validate our findings, we captured depth data according to the procedure described in Section 4.2. The sensor has undergone a standard intrinsic and extrinsic calibration before capturing.
For the remainder of this work we refer to the device provided pixel-wise mean depth map at axis position , temperature by . The mean depth map is computed by averaging over multiple captures (50 in our case) per axis position and temperature. Similarly, we refer to the artificial generated mean depth map from color images by . Note that we assume to be pre-aligned with the RGB device by means of methods presented in Section 3.1.
Figure 2 illustrates the thermal influences on depth readings for a fixed position . The two plots are made from two small opposite image regions (top-left, bottom-right). One can clearly see the depth drift associated with is up to 4-5 times larger than the depth drift of . While the depth from RGB drift appears linear with increasing temperature, the depth readings from behave highly nonlinear. Additionally, the curves tend to flatten as we decrease the distance to the calibration target and escalate as we move further away from it.
For Gaussian Process Regression training, we sample input feature vectors from a regularly spaced Cartesian 3D grid in meters, every three degrees Celsius. The corresponding target values are then computed from the depth difference map given by
[TABLE]
That is, we always calibrate with respect to depth from color at the lowest possible temperature level. In total our training data consists of elements and elements. We then auto-tune the kernel hyper-parameters using gradient descent of the negative log-marginal-likelihood[17], yielding the following optimal parameters for our dataset: is given by , and for our dataset. As expected, our formulation of results in high importance along the temperature domain, making samples far apart in the temperature domain still influential on the regressed depth correction.
Applying the proposed Gaussian Process Regression reduces the error significantly. The plots in Figure 3 show the pixel-wise error before and after application of the regression for a selected set of positions and temperatures. For areas of missing depth data no predictions can be made. Those pixels appear as white speckles in the illustrations. A numerical evaluation spanning the entire distance and temperature range is provided in Table 1. One can see that the correction yields an improvement by one order of magnitude.
Table 2 compares execution times for CPU and GPU variants on dense depth maps consisting of over 300,000 points as described in Section 4.3. The GPU variants clearly outperform an optmized CPU implementation. When possible, one should prefetch data to reduce wait times and enable real-time performance.
To show generality of our findings we captured and evaluated data from two equivalent sensors. However, the results differed only marginally.
6 CONCLUSION AND FUTURE WORK
In this paper we demonstrated the thermal influence on active structured-light RGB-D cameras based on Orbbec Astra Mini S sensors. A novel depth-delta calibration method, based on probabilistic regression in the spatial and thermal domains, was presented. We showed that Gaussian Progress Regression is well suited to correct depth errors under spatio-thermal changes. The comprehensive evaluation of the proposed approach on a novel dataset demonstrates a reduction of the root mean squared error by one order of magnitude. A further comparison of CPU and GPU implementations proves the real-time performance capabilities of our approach. The dataset and source code was made publicly available.
Possible future work concerns the evaluation of sensors such as time-of-flight based models. Regarding performance, a scale-space approach could enable real-time performance directly on embedded systems. Additionally, fixed point or half-precision variants could be examined to further relax computational runtime requirements.
7 ACKNOWLEDGEMENTS
This research is funded by the projects Lern4MRK (Austrian Ministry for Transport, Innovation and Technology), and AssistMe (FFG, 848653), as well as the European Union in cooperation with the State of Upper Austria within the project Investition in Wachstum und Beschäftigung (IWB). The authors want to thank Mr. Gerhard Ebenhofer for fruitful discussions and help during the data acquisition.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Zhang, Z., “Microsoft Kinect sensor and its effect,” Multi Media 19 (2), 4–10 (2012).
- 2[2] Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., Kohi, P., Shotton, J., Hodges, S., and Fitzgibbon, A., “Kinectfusion: Real-time dense surface mapping and tracking,” in [ 10th International Symposium on Mixed and Augmented Reality (ISMAR) ], 127–136, IEEE (2011).
- 3[3] Heindl, C., Akkaladevi, S. C., and Bauer, H., “Capturing photorealistic and printable 3D models using low-cost hardware,” in [ International Symposium on Visual Computing (ISVC) ], 507–518, Springer (2016).
- 4[4] Lai, K., Bo, L., Ren, X., and Fox, D., “Sparse distance learning for object recognition combining RGB and depth information,” in [ International Conference on Robotics and Automation (ICRA) ], 4007–4013, IEEE (2011).
- 5[5] El-laithy, R. A., Huang, J., and Yeh, M., “Study on the use of Microsoft Kinect for robotics applications,” in [ Position Location and Navigation Symposium (PLANS) ], 1280–1288, IEEE/ION (2012).
- 6[6] Konolige, K. and Mihelich, P., “Technical description of Kinect calibration. 2012,” (2012). http://wiki.ros.org/kinect_calibration/technical.
- 7[7] Burrus, N., “Kinect calibration,” (2011). http://nicolas.burrus.name/index.php/Research/Kinect Calibration.
- 8[8] Zhang, Z., “A flexible new technique for camera calibration,” IEEE Transactions on pattern analysis and machine intelligence (TPAMI) 22 (11), 1330–1334 (2000).
