Robust Building-based Registration of Airborne LiDAR Data and Optical Imagery on Urban Scenes

Thanh Huy Nguyen; Sylvie Daniel; Didier Gueriot; Christophe Sintes; Jean-Marc Le Caillec

arXiv:1904.03668·cs.CV·November 28, 2019

TL;DR

This paper introduces a robust method for registering airborne LiDAR data with optical imagery of urban scenes by leveraging building region extraction and graph transformation matching, improving data alignment for better fusion.

Contribution

The paper presents a novel building-based registration approach combining segmentation and graph matching to align LiDAR and optical data acquired from different platforms and times.

Findings

1. Significantly reduces relative shifts between datasets
2. Enables high-quality data fusion
3. Improves registration robustness in urban scenes

Abstract

The motivation of this paper is to address the problem of registering airborne LiDAR data and optical aerial or satellite imagery acquired from different platforms, at different times, with different points of view and levels of detail. In this paper, we present a robust registration method based on building regions, which are extracted from optical images using mean shift segmentation, and from LiDAR data using a 3D point cloud filtering process. The extracted building segments are then matched using Graph Transformation Matching (GTM), which determines a common pattern of relative positions of segment centers. Thanks to this registration, the relative shifts between the data sets are significantly reduced, which enables a subsequent fine registration and a resulting high-quality data fusion.

Tables

Table 1: Details of datasets: LiDAR ©Ville de Québec, aerial imagery ©Communauté Métropolitaine de Québec, and satellite imagery ©Centre National d’Études Spatiales (France).

No.	Data type	Spectral resolution	Spatial resolution	Acquisition time (season)	Geometry/Properties	Estimated relative shift
1	Aerial optical imagery	8 bits (RGBI)	15 cm	June 2016 (summer)	Orthorectified; georeferenced	1 - 2 m
1	LiDAR	8 bits (intensity)	8 points/m²	May-Jun 2017 (summer)	Classified
2	Aerial optical imagery	8 bits (RGBI)	15 cm	Jul-Aug 2013 (summer)	Central perspective; no georeferencing	2.5 - 10 m
2	LiDAR	8 bits (intensity)	2 points/m²	Oct-Nov 2011 (winter)	Classified
3	Satellite imagery	Panchromatic / multispectral (4 bands)	50 cm / 2 m	July 2015 (summer)	No orthorectification; georeferenced	25 - 40 m
3	LiDAR	8 bits (intensity)	2 points/m²	Oct-Nov 2011 (winter)	Classified

Table 2: Performance of building extraction and matching algorithms on selected areas (28 buildings in total).

Metric	Extracted from LiDAR data	Extracted from image by mean shift	Matching result by RANSAC	Matching result by GTM
TP/FA/M	28/0/0	24/21/4	8/0/12	19/7/1
Precision	100%	53.33%	100%	73.08%
Recall	100%	85.71%	40%	95%

Table 3: Average estimated relative shift between data sets, before and after the registration.

Data set	Before registration	After registration	Gain
1	1.41 m	0.49 m	65.25%
2	2.83 m	1.32 m	53.36%
3	40.81 m	1.75 m	95.71%

Code & Models

Repositories

nthuy190991/igarss2019

Full text

©2019 IEEE. Published in the IEEE 2019 International Geoscience & Remote Sensing Symposium (IGARSS 2019), scheduled for July 28 - August 2, 2019 in Yokohama, Japan. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.

Index Terms— Airborne LiDAR, aerial imagery, satellite imagery, heterogeneous registration, building extraction, mean shift segmentation, graph transformation matching, urban scene.

1 Introduction

Over the years, existing works on the fusion of aerial or satellite imagery and airborne LiDAR have addressed very specific acquisition contexts, where the image and the LiDAR 3D point cloud are already registered and/or acquired from the same platform at identical or very close dates. For example, solutions proposed in the 2013 GRSS Data Fusion Contest [1] focused on fusing LiDAR data and hyperspectral imagery with the same spatial resolution, acquired on two consecutive days. In 2015, the contest [2] involved extremely high resolution LiDAR data and RGB imagery collected from the same airplane, with the sensors rigidly fixed to the same platform. Thus, the solutions submitted to these contests were never intended to overcome the inherent obstacles of data sets collected from different platforms with different acquisition configurations (e.g. flight track, height, orientation), at different moments and even in different seasons, with different spatial resolutions and levels of detail. The need for a suitable registration in such a context is exemplified by the work of Cura et al. [3]. This need stems from the growing availability of Geographical Information System (GIS) data, in particular through the open data movement, which requires integrating data from multiple heterogeneous sources. However, a solution versatile enough to handle this difficult context remains an open research problem.

Accurate registration of LiDAR data and optical imagery is a prerequisite for data fusion applications [4]. Most automatic methods for registering such data sets fall into two categories: intensity-based and feature-based methods. Feature-based methods establish correspondences between data sets from distinguishable features available in both; they involve a feature extraction algorithm and a feature matching strategy [5]. Intensity-based methods, on the other hand, determine the optimal sensor pose by maximizing a statistical similarity measure (e.g. mutual information) between the values of image pixels and those of LiDAR-derived pixels [4, 6]. However, in addition to their high computational cost, the main drawback of these methods is that the data sets must be spatially close to each other, have the same resolution and display similar intensity characteristics [4, 6]. We therefore present in this paper a novel feature-based registration approach capable of overcoming the challenges of the aforementioned research context. We focus on urban scenes, and more specifically on buildings as the primitives on which the matching between the data sets relies.

The paper is organized as follows. Section 2 is devoted to the description of the proposed registration method, consisting of three successive steps, namely feature extraction, feature matching and transformation model estimation. Then, experimental results involving different data sets are presented in Section 3. Finally, Section 4 provides conclusions and perspectives of this work.

2 Proposed registration method

Our novelty lies in a methodology that effectively carries out building extraction and matching by means of a well-tailored series of well-known processes and algorithms. The main steps of the proposed building-based registration method are illustrated in Fig. 1. Different processes are applied to each data set in order to extract buildings from the observed urban scene. On the one hand, elevation thresholding is applied to the 3D coordinates of the LiDAR point cloud in order to select building points. On the other hand, mean shift segmentation is performed on the optical image with a carefully chosen bandwidth parameter, followed by a refinement step that removes unwanted segments and preserves building-like ones.

2.1 Feature extraction

2.1.1 Building extraction from LiDAR data

The extraction of buildings from LiDAR point cloud is carried out through a series of steps, as follows:

1. Input: LiDAR 3D point cloud $(X,Y,Z)$.
2. Step 1, elevation thresholding: non-ground points are separated from ground points according to their elevation value. Many existing works treat this as an initial but necessary step [7]. The elevation threshold is computed as $T_{e}=\mathrm{mean}(z_{G})+\max\{2.5,\mathrm{std}(z_{G})\}$, where $z_{G}$ denotes the altitude of ground points.
3. Step 2, vertical projection: all non-ground points are vertically projected onto the plane $z=0$, which creates a 2D binary mask of non-ground points. The resolution of this binary mask is set according to the point density of the input LiDAR point cloud to avoid null-value pixels, e.g. a resolution of 1 m $\times$ 1 m for a point cloud with a density of 2 points/m².
4. Step 3, morphological opening: an opening is applied to the binary mask to remove small regions and round off larger ones. The structuring element is diamond-shaped, with a size of 5 or 7 pixels depending on the area.
5. Step 4, connectivity labeling: pixels are grouped into segments based on their connectivity, and these segments are labeled.
6. Step 5, small-region removal: regions smaller than 20 m² are removed, which yields a labeled building mask.
7. Step 6, building point extraction: based on the labeled building mask, only the non-ground points belonging to regions seeded by labeled segments are selected.
8. Output: building 3D regions and their boundaries.
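The six steps above can be sketched with NumPy and SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ground points are assumed to be known from the LiDAR classification (the point clouds in Table 1 are classified), points above $T_e$ are treated as non-ground, and the function names, the `cell` size parameter and the pixel-to-square-metre conversion are our own choices.

```python
import numpy as np
from scipy import ndimage


def elevation_threshold(z_ground):
    """T_e = mean(z_G) + max(2.5, std(z_G)), z_G being the ground-point altitudes."""
    return float(np.mean(z_ground) + max(2.5, np.std(z_ground)))


def diamond(size):
    """Diamond-shaped structuring element of odd width `size` (e.g. 5 or 7 pixels)."""
    r = np.arange(size) - size // 2
    return (np.abs(r)[:, None] + np.abs(r)[None, :]) <= size // 2


def extract_buildings(xyz, is_ground, cell=1.0, se_size=5, min_area=20.0):
    """Return building points and their segment labels (assumes a classified cloud)."""
    # Step 1: elevation thresholding (points above T_e are kept as non-ground)
    t_e = elevation_threshold(xyz[is_ground, 2])
    pts = xyz[~is_ground & (xyz[:, 2] > t_e)]

    # Step 2: vertical projection onto z = 0, rasterised at `cell` metres per pixel
    x0, y0 = xyz[:, 0].min(), xyz[:, 1].min()
    cols = ((pts[:, 0] - x0) / cell).astype(int)
    rows = ((pts[:, 1] - y0) / cell).astype(int)
    shape = (int((xyz[:, 1].max() - y0) / cell) + 1,
             int((xyz[:, 0].max() - x0) / cell) + 1)
    mask = np.zeros(shape, dtype=bool)
    mask[rows, cols] = True

    # Step 3: morphological opening with a diamond structuring element
    mask = ndimage.binary_opening(mask, structure=diamond(se_size))

    # Step 4: connectivity labeling
    labels, _ = ndimage.label(mask)

    # Step 5: drop regions smaller than `min_area` square metres
    area = np.bincount(labels.ravel()) * cell * cell
    too_small = area < min_area
    too_small[0] = True  # label 0 is the background
    labels[too_small[labels]] = 0

    # Step 6: keep the non-ground points whose pixel lies in a remaining segment
    point_labels = labels[rows, cols]
    keep = point_labels > 0
    return pts[keep], point_labels[keep]
```

With a 1 m cell and a 5-pixel diamond, a 10 m × 10 m roof survives the opening with its corners rounded off, while a 2 m × 2 m object is erased entirely.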

2.1.2 Building segmentation from optical image using Mean shift

Mean shift is an unsupervised clustering method widely used in many areas of computer vision, including 2D shape extraction and texture segmentation [8]. Unlike $k$-means clustering, mean shift does not require the number of classes to be set in advance; instead, it requires a bandwidth value related to the image color range and to the size of the objects to be segmented. Moreover, in urban areas $k$-means fails to segment buildings, because roof colors vary widely and roofs and streets may have similar colors.
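To make the role of the bandwidth concrete, here is a minimal flat-kernel mean shift on generic feature vectors (for instance per-pixel (a*, b*) values). It is an illustrative sketch, not the segmentation implementation used in the paper, which may also include spatial coordinates; the function name and the mode-merging rule (modes closer than half the bandwidth form one cluster) are our own assumptions. Memory use is quadratic in the number of samples, so it only suits small inputs.

```python
import numpy as np


def mean_shift(features, bandwidth, n_iter=100, tol=1e-6):
    """Flat-kernel mean shift clustering of feature vectors (n_samples x n_dims)."""
    x = np.asarray(features, dtype=float)
    modes = x.copy()
    for _ in range(n_iter):
        # Each mode moves to the mean of the samples lying within `bandwidth` of it
        dist = np.linalg.norm(modes[:, None, :] - x[None, :, :], axis=2)
        w = (dist <= bandwidth).astype(float)
        shifted = (w @ x) / w.sum(axis=1, keepdims=True)
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    # Samples whose modes converged to (almost) the same point form one cluster
    centers, labels = [], np.empty(len(x), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

A very small bandwidth leaves each sample in its own cluster and a very large one merges everything, which is why the bandwidth has to be tuned to the color range and object size.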

Fig. 2 presents a flowchart of the building segmentation on optical image using mean shift algorithm.

First, the visible optical image is converted into the CIE L*a*b* color space, as this color space allows a better distinction of objects than the RGB color space. For the satellite imagery, pansharpening is first carried out to merge the 50-cm resolution panchromatic imagery and the 2-m resolution multispectral imagery into a 50-cm color image, which is then segmented by mean shift. Determining the best bandwidth parameter for mean shift remains difficult, even though a number of approaches have been explored [9]. This parameter is therefore set manually according to the type of area (residential, industrial, mixed, etc.) and the size of the objects of interest. In other words, the bandwidth selection relies on the context of the scene, together with the choice between using the (a*, b*) values or the (L*, a*, b*) values.
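The color-space step can be sketched as below, assuming 8-bit RGB values rescaled to [0, 1] and interpreted as sRGB with a D65 white point (the paper does not state which conversion it uses). `use_lightness` is a hypothetical switch between the (L*, a*, b*) and (a*, b*) feature sets mentioned above; pansharpening of the satellite imagery is not shown.

```python
import numpy as np

# sRGB (D65) -> CIE XYZ matrix and D65 reference white
_RGB2XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                     [0.2126729, 0.7151522, 0.0721750],
                     [0.0193339, 0.1191920, 0.9503041]])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIE L*a*b*."""
    rgb = np.asarray(rgb, dtype=float)
    # undo the sRGB gamma curve
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = lin @ _RGB2XYZ.T / _WHITE_D65
    eps = (6 / 29) ** 3
    f = np.where(xyz > eps, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def mean_shift_features(rgb_image, use_lightness=True):
    """Per-pixel features for mean shift: (L*, a*, b*) or only (a*, b*)."""
    lab = srgb_to_lab(rgb_image)
    feats = lab if use_lightness else lab[..., 1:]
    return feats.reshape(-1, feats.shape[-1])
```

Dropping L* makes the features insensitive to brightness changes such as shadows on a roof, at the cost of discarding the lightness contrast between a roof and its surroundings.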

When mean shift segmentation is applied, many building regions are segmented along with regions related to trees, streets or cars. These non-building segments must be removed before comparison with the building segments extracted from the LiDAR point cloud. To this end, we rely on the number of pixels in each segment: small segments are removed, since they usually correspond to trees and cars, and large segments are removed as well, since they correspond to street regions. This filter is simple and efficient [10], but it depends entirely on the image resolution, so its thresholds must be set manually. The authors of [10] also proposed two additional filters, based on the ratio between the lengths of the segment major and minor axes and on the segment eccentricity, to remove falsely detected building segments and keep the segments associated with rectangular and round building regions. However, these filters are not effective for complex building segments, it is not clear how the axes and the eccentricity of the segments are determined, and the thresholds they use are highly subjective. In this paper, we present another approach to discriminate buildings from regions related to trees or streets. After applying the preliminary filter based on the number of pixels in each segment, we identify the minimal bounding rectangle (MBR) of each segment and compute its filling percentage $\%_{\text{MBR filling}}=\frac{\text{Area(segment)}}{\text{Area(MBR)}}\times 100$. This percentage is then used to filter out unwanted segments, since a rectangular building segment should have a higher filling percentage than an unwanted segment, as shown in Fig. 3.
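A possible implementation of the MBR filling percentage follows. The paper does not say whether the MBR is axis-aligned or rotated; this sketch assumes the minimum-area (possibly rotated) rectangle, found by testing every convex-hull edge direction, and measures areas from pixel corners so that a perfectly rectangular segment scores 100%. The size thresholds `min_pixels` and `max_pixels` are hypothetical parameters to be set per image resolution, and the 50% default mirrors the threshold used in Section 2.2.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return np.array(pts, dtype=float)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1], dtype=float)


def min_bounding_rect_area(points):
    """Area of the minimum-area rectangle; one of its sides lies along a hull edge."""
    hull = convex_hull(points)
    best = np.inf
    for i in range(len(hull)):
        ex, ey = hull[(i + 1) % len(hull)] - hull[i]
        theta = np.arctan2(ey, ex)
        c, s = np.cos(theta), np.sin(theta)
        rot = hull @ np.array([[c, -s], [s, c]])  # rotate so this edge is horizontal
        extent = rot.max(axis=0) - rot.min(axis=0)
        best = min(best, extent[0] * extent[1])
    return best


def mbr_filling_percentage(mask):
    """%MBR filling = Area(segment) / Area(MBR) x 100 for one binary segment mask."""
    rows, cols = np.nonzero(mask)
    # use the four corners of every pixel so that a full rectangle fills its MBR
    corners = np.concatenate([np.stack([cols + dx, rows + dy], axis=1)
                              for dx in (0, 1) for dy in (0, 1)])
    return 100.0 * len(rows) / min_bounding_rect_area(corners)


def is_building_candidate(mask, min_pixels, max_pixels, min_filling=50.0):
    """Size filter followed by the MBR filling-percentage filter."""
    n = int(np.count_nonzero(mask))
    return min_pixels <= n <= max_pixels and mbr_filling_percentage(mask) > min_filling
```

For example, an L-shaped segment made of a 10 × 10 pixel square with one 5 × 5 quadrant missing fills 75% of its MBR, and would pass the 50% threshold.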

2.2 Feature comparison and matching

After extracting building segments from the data sets, the next step is to compare and match them. From the optical image, we select the segments whose MBR filling percentage is higher than 50%, whereas all building regions extracted from the LiDAR point cloud are taken into consideration. Both sets of extracted segments are depicted in Fig. 4.

Difficulties in comparing and matching the segments are to be expected. Indeed, the data sets are relatively far from each other (cf. Table 1), and wrongly extracted segments may remain after the MBR-based segment refinement. Therefore, matching the segments based on their spatial relations with their neighbors is more relevant than comparing their individual values. Taking into account the positions of the segment centers, a common pattern representing the relative spatial arrangement of the data sets can be determined using the Graph Transformation Matching (GTM) algorithm [11]. GTM is a graph-based point matching algorithm designed for registering images with non-rigid deformations. It performs better than RANSAC at removing outliers, both on the test image data sets of [11] and in our work (cf. Table 2).
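The core idea of GTM can be illustrated with the simplified sketch below: build a K-nearest-neighbour graph on each set of matched centers, then repeatedly discard the correspondence whose neighbourhood disagrees most between the two graphs, until both graphs are identical. This is our own reading of [11], not a faithful reimplementation; in particular, pruning edges longer than the median pairwise distance, the outlier score (row plus column sums of the residual adjacency matrix) and the value of `k` are assumptions.

```python
import numpy as np


def knn_graph(pts, k):
    """K-NN adjacency matrix; edges longer than the median pairwise distance are dropped."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    eta = np.median(d[np.triu_indices(len(pts), 1)])
    adj = np.zeros(d.shape, dtype=int)
    for i in range(len(pts)):
        # first sorted entry is the point itself (assumes distinct points)
        nn = np.argsort(d[i], kind="stable")[1:k + 1]
        adj[i, nn[d[i, nn] <= eta]] = 1
    return adj


def gtm(p, q, k=3):
    """Indices i of the correspondences p[i] <-> q[i] kept by a simplified GTM."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    idx = np.arange(len(p))
    while len(idx) > k + 1:
        residual = np.abs(knn_graph(p[idx], k) - knn_graph(q[idx], k))
        if residual.sum() == 0:  # both graphs agree: consensus reached
            break
        # drop the correspondence whose neighbourhood disagrees the most
        worst = np.argmax(residual.sum(axis=0) + residual.sum(axis=1))
        idx = np.delete(idx, worst)
    return idx
```

Because only neighbourhood structure is compared, a uniform shift between the two sets leaves the graphs unchanged, which is what makes this kind of matching tolerant of the large relative shifts listed in Table 1.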

In practice, both GTM and RANSAC require an initial one-to-one matching of segment centers, which can be carried out using the centers of the 3D building regions, vertically projected onto the plane $z=0$, and the centers of the 2D segments extracted by mean shift segmentation. If the relative shifts are too large (e.g. data set no. 3), this initial matching is complemented by a translation vector computed from the displacement of the largest segment. The initial matching of segment centers followed by RANSAC is shown in Fig. 5a and 5b, whereas Fig. 5c depicts the GTM result and Fig. 5d the result after false positives are removed from the GTM output based on the area and the direction of the segments.
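A sketch of the initial one-to-one matching follows, assuming both sets of centers are already expressed in a common 2D frame and using mutual nearest neighbours to enforce the one-to-one constraint (the paper does not specify the matching rule). The optional offset is the displacement between the largest LiDAR segment and the largest image segment, assuming these two correspond; all names are hypothetical.

```python
import numpy as np


def initial_matching(lidar_centers, lidar_areas, image_centers, image_areas,
                     use_offset=False):
    """One-to-one matching of 2D segment centers by mutual nearest neighbours."""
    lc = np.asarray(lidar_centers, dtype=float)
    ic = np.asarray(image_centers, dtype=float)
    offset = np.zeros(2)
    if use_offset:
        # large shifts: translate by the displacement between the largest segments
        offset = ic[np.argmax(image_areas)] - lc[np.argmax(lidar_areas)]
    d = np.linalg.norm((lc + offset)[:, None, :] - ic[None, :, :], axis=2)
    fwd = d.argmin(axis=1)  # nearest image center of each LiDAR center
    bwd = d.argmin(axis=0)  # nearest LiDAR center of each image center
    pairs = [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]
    return pairs, offset
```

When the shift exceeds the spacing between buildings, every LiDAR center may snap to the same wrong image center; applying the largest-segment offset first restores a usable set of candidate pairs for GTM or RANSAC.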

2.3 Transformation model estimation

The coordinates of the matched segment centers are then used to estimate the transformation model, whose parameters describe the pose of the imaging camera. These are the exterior orientation parameters, namely the position $(X_{0},Y_{0},Z_{0})$ and the orientation $(\omega,\phi,\kappa)$ of the camera when the image was acquired. In this paper, this estimation is carried out using the Gold Standard algorithm detailed in [12, p.187].
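The Gold Standard algorithm of [12] starts from a normalized DLT estimate of the camera matrix and then refines it by minimizing the geometric reprojection error. The sketch below covers only the linear normalized DLT step and the recovery of the camera position $(X_0, Y_0, Z_0)$ as the null vector of $P$; the nonlinear refinement and the extraction of $(\omega,\phi,\kappa)$ (e.g. through an RQ decomposition) are omitted. It assumes at least six correspondences between 3D LiDAR segment centers and 2D image segment centers, not all coplanar; the function names are our own.

```python
import numpy as np


def _normalization(pts):
    """Similarity transform moving the centroid to the origin, mean distance sqrt(dim)."""
    dim = pts.shape[1]
    centroid = pts.mean(axis=0)
    scale = np.sqrt(dim) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.eye(dim + 1)
    T[:dim, :dim] *= scale
    T[:dim, dim] = -scale * centroid
    return T


def dlt_camera(X, x):
    """Normalized DLT estimate of the 3x4 camera matrix P such that x ~ P X."""
    X, x = np.asarray(X, dtype=float), np.asarray(x, dtype=float)
    T3, T2 = _normalization(X), _normalization(x)
    Xh = np.c_[X, np.ones(len(X))] @ T3.T
    xh = np.c_[x, np.ones(len(x))] @ T2.T
    A = []
    for Xi, (u, v, w) in zip(Xh, xh):
        A.append(np.r_[np.zeros(4), -w * Xi, v * Xi])
        A.append(np.r_[w * Xi, np.zeros(4), -u * Xi])
    _, _, vt = np.linalg.svd(np.asarray(A))
    P_norm = vt[-1].reshape(3, 4)  # right singular vector of the smallest singular value
    P = np.linalg.inv(T2) @ P_norm @ T3  # undo both normalizations
    return P / np.linalg.norm(P)


def camera_center(P):
    """Camera position (X0, Y0, Z0) as the right null vector of P."""
    _, _, vt = np.linalg.svd(P)
    C = vt[-1]
    return C[:3] / C[3]
```

Building centers can lie at similar heights, so in practice the 3D points may be close to coplanar; the DLT is then ill-conditioned, which is one reason the geometric refinement of the full Gold Standard algorithm matters.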

3 Experimental results

The proposed registration method has been tested on three different pairs of data sets, described in Table 1. Table 2 summarizes the results of building extraction and matching on selected areas in terms of true positives (TP, i.e. correct extractions or matches), false alarms (FA, i.e. wrong extractions or matches) and misses (M, i.e. existing buildings that were not extracted or not matched).
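Precision and recall in Table 2 follow directly from these counts, as precision $= TP/(TP+FA)$ and recall $= TP/(TP+M)$. The small helper below (a hypothetical name) reproduces the table's percentages from its TP/FA/M values; for example, 24/21/4 for mean shift gives 53.33% and 85.71%.

```python
def precision_recall(tp, fa, m):
    """Precision = TP / (TP + FA) and recall = TP / (TP + M), in percent."""
    precision = 100.0 * tp / (tp + fa) if tp + fa else 0.0
    recall = 100.0 * tp / (tp + m) if tp + m else 0.0
    return round(precision, 2), round(recall, 2)
```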

The overall registration results are assessed through the reduction of the relative shift between data sets, measured on several manually selected control points (cf. Table 3). This reduction is also visible when the data sets are overlaid before and after the registration, as shown in Fig. 6. All full-scale color figures of this paper can be found at https://github.com/nthuy190991/igarss2019.
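The gain column of Table 3 is the relative reduction of the average shift, $\text{gain} = (\text{before} - \text{after}) / \text{before} \times 100$; for data set 1, $(1.41 - 0.49)/1.41 \approx 65.25\%$. The sketch below computes both quantities, assuming the average shift is the mean Euclidean distance between corresponding control points in the two data sets (the paper does not define it more precisely); the function names are hypothetical.

```python
import numpy as np


def average_shift(points_a, points_b):
    """Mean Euclidean distance between corresponding control points (in metres)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())


def gain(before, after):
    """Relative reduction of the average shift, in percent."""
    return 100.0 * (before - after) / before
```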

4 Conclusions and Perspectives

In this paper, we present a dedicated registration approach for airborne LiDAR data and optical imagery acquired neither from the same platform nor with the same point of view or spatial resolution. This approach, which focuses on extracting and matching building regions, drastically reduces the relative shifts between data sets: by more than 50% between LiDAR data and aerial imagery, and by 95.71% between LiDAR data and satellite imagery. It also improves the alignment when the back-projected LiDAR point cloud is overlaid on the optical image. Based on these results, a fine registration could then be applied to align the data sets to within one pixel. This accuracy is required to fully benefit from the advantages of the two data sets and to carry out a fusion that provides better completeness and reduced uncertainty of the observed scene [4].

5 Acknowledgment

The authors would like to thank the Centre GéoStat (Université Laval, QC, Canada), as well as Québec City, Communauté Métropolitaine de Québec (Canada), and Centre National d’Etudes Spatiales (France) for providing the data sets used in this work.

Bibliography

[1] C. Debes, A. Merentitis, R. Heremans, J. Hahn, N. Frangiadakis, T. van Kasteren, W. Liao, R. Bellens, A. Pižurica, S. Gautama, et al., “Hyperspectral and lidar data fusion: Outcome of the 2013 GRSS data fusion contest,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2405–2418, 2014.
[2] A.-V. Vo, L. Truong-Hong, D. Laefer, D. Tiede, S. d’Oleire Oltmanns, A. Baraldi, M. Shimoni, G. Moser, and D. Tuia, “Processing of extremely high resolution lidar and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest–Part B: 3-D contest,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 12, pp. 5560–5575, 2016.
[3] R. Cura, J. Perret, and N. Paparoditis, “A scalable and multi-purpose point cloud server (PCS) for easier and faster point cloud data management and processing,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 127, pp. 39–56, 2017.
[4] E. G. Parmehr, C. S. Fraser, C. Zhang, and J. Leach, “Automatic registration of optical imagery with 3D LiDAR data using statistical similarity,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 88, pp. 28–40, 2014.
[5] R. M. Palenichka and M. B. Zaremba, “Automatic extraction of control points for the registration of optical satellite and lidar images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 7, pp. 2864–2879, 2010.
[6] A. Mastin, J. Kepner, and J. Fisher, “Automatic registration of LIDAR and optical images of urban scenes,” in 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops 2009), pp. 2639–2646, 2009.
[7] M. Awrangjeb, C. Zhang, and C. S. Fraser, “Automatic extraction of building roofs using lidar data and multispectral imagery,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 83, pp. 1–18, 2013.
[8] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.