Procedural Synthesis of Remote Sensing Images for Robust Change Detection with Neural Networks

Maria Kolos; Anton Marin; Alexey Artemov; Evgeny Burnaev

arXiv:1905.07877·cs.CV·May 21, 2019


TL;DR

This paper introduces a procedural method for generating synthetic remote sensing images using game engines, enhancing change detection performance when real data is scarce.

Contribution

A novel pipeline for creating realistic synthetic remote sensing datasets to improve neural network change detection under limited data conditions.

Findings

01. Synthetic datasets improve model performance.

02. Pipeline accelerates convergence of neural networks.

03. Method is efficient and scalable.

Abstract

Data-driven methods such as convolutional neural networks (CNNs) are known to deliver state-of-the-art performance on image recognition tasks when the training data are abundant. However, in some instances, such as change detection in remote sensing images, annotated data cannot be obtained in sufficient quantities. In this work, we propose a simple and efficient method for creating realistic targeted synthetic datasets in the remote sensing domain, leveraging the opportunities offered by game development engines. We provide a description of the pipeline for procedural geometry generation and rendering as well as an evaluation of the efficiency of produced datasets in a change detection scenario. Our evaluations demonstrate that our pipeline helps to improve the performance and convergence of deep learning models when the amount of real-world data is severely limited.

Tables

Table 1: Statistical change detection results with models trained using strategies A–E. When using CW as a fine-tuning target, we only select a training subset; (*) indicates frozen encoder, (**) indicates augmentations (see Section 4.4).

Strategy	Init.	Pre-training	Fine-tuning	Ventura full IoU	Ventura full F1	Ventura 1/16 IoU	Ventura 1/16 F1	SR full IoU	SR full F1	SR 1/16 IoU	SR 1/16 F1
A	–	CW	–	0.695	0.820	0.310	0.460	0.487	0.504	0.337	0.639
B-1	–	SynCW*	CW	0.713	0.832	0.312	0.476	0.487	0.549	0.392	0.563
B-2	–	SynCW	CW	0.702	0.825	0.327	0.490	0.487	0.629	0.331	0.497
C	ImageNet	–	CW	0.716	0.835	0.499	0.704	0.626	0.770	0.435	0.607
D-1	ImageNet	SynCW*	CW	0.714	0.833	0.572	0.735	0.680	0.800	0.649	0.787
D-2	ImageNet	SynCW	CW	0.718	0.835	0.580	0.744	0.684	0.812	0.631	0.774
E	ImageNet	–	CW**	0.724	0.840	0.317	0.458	0.034	0.066	0.044	0.084


Code & Models

Repositories

mvkolos/siamese-change-detection (official)


Full text

Skolkovo Institute of Science and Technology, (1) ADASE and (2) Aeronet groups

{maria.kolos,e.burnaev,a.artemov}@skoltech.ru,[email protected]

Maria Kolos (2), Anton Marin (2), Alexey Artemov (1), and Evgeny Burnaev (1)

The work was supported by The Ministry of Education and Science of Russian Federation, grant No. 14.615.21.0004, grant code: RFMEFI61518X0004.


Keywords: Remote Sensing, Deep Learning, Synthetic Imagery

1 Introduction

Remote sensing data is utilized in a broad range of industrial applications, including emergency mapping, deforestation and wildfire monitoring, detection of illegal construction, and urban growth tracking. Processing large volumes of remote sensing imagery and handling its high variability (e.g., diverse weather/lighting conditions, imaging equipment) provide a strong motivation for developing automated and robust approaches to reduce labor costs.

Recently, data-driven approaches such as deep convolutional neural networks (CNNs) have seen impressive progress on a number of vision tasks, including semantic segmentation, object detection, and change detection [22, 21, 17, 34, 45, 44, 7]. Such methods offer promising tools for remote sensing applications as they can achieve high performance by leveraging the diversity of the available imagery [11, 19, 41]. However, in order to operate successfully, most data-driven methods require large amounts of high-quality annotated data [28, 41, 29, 24]. Obtaining such data in the context of remote sensing poses a significant challenge, as (1) aerial imagery data are expensive, (2) collection of raw data with satisfactory coverage and diversity is laborious, costly, and error-prone, as is (3) manual image annotation; (4) moreover, in some instances such as change detection, the cost of collecting a representative number of rarely occurring cases can be prohibitively high. Unsurprisingly, although public real-world annotated remote sensing datasets exist [48, 8, 27, 19, 38], these challenges have kept them limited in size compared to general-purpose vision datasets such as ImageNet [23].

One alternative that avoids these dataset collection issues is to produce synthetic annotated images with the aid of game development software such as Unity [4], Unreal Engine 4 [5], and CRYENGINE [1]. This approach has been demonstrated to improve the performance of computer vision algorithms in some instances [20, 46, 28, 49, 24, 50, 54]. Its attractive benefits include (1) flexibility in scene composition, which helps address the class imbalance issue, (2) pixel-level precise automated annotation, and (3) the possibility to apply transfer learning techniques for subsequent “fine-tuning” on real data. However, little work has been done in the direction of using game engines to produce synthetic datasets in the remote sensing domain. Executing procedural changes on large-scale urban scenes is computationally demanding and requires smart optimization of rendering or object-level reduction (number of polygons, texture quality). The level of realism relies heavily on the amount of labor spent on scene design and optimization. Thus, research on the procedural construction of synthetic datasets would contribute to the wider adoption of data-driven methods in the remote sensing domain.

In this work, we focus on the task of change detection; however, it is straightforward to adapt our method to other tasks such as semantic segmentation. We leverage game development tools to implement a semi-automated pipeline for procedural generation of realistic synthetic data. Our approach uses publicly available cartographic data and produces realistic 3D scenes of real territory (e.g., relief, buildings). These scenes are rendered using the Unity engine to produce high-resolution synthetic RGB images. Taking advantage of real cartographic data and emulation of image acquisition conditions, we create a large and diverse dataset with a low simulated-to-real shift, which allows us to efficiently apply deep learning methods. We validate our data generation pipeline on the change detection task using a state-of-the-art deep CNN. We observe consistent improvements in performance and convergence of our models on this task with our synthetic data, compared to using (scarce) real-world data only.

In summary, our contributions in this work are:

• We describe a semi-automatic pipeline for procedural generation of realistic synthetic images for change detection in the domain of remote sensing.

• We demonstrate the benefits of large volumes of targeted synthetic images for generalization ability of CNN-based change detection models using extensive experiments and a real-world evaluation dataset.

The rest of this paper is organized as follows. In Section 2, we review prior work on change detection in the remote sensing domain, including existing image datasets, data generation tools, computational models, and transfer learning techniques. Section 3 presents our data generation pipeline. Section 4 poses three experiments investigating the possible benefits of our approach and presents their results. We conclude with a discussion of our results in Section 5.

2 Related work

2.1 Computational models for change detection

Change detection in multi-temporal remote sensing images has attracted considerable interest in the research community, where approaches have been proposed involving anomaly detection on time series and spectral indices [18], Markov Random Fields and global optimization on graphs [58, 65, 32], object-based segmentation followed by change classification [40, 36, 60], and Multivariate Alteration Detection [61, 39] (cf. [57] for a broader review). These approaches generally work with low-resolution imagery (e.g., 250–500 m/pixel) and require manual tuning of dozens of hyperparameters to handle variations in data such as sensor model, seasonal variations, image resolution, and calibration.

Recently, deep learning and CNNs have been extensively studied for classification, segmentation, and object detection in remote sensing images [66, 13]. However, only a handful of CNN-based change detection approaches exist. Due to the lack of training data, [26, 53] use ImageNet pre-trained models to extract deep features and apply super-pixel segmentation algorithms to perform change detection. We only study the influence of pre-training on ImageNet in one of the experiments; otherwise, we train our deep CNN from scratch using our synthetic dataset. [25, 27] proposed a CNN-based method for binary classification of changes given a pair of high-resolution satellite images. In contrast, we focus on predicting a dense mask of changes from the two registered images. The closest to our work are [62, 11], which predict a pixel-level mask of changes from the two given images; additionally, [11] uses a U-Net-like architecture as we do. Nevertheless, their models are different from ours, which is inspired by [17].

Due to the scarcity of the available data in some instances, transfer learning techniques have been extensively investigated in many image analysis tasks, including image classification [64, 63], similarity ranking [67], and retrieval [55, 10]. Additionally, transfer from models pre-trained on RGB images to a more specialized domain, such as magnetic resonance images or multi-spectral satellite images, has been studied for automated medical image diagnostics [59, 33, 43] and remote sensing image segmentation [37]. In the context of the present work, of particular interest is transfer learning from synthetic data to real-world data, which has proven effective for a wide range of tasks [20, 46, 28, 49, 24, 50, 54]. In the remote sensing domain, however, synthetic data has only been employed in the context of semantic segmentation [41]. Moreover, that data generation method relies on pre-created scene geometry, while our system generates geometry based on the requested map data.

2.2 Image datasets for change detection

2.2.1 Real-world datasets.

Datasets for change detection are commonly structured as pairs of registered images of the same territory, taken at distinct moments in time and accompanied by image masks for each of the annotated changes. With the primary application being emergency mapping, most datasets feature binary masks annotating damaged structures across the mapped areas [27, 48, 12, 8]. The L’Aquila 2009 earthquake dataset [8] contains data spanning $1.5\times 1.5$ km$^2$ annotated with masks of buildings damaged during the 2009 earthquake. California wildfires [48] contains $2.5\times 2.5$ km$^2$ and $5\times 8$ km$^2$ images, representing changes after a 2017 wildfire in California, annotated with masks of burnt buildings. The ABCD dataset [27] is composed of patches for 66 km$^2$ of tsunami-affected areas, built to identify whether buildings have been washed away by the tsunami. In addition, the OSCD dataset [19] addresses the issue of detecting changes between satellite images from different dates, containing 24 pairs of images from different locations, annotated with pixel-level masks of new buildings and roads. All these datasets are low in diversity and volume, and only provide annotations for a limited class of changes (e.g., urban changes). In contrast, our pipeline can produce massive amounts of highly diverse synthetic images with flexible annotation, specified by the user. Other known datasets, such as the Landsat ETM/TM datasets [31], are of higher volume, but have an extremely low spatial resolution (on the order of 10 m) and feature no annotation.

2.2.2 Synthetic datasets and visual modelling tools.

To the best of our knowledge, the AICD dataset [12] is the only published synthetic dataset on change detection in the remote sensing domain. It consists of 1000 pairs of $800\times 600$ images generated using the realistic rendering engine of a computer game, with ground truth generated automatically. The drawbacks of this dataset are low diversity in target and environmental changes and low graphics quality.

Although tools for creating synthetic datasets are actively developed and studied in the domain of computer vision [4, 49, 5], research on the targeted generation of synthetic datasets for remote sensing applications is still in its infancy. Modern urban modeling packages require manual creation of assets and laborious tuning of rendering parameters (e.g., lighting) [2, 3, 6]. This could be improved by leveraging the extensive opportunities offered by game development engines, which combine off-the-shelf professional rendering presets, realistic shaders, and rich scripting engines for fine customization. In our pipeline, procedural generation of geometry and textures is followed by a rendering script that leverages these rendering opportunities. Other tools such as DIRSIG [30] allow simulating realistic multi-spectral renders of the scenes but rely on existing geometry. Our pipeline, in contrast, enables us to create both geometry based on real-world map data and realistic renders using a game engine.

3 Synthesis of territory-specific remote sensing images

3.1 Data requirements and the design choices of our pipeline

A good change detection dataset should contain application-specific target changes, such as deforestation or illegal construction, as well as high variability, which can be viewed as non-target changes. Such variability in data commonly involves appearance changes, e.g., lighting and viewpoint variations, diverse directions of shadows, and random changes of scene objects. However, implementing an exhaustive list of non-target changes is too laborious. Thus, we restricted ourselves to the following general requirements for the synthetic data:

• Visual scene similarity. To reduce the omnipresent simulated-to-real shift, it is necessary that the modeled scenes have a high visual resemblance to the actual scenes. We approach the target territory modeling task by imitating the visual appearance of structures and environment.

• Scale. To match the largest known datasets in spatial scale, we have chosen to model scenes with large spatial size. In our dataset, scenes are generated with spatial extents of square kilometers, a spatial resolution of less than 1 m, and image resolution of tens of Megapixels.

• Target changes. We have chosen to only model damaged buildings as the target class, as they are of general interest in applications such as emergency mapping (see, e.g., [40, 39, 57, 53, 26, 62, 11, 25, 60, 36]). They are also straightforward to implement in our pipeline with procedural geometry generation and rendering scripts.

• Viewpoint variations. Real-world multi-temporal remote sensing images of the same territory are commonly acquired using varying devices (e.g., devices with different fields of view) and viewpoints. The acquisition is commonly performed at angles not exceeding 25°, and the data are then post-processed by registration and geometric correction. In our pipeline, we imitate the precession of a real satellite by randomly changing the image acquisition angle and the field of view.

• Scene lighting changes. Scene illumination is commonly considered to consist of a point-source illumination (produced by the Sun) and an ambient illumination due to the atmospheric scattering of the solar rays. In our work, we consider the changes in the Sun’s declination angle and model both components of the illumination.

• Shadows. Real-world objects cast shadows, which are irrelevant variations and should be ignored. We model realistic shadows, again by varying the Sun’s declination angle.

To meet these requirements, we have developed a two-stage pipeline consisting of geometry generation and rendering steps. The entire routine is semi-automatic and involves two widely used 3D engines. Specifically, we use Esri CityEngine [2] to procedurally build geometry from real-world map data and Unity [4] to implement the logic behind dataset requirements and leverage rendering capabilities. The reasons behind our choice of CityEngine as our geometry manipulation tool are its flexibility in the procedural geometric modeling and built-in UV/texturing capabilities. Other tools, commonly implemented as plugins for Unreal Engine 4 (e.g., StreetMap, https://github.com/ue4plugins/StreetMap) and Unity (e.g., Mapbox Unity SDK, https://www.mapbox.com/unity), offered significantly less freedom. In these tools, either non-textured or textured but oversimplified shapes (e.g., as simple as boxes) are the only objects available for urban geometry generation.

The Unity game engine was selected to execute the generation procedure for the change detection dataset. Compared to CAD software often used for the production of datasets, game engines offer advantages such as powerful lighting/shadows out of the box and scripting possibilities. Additionally, Unity allows implementing target changes, controlling lighting and viewpoint changes, and adjusting the change rate in the dataset. Certain features in Unity are more suited to our needs, compared to Unreal Engine 4. For instance, we have found shadows in Unity to be more stable when rendering scenes from large distances, and the Layers feature (https://docs.unity3d.com/Manual/Layers.html) to add more flexibility by allowing objects to be excluded from rendering or post-processing. Naturally, we had to adjust some initial settings in Unity before the generation, as the typical requirements of the remote sensing domain differ from those of 3D games. We describe these settings in Section 4.

3.2 Geometry generation

To procedurally generate geometry in CityEngine, we obtain cartographic data (vector layers) from OpenStreetMap and elevation data (a geo-referenced GeoTIFF image) from Esri World Elevation. Vector data contains information about the geometry of buildings and roads along with semantics of land cover (forests, parks, etc.), while elevation data is used to reconstruct terrain.

First, we reconstruct terrain using the built-in functionality in CityEngine, obtaining a textured 3D terrain mesh (see Figure 2), to which we $z$-align flat vector objects; second, we run our geometry generation procedure implemented in the engine’s rule-based scripting language; last, we apply textures to the generated meshes. Our implementation of the geometry generation involves extruding the footprint polygon to a certain height and selecting a randomly textured roof (architectural patterns such as rooftop shapes are built into CityEngine), see Figure 2. We have experimented with more complex geometry produced by operations such as polygon splitting and repeating; however, such operations (e.g., splitting) increase the number of polygons significantly without adding much to the scene detail, render quality, or performance of change detection models. Focusing on our scalability and flexibility requirements, we avoid overloading our scenes with objects of redundant geometry.
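To make the extrusion step concrete, the following is a minimal Python sketch of turning a 2D building footprint into a flat-roofed prism mesh. It is an illustration only and not the CityEngine rule used in the pipeline; the function name `extrude_footprint`, the vertex/face layout, and the example coordinates are assumptions made for this sketch.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def extrude_footprint(footprint: List[Point2], base_z: float,
                      height: float) -> Tuple[List[Point3], List[List[int]]]:
    """Extrude a 2D footprint polygon into a closed prism mesh.

    Returns (vertices, faces); each face is a list of vertex indices.
    Hypothetical helper, not part of CityEngine.
    """
    if len(footprint) < 3:
        raise ValueError("a footprint needs at least three vertices")
    if height <= 0:
        raise ValueError("height must be positive")
    n = len(footprint)
    bottom = [(x, y, base_z) for x, y in footprint]
    top = [(x, y, base_z + height) for x, y in footprint]
    vertices = bottom + top
    faces = [list(range(n - 1, -1, -1))]   # floor, reversed winding
    faces.append(list(range(n, 2 * n)))    # flat roof
    for i in range(n):                     # one quad wall per footprint edge
        j = (i + 1) % n
        faces.append([i, j, n + j, n + i])
    return vertices, faces


# A 10 m x 6 m rectangular footprint placed on terrain at z = 2 m.
verts, faces = extrude_footprint([(0, 0), (10, 0), (10, 6), (0, 6)],
                                 base_z=2.0, height=4.5)
print(len(verts), len(faces))
```

A simple prism of this kind adds only n + 2 faces per building with an n-vertex footprint, which is in line with the stated goal of avoiding redundant geometry.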

We select two types of buildings to construct our scenes: small buildings with colored gable roofs and concrete industrial-looking structures with flat roofs, see Figure 4. A set of roof textures has been selected manually, while CityEngine built-in packs of textures for OpenStreetMap buildings were used for facades. We approach the emergency mapping use case by texturing building footprints to imitate a damaged appearance, see Figure 4.

3.3 Rendering

We construct the synthetic dataset by rendering the generated geometry using built-in functionality in Unity, obtaining high-definition RGB images.

To achieve a high degree of variation, we leverage the rich scripting capabilities in Unity, which allow flexible scene manipulation via scripts written in C#. As shadow casting and lighting algorithms are built in, we only adjust their parameters, as indicated in Section 4.1. We implement target changes by randomly selecting object meshes and placing them onto a separate layer: the camera will not render these meshes, rendering the corresponding damaged footprints instead. Non-target variations are further added to scenes modified by target changes, by random alterations of lighting and camera parameters, resulting in $m$ different image acquisition conditions for each target change.

Each element in the dataset was obtained after three rendering runs: first, we render the original scene; second, we apply both target and non-target changes, moving the changed objects to a separate layer; finally, only the layer with the changed objects is rendered to obtain annotations.
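The three rendering runs can be illustrated with a toy rasterizer in Python. This is only a schematic sketch: the actual pipeline runs inside Unity with C# scripts, while here buildings are axis-aligned rectangles on a small grid, the grey levels are arbitrary, and a single brightness factor stands in for lighting changes. All names (`render`, `make_sample`, `change_rate`) are hypothetical.

```python
import random
from typing import AbstractSet, Dict, List, Tuple

Rect = Tuple[int, int, int, int]          # (row0, col0, row1, col1), end-exclusive
GROUND, INTACT, DAMAGED = 120, 200, 60    # hypothetical grey levels


def render(buildings: Dict[str, Rect], size: int,
           changed: AbstractSet[str] = frozenset(),
           only_changed_layer: bool = False,
           brightness: float = 1.0) -> List[List[int]]:
    """Rasterize the scene; optionally draw only the changed layer as a 0/1 mask."""
    background = 0 if only_changed_layer else int(GROUND * brightness)
    image = [[background] * size for _ in range(size)]
    for name, (r0, c0, r1, c1) in buildings.items():
        if only_changed_layer:
            if name not in changed:
                continue                   # other layers are excluded from this run
            value = 1
        else:
            value = int((DAMAGED if name in changed else INTACT) * brightness)
        for r in range(r0, r1):
            for c in range(c0, c1):
                image[r][c] = value
    return image


def make_sample(buildings: Dict[str, Rect], size: int,
                change_rate: float, rng: random.Random):
    changed = {name for name in buildings if rng.random() < change_rate}
    before = render(buildings, size)                          # run 1: original scene
    after = render(buildings, size, changed,                  # run 2: target and
                   brightness=rng.uniform(0.7, 1.0))          #        non-target changes
    mask = render(buildings, size, changed,                   # run 3: changed layer only
                  only_changed_layer=True)
    return before, after, mask, changed


buildings = {"b1": (1, 1, 4, 5), "b2": (6, 2, 9, 4), "b3": (2, 7, 5, 10)}
before, after, mask, changed = make_sample(buildings, 12, 0.4, random.Random(0))
print(sorted(changed), sum(map(sum, mask)))
```

Because the annotation comes from rendering the changed layer alone, the mask is pixel-exact by construction and is unaffected by the lighting change applied in the second run.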

4 Experiments

We demonstrate the effectiveness of our pipeline in an emergency mapping scenario, where the goal is to perform a rapid localization and assessment of incurred damage with extremely limited amounts of annotated data [48]. To this end, we produce a synthetic training dataset in Section 4.1 and design a series of experiments to investigate the influence of training strategy on the change detection performance with different data volumes in Sections 4.4–4.5.

4.1 Datasets

4.1.1 California wildfires (CW).

The dataset contains high-resolution satellite images depicting cases of wildfires in two areas of Ventura and Santa Rosa counties, California, USA. The annotation has been created manually [48].

4.1.2 Synthetic California wildfires (SynCW) dataset.

Using our pipeline, we created a synthetic training dataset for the California wildfires case study. We collected OpenStreetMap and Esri World Elevation data from the area of interest in Ventura and Santa Rosa counties, California, USA. Two major meta-classes imported from OpenStreetMap data were building (including apartments, garage, house, industrial, residential, retail, school, and warehouse) and highway (road structures including footway, residential, secondary, service, and tertiary). We generate geometry using our pipeline configured according to Table 6. We add target changes by randomly selecting 30–50% of the building geometry. To introduce non-target changes in the dataset, we select $m=5$ different points of view, positioning the camera at zenith and at four other locations determined by an inclination angle $\alpha$ from an axis pointing to zenith (we select $\alpha$ uniformly at random from $[5^{\circ},10^{\circ}]$), and orienting it to the center of the scene. To model daylight variations, we select the Sun’s declination angle uniformly at random from $[30^{\circ},140^{\circ}]$. In the resulting dataset, the generated scenes of 4 distinct locations have spatial extents of approximately $2\times 2$ km$^2$ with a spatial resolution of 0.6 m and an image resolution of $3072\times 3072$ px.
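The sampling of acquisition conditions described above can be sketched in Python as follows. The ranges for $\alpha$, the Sun angle, and the share of changed buildings are taken from the text; everything else is an assumption of this sketch, in particular the evenly spaced azimuths of the four tilted views, the resampling of the Sun angle for every view, and the names `Acquisition`, `sample_acquisitions`, `camera_position`, and `select_changed`.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Acquisition:
    inclination_deg: float   # tilt from the zenith axis (alpha in the text)
    azimuth_deg: float       # assumption: direction of the tilt around the zenith axis
    sun_angle_deg: float     # Sun's declination angle as used in the text


def sample_acquisitions(rng: random.Random, m: int = 5) -> List[Acquisition]:
    """One zenith view plus m - 1 tilted views with alpha drawn from [5, 10] degrees."""
    views = [Acquisition(0.0, 0.0, rng.uniform(30.0, 140.0))]
    for k in range(m - 1):
        azimuth = 360.0 * k / (m - 1)     # assumption: evenly spaced azimuths
        views.append(Acquisition(rng.uniform(5.0, 10.0), azimuth,
                                 rng.uniform(30.0, 140.0)))
    return views


def camera_position(view: Acquisition, distance: float) -> Tuple[float, float, float]:
    """Camera position relative to the scene center, which the camera looks at."""
    a = math.radians(view.inclination_deg)
    b = math.radians(view.azimuth_deg)
    return (distance * math.sin(a) * math.cos(b),
            distance * math.sin(a) * math.sin(b),
            distance * math.cos(a))


def select_changed(building_ids: Sequence[str], rng: random.Random) -> List[str]:
    """Mark a random 30-50% of the buildings as target changes."""
    fraction = rng.uniform(0.30, 0.50)
    return rng.sample(list(building_ids), round(fraction * len(building_ids)))


rng = random.Random(42)
views = sample_acquisitions(rng)
changed = select_changed([f"building_{i}" for i in range(100)], rng)
for v in views:
    print(v, camera_position(v, distance=1000.0))
```

With $\alpha \le 10^{\circ}$ the camera stays close to nadir, so the rendered pairs remain nearly registered, which matches the requirement that acquisition angles stay small.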

4.2 Our change detection model and training procedure

When designing our change detection architecture, we take inspiration from recent progress in semantic segmentation [51, 17, 37]. Our architecture is a Siamese U-Net [51] with residual units [35] in encoder blocks and upsampling units in decoder blocks, which can be viewed as a Siamese version of a segmentation model from [17] (see Figure 7). While our model is composed of well-known building blocks, to the best of our knowledge, we are the first to study its Siamese version in a change detection setting. While selecting the top-performing architecture is beyond the scope of this paper, we have found our architecture to consistently outperform those examined previously [51, 37] in all settings we have considered.

To study the benefits of pre-training on a large image dataset, we have kept the encoder architecture a replica of the ResNet-34 architecture [35]. In all settings, the models were trained using $352\times 352$ patches for 20 epochs. The Adam optimizer [42] was used with a batch size of 8 and an initial learning rate of $10^{-4}$.

4.3 Metrics

To evaluate our models, we chose two performance measures standard for segmentation tasks: Intersection over Union (IoU) and F1-measure, both obtained by applying the threshold 0.5 to the confidence output. Given a pair of binary masks, IoU can be interpreted as a pixel-wise metric that corresponds to localization accuracy between these two samples, $\text{IoU}(A,B)=\frac{\left|A\cap B\right|}{\left|A\cup B\right|}=\frac{\left|A\cap B\right|}{\left|A\right|+\left|B\right|-\left|A\cap B\right|}$. F1-measure is the harmonic mean of precision and recall values between the predicted and ground truth masks: $\text{F1}=2\cdot\left(\text{Precision}^{-1}+\text{Recall}^{-1}\right)^{-1}$.
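As a worked example, both metrics can be computed from pixel counts. The sketch below is dependency-free Python operating on flattened masks; it is not taken from the released code. The function names, the use of a strict `>` comparison at the threshold, and the convention of returning 1.0 when both masks are empty are assumptions of this sketch. F1 is computed in the count form $2TP/(2TP+FP+FN)$, which equals the harmonic mean of precision and recall.

```python
from typing import List, Sequence, Tuple


def binarize(confidences: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-pixel confidences into a binary change mask (assumed strict '>')."""
    return [1 if c > threshold else 0 for c in confidences]


def iou_and_f1(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Pixel-wise IoU and F1 between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    union = tp + fp + fn
    iou = tp / union if union else 1.0             # assumed convention for empty masks
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return iou, f1


confidences = [0.9, 0.8, 0.2, 0.6, 0.1, 0.4]
ground_truth = [1, 1, 0, 0, 1, 0]
iou, f1 = iou_and_f1(binarize(confidences), ground_truth)
print(f"IoU = {iou:.3f}, F1 = {f1:.3f}")   # IoU = 0.500, F1 = 0.667
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3. For a single pair of masks the two metrics are tied by $\text{F1}=2\,\text{IoU}/(1+\text{IoU})$.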

4.4 The evaluation setup

When training a deep learning-based change detection model, an annotated real-world remote sensing dataset (e.g., CW [48]), would be a natural choice; however, its volume does not allow training an architecture such as ours from scratch (i.e., starting from randomly-initialized weights). A stronger initialization is commonly obtained with models pre-trained on ImageNet [52], a large-scale and real-world dataset. Unfortunately, ImageNet contains images from a completely different domain; thus, it is unclear whether features trained on ImageNet would generalize well for the change detection scenario. Furthermore, the decoder cannot be initialized and must still be trained. In our setting, SynCW, which is a target-domain large-scale dataset with change annotations, could be employed, but would training on synthetic images lead to good generalization? Thus, there exists no definitive choice of a training data source (cf. Table 6); as we demonstrate further, the choice of training strategy is crucial for achieving high performance.

We design seven training strategies for our task, summarized in Table 1. Strategy A is the standard setting when abundant data is available. In strategies B-1 and B-2, we attempt to model the synthetic-to-real transfer scenario. During pre-training, we either randomly initialize and freeze the encoder (i.e., set its learning rate to zero, B-1) or train it (B-2). Strategies C, D-1, D-2, and E all initialize the encoder with ImageNet-pretrained weights, a widely used initialization: C realizes a common transfer learning setting, while D-1 and D-2 proceed in two fine-tuning stages and use the synthetic, then the target training set, either training the decoder only (D-1, similarly to B-1) or the entire model (D-2). E is a common transfer learning setting widely used in, e.g., Kaggle (https://www.kaggle.com) competitions, leveraging strong augmentations (e.g., rotations, flips, brightness changes, blur, and noise).
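The seven strategies can be written down as a small configuration table, sketched here in Python. The dataset assignments follow Table 1. The sketch assumes, following the Table 1 caption, that the encoder is frozen only during the SynCW pre-training stage of B-1 and D-1 and is trainable during fine-tuning. The names `Strategy` and `stages` are hypothetical and do not come from the released code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Strategy:
    name: str
    init: Optional[str]          # encoder initialization, None = random
    pretrain: Optional[str]      # dataset of the first training stage
    finetune: Optional[str]      # dataset of the second training stage
    freeze_encoder_in_pretrain: bool = False   # (*) in Table 1
    augment_finetune: bool = False             # (**) in Table 1


STRATEGIES = [
    Strategy("A", None, "CW", None),
    Strategy("B-1", None, "SynCW", "CW", freeze_encoder_in_pretrain=True),
    Strategy("B-2", None, "SynCW", "CW"),
    Strategy("C", "ImageNet", None, "CW"),
    Strategy("D-1", "ImageNet", "SynCW", "CW", freeze_encoder_in_pretrain=True),
    Strategy("D-2", "ImageNet", "SynCW", "CW"),
    Strategy("E", "ImageNet", None, "CW", augment_finetune=True),
]


def stages(s: Strategy) -> List[Tuple[str, str, bool]]:
    """Ordered training stages as (dataset, trainable parts, strong augmentations)."""
    plan = []
    if s.pretrain:
        parts = "decoder" if s.freeze_encoder_in_pretrain else "encoder+decoder"
        plan.append((s.pretrain, parts, False))
    if s.finetune:
        plan.append((s.finetune, "encoder+decoder", s.augment_finetune))
    return plan


for s in STRATEGIES:
    print(f"{s.name:>3} | init={s.init or 'random':8} | {stages(s)}")
```

Keeping the strategies as data makes it easy to run the same training loop over all of them and to vary only the amount of real data used in the CW stage.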

Following [48], we use a pair of Ventura train images ($4573\times 4418$ px) for training or fine-tuning our models. As our goal is to study the effect of decreasing volumes of real-world data, we crop a random patch from these images, setting the ratio of patch area to the full image area to be 1, 1/2, 1/4, 1/8, and 1/16. A non-overlapping pair of Ventura test images ($1044\times 1313$ px) and a pair of visually distinct Santa Rosa images ($2148\times 2160$ px) are held out for testing. Note that when testing on Santa Rosa, we do not fine-tune on the same data to test generalization ability. In all experiments, we preserve the same architectural and training details as described in Section 4.2. We release the code used to implement and test our models (https://github.com/mvkolos/siamese-change-detection).
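The reduction of real-world training data can be sketched as choosing a random crop box with a prescribed area ratio. The Python sketch below assumes that the crop keeps the aspect ratio of the full image and that the same box is applied to both images of the pair and to the mask; neither detail is stated in the text, and the function name `random_crop_box` is hypothetical.

```python
import math
import random
from typing import Tuple


def random_crop_box(height: int, width: int, area_ratio: float,
                    rng: random.Random) -> Tuple[int, int, int, int]:
    """Return (top, left, crop_height, crop_width) covering about area_ratio of the image."""
    if not 0.0 < area_ratio <= 1.0:
        raise ValueError("area_ratio must be in (0, 1]")
    scale = math.sqrt(area_ratio)            # assumption: aspect ratio is preserved
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    top = rng.randint(0, height - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


rng = random.Random(0)
H, W = 4573, 4418                            # Ventura train image size from the text
boxes = {}
for ratio in (1, 1 / 2, 1 / 4, 1 / 8, 1 / 16):
    boxes[ratio] = random_crop_box(H, W, ratio, rng)
    top, left, h, w = boxes[ratio]
    print(f"ratio {ratio:.4f}: box {h}x{w} at ({top}, {left}), "
          f"actual ratio {h * w / (H * W):.4f}")
```

Applying one box to the image pair and its mask keeps them registered, and the $352\times 352$ training patches can then be sampled from inside the cropped region.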

4.5 Results

We present the statistical results of our evaluation in Table 1. As expected, when training data is present in large volumes (e.g., using augmentations in strategy E), models pre-trained on ImageNet perform well. However, when the volume of real-world data supplied for fine-tuning decreases (down to 1/16), such strategies lead to unpredictable results (e.g., for strategy E on Ventura test images, the IoU measure drops by a factor of 2.3, from 0.724 to 0.317). In contrast, fine-tuning using our synthetic images helps to retain a significant part of the efficiency and leads to a more predictable change in the quality of the resulting model (e.g., in strategy D-1 we observe a decrease in IoU of only 20%, from 0.714 to 0.572). For Santa Rosa images, the performance of models trained without synthetic data degrades severely, while fine-tuning using our synthetic images results in almost no drop in performance. We plot IoU/F1 vs. volume of used data in Figure 11 to visualize this. We also display qualitative change detection results in Figure 9. Note how the output change masks tend to become noisy for strategies A, C, and E, and less so for D-1 and D-2. Our synthetic data also leads to faster convergence (see Figure 11).
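The drops quoted above follow directly from the Ventura IoU columns of Table 1. The short Python sketch below recomputes them for all strategies; the values are copied from Table 1, and the helper name `relative_drop` is an assumption of this sketch.

```python
# IoU on Ventura test images from Table 1:
# (trained on the full image, trained on 1/16 of its area)
VENTURA_IOU = {
    "A": (0.695, 0.310),
    "B-1": (0.713, 0.312),
    "B-2": (0.702, 0.327),
    "C": (0.716, 0.499),
    "D-1": (0.714, 0.572),
    "D-2": (0.718, 0.580),
    "E": (0.724, 0.317),
}


def relative_drop(full: float, reduced: float) -> float:
    """Fraction of the full-data score lost when training data is reduced."""
    return (full - reduced) / full


for name, (full, reduced) in VENTURA_IOU.items():
    print(f"{name:>3}: drops by a factor of {full / reduced:.2f} "
          f"({relative_drop(full, reduced):.1%} relative decrease)")
```

For strategy E the ratio 0.724 / 0.317 is about 2.3, and for D-1 the relative decrease (0.714 - 0.572) / 0.714 is about 20%, matching the figures in the text.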

5 Conclusion

We have developed a pipeline for producing realistic synthetic data for the remote sensing domain. Using our pipeline, we have modeled the emergency mapping scenario and created 3D scenes and change detection image datasets of two real-world areas in California, USA. Results of the evaluation of deep learning models trained on our synthetic datasets indicate that synthetic data can be efficiently used to improve performance and robustness of data-driven models in real-world resource-poor remote sensing applications. We could further increase overall computational efficiency using sparse CNNs [47], and improve detection accuracy using approaches for multi-modal data [14], imbalanced classification [56, 15], and a loss tailored for change detection in sequences of events [16, 9].

Bibliography

[1] Cryengine. https://www.cryengine.com, accessed: 2019-01-30
[2] Esri CityEngine. https://www.esri.com/en-us/arcgis/products/esri-cityengine/overview, accessed: 2019-01-30
[3] Osm2xp. https://wiki.openstreetmap.org/wiki/Osm2xp, accessed: 2019-01-30
[4] Unity. https://unity3d.com, accessed: 2019-01-30
[5] Unreal Engine 4. https://www.unrealengine.com/en-US/what-is-unreal-engine-4, accessed: 2019-01-30
[6] World Machine. http://www.world-machine.com/, accessed: 2019-01-30
[7] Alcantarilla, P.F., Stent, S., Ros, G., Arroyo, R., Gherardi, R.: Street-view change detection with deconvolutional networks. Autonomous Robots 42(7), 1301–1322 (2018)
[8] Anniballe, R., Noto, F., Scalia, T., Bignami, C., Stramondo, S., Chini, M., Pierdicca, N.: Earthquake damage mapping: An overall assessment of ground surveys and VHR image change detection after L’Aquila 2009 earthquake. Remote Sensing of Environment 210, 166–178 (2018)