Synthetic GPR datasets to evaluate hybridization inverse approach for pavement tack coat characterization—Geometrical and physical parametric study
Grégory Andreoli, Amine Ihamouten, Xavier Dérobert

TL;DR
This paper creates synthetic GPR datasets to test a new method for analyzing pavement layers using machine learning and waveform inversion.
Contribution
A novel hybrid approach combining machine learning and full-waveform inversion is proposed for pavement tack coat characterization.
Findings
A database was created using gprMax for various pavement structures and frequencies.
The method aims to improve characterization of thin subsurface layers despite interference.
The study focuses on geometrical and physical variations in pavement layers.
Abstract
The visible surface degradation of pavements is often the result of underlying subsurface defects. The quality of the bond between the wearing course and the binder course is a key factor in limiting issues such as delamination, stripping, etc. The use of non-destructive testing (NDT) methods, such as electromagnetic wave propagation combined with a hybrid data processing approach—machine learning and Full-Waveform Inversion—has recently demonstrated its effectiveness [1]. To support this research, a database was created using the open-source software gprMax [2] for various central frequencies on different two-layered pavement structures, modeled in accordance with current French standards. Variations in geometrical and physical parameters were applied to the wearing course, tack coat, and binder course. The objective is to build a representative database for numerical validation of an…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGeophysical Methods and Applications · Soil Moisture and Remote Sensing · Microwave Imaging and Scattering Analysis
Specifications TableSubjectEngineering & Materials scienceSpecific subject areaNumerical Tack coat Characterization with radar methods.Type of dataText (.in), HDF5 (.out): gprMAx raw formatTable (*.csv), Eletrical field following Z and direction (in V/m) and magnetic field following X direction (A/m).lData collectionThis database is created using the open-source software gprMax for four various central frequencies on different two-layered pavement structures, modeled in accordance with current French standards. Variations in geometrical and physical parameters were applied to each layer to have many realistic configurations.Data source locationSynthetic dataset generate with the open source software gprMax v.3.1.5:https://www.gprmax.com/The software simulate electromagnetic wave propagation using Yee’s algorithm to solve Maxwell’s equations applying Finite Difference Time-Domain (FDTD).Data accessibilityRepository name: Dataverse: recherche.data.gouv.frData identification number: 10.57745/IWJJATDirect URL to data: 10.57745/IWJJATRelated research article•The dataset is relative to the article DOI: 10.1016/j.treng.2025.100313·Named: Hybridization of machine learning/GPR Full-Waveform Inversion for pavement tack coat characterization: Numerical validation .
Value of the Data
1
- •The generated datasets consist of synthetic time-domain signals (A-scans) simulating various configurations in terms of thickness and dielectric permittivity of the wearing and binder course in pavement structures. At the interface, a thin layer, representing the tack coat, is also modeled, with thicknesses on the order of millimeters and varying dielectric properties. All geometric and physical parameters are provided for each A-scan. By using the surface layer's dielectric permittivity as a priori input for a Support Vector Machine (SVM) machine learning algorithm, both performance and computation time are improved.
- •The generation of representative synthetic data for various bonding conditions on structures with variable geometries is highly time-consuming. These data are essential for building high-performance machine learning models;
- •In addition to output files in both raw format and tabular form, open access to input files facilitates understanding their creation and enables modifications to expand the database;
- •These synthetic databases, generated using gprMax, are made available to the entire scientific and professional pavement community with a focus on GPR applications. They can be useful for the detection and characterization of tack coat interface located between the wearing course and the binder course in accordance with current French standards.
Background
2
The purpose of these datasets is to validate the inverse methods allowing characterization of the tack coat in the subsurface using simulated radar data generated with the Finite-Difference Time-Domain (FDTD) modeling tool, gprMax [2]. By inverting the time-domain signals, the objective is to extract quantitative information about the surveyed materials, particularly in terms of layer thickness and electromagnetic properties in two dimensions. To achieve this, a representative database has been generated, structured and processed, exploring a wide range of scenarios and realistic configurations in accordance with current standards while maintaining control over various parameters and degrees of freedom. Among the key variables, variations in the central frequency (900 MHz, 2.6 GHz, 4.5 GHz, and 6 GHz) have been used, along with changes in the thicknesses and dielectric permittivities of the wearing course, tack coat, and binder course. These variations fall within realistic ranges based on existing standards and the components of the asphalt mixtures used, including aggregate type, binder, and compaction levels [3].
Data Description
3
The databases consist of raw A-scans data collected for all the configurations described in the previous section using gprMax FDTD model. The first-level folders are dedicated to each center frequency: 900 MHz, 2.6 GHz, 4.5 GHz, and 6 GHz. The obtained signals can also be filtered by subtracting the wave emitted in air. For this, the Ricker folder contains the wavelets propagated in air for each center frequency. Each subfolder indicating the frequency in MHz at the beginning. The data structure tree is described as follows (example for 2.6 GHz):
- •Synthetic_GPR_Database
- •/2600MHz
- •/2600_Database_BBSG_TC_2-3_GB
- •2600_gprMax_Database_BBSG_TC_2-3_GB.zip
- •“gprMax input file”.in
- •“gprMax output file”.out
- •“gprMax format to table conversion”.csv
- •/2600_Information_BBSG_TC_2-3_GB.csv
- •/2600_Distribution_BBSG_TC_2-3_GB.fig
- •/2600_Distribution_BBSG_TC_2-3_GB.tif
- •/2600_Ascans_BBSG_TC_2-3_GB.fig
- •/2600_Ascans_BBSG_TC_2-3_GB.tif
- •/2600_Database_BBSG_TC_6-7_GB
- •/2600_gprMax_Database_ BBSG_TC_6-7_GB.zip
- •“gprMax input file”.in
- •“gprMax output file”.out
- •“gprMax format to table conversion”.csv
- •/2600_Information_BBSG_TC_6-7_GB.csv
- •/2600_Distribution_BBSG_TC_6-7_GB.fig
- •/2600_Distribution_BBSG_TC_6-7_GB.tif
- •/2600_Ascans_ BBSG_TC_6-7_GB.fig
- •/2600_Ascans_ BBSG_TC_6-7_GB.tif
- •/2600_Database_BBTM_TC_2-3_BBSG
- •/2600_gprMax_Database_ BBTM_TC_2-3_BBSG.zip
- •“gprMax input file”.in
- •“gprMax output file”.out
- •“gprMax format to table conversion”.csv
- •/2600_Information_BBTM_TC_2-3_BBSG.csv
- •/2600_Distribution_BBTM_TC_2-3_BBSG.fig
- •/2600_Distribution_BBTM_TC_2-3_BBSG.tif
- •/2600_Ascans_BBTM_TC_2-3_BBSG.fig
- •/2600_Ascans_BBTM_TC_2-3_BBSG.tif
- •/2600_Database_BBTM_TC_6-7_BBSG
- •/2600_gprMax_Database_BBTM_TC_6-7_BBSG.zip
- •“gprMax input file”.in
- •“gprMax output file”.out
- •“gprMax format to table conversion”.csv
- •/2600_Information_BBTM_TC_6-7_BBSG.csv
- •/2600_Distribution_BBTM_TC_6-7_BBSG.fig
- •/2600_Distribution_BBTM_TC_6-7_BBSG.tif
- •/2600_Ascans_BBTM_TC_6-7_BBSG.fig
- •/2600_Ascans_BBTM_TC_6-7_BBSG.tif
- •/Ricker (containing the Ricker wavelet propagated in air for each frequency in the database)
- •“gprMax input file”.in
- •“gprMax output file”.out
- •“gprMax format to table conversion”.csv
- •900_Ricker.fig
- •900_Ricker.tif
- •2600_Ricker.fig
- •2600_Ricker.tif
- •4500_Ricker.fig
- •4500_Ricker.tif
- •6000_Ricker.fig
- •6000_Ricker.tif
At the second-level, the folders are organized based on the type of modeled two-layered structures according to the nomenclature (Fig. 1):Fig. 1. Example folder description.Fig 1:
At the third-level structure, there is:
- •A compressed folder containing all the simulated files. For each of them, there is a *.in text file representing the input parameters required for modeling with gprMax, a *.out HDF5 file corresponding to native output data file, and finally a *.csv file, which is the conversion of the *.out file into a directly usable format. Each file follows the naming convention in Fig. 2.Fig. 2. Example file description.Fig 2:The *.csv file derived from *.out has three columns: time, electric field following Z direction (E_z_ in V/m) and magnetic field following X direction (H_x_ in A/m).
- •A *.csv file beginning by XXXX_Information (where XXXX represents the frequency in MHz). This file includes a table with all the information for each A-scan. Each row is structured as follows:
- •**.fig* and .tif files beginning by XXXX_Distribution. show the distribution in thicknesses and permittivities of each layers according to a normal distribution and *.fig and .tif files beginning by XXXX_Ascans. show an example A-scans with tack coat thicknesses.
Experimental Design, Materials and Methods
4
For all modeled cases (with a central frequency equal to 900 MHz, 2.6 GHz, 4.5 GHz, and 6 GHz), a relative conductivity of , a relative permeability of and zero magnetic losses were set, corresponding to most geological materials [4]. The modeled two-layered structures are put on a Perfect Electric Conductor (PEC) modeling approach, enabling the identification of the bottom layer. Table 1Table 1XXXX_Information file description.Table 1
The sources are modeled as dipole antennas with a transmitter-receiver spacing of 50 mm, propagating a Ricker wavelet toward the two-layered structures with an antenna height of 3 mm above the wearing course (Fig. 3). This antenna height falls within the Rayleigh region (near-field) [1], where constructive interference can prevent signal discrimination, particularly when the wearing course is thin.Fig. 3modeled two-layered structures with various geometric and physical parameters.Fig 3:
The time window (The period during which the receiver antenna captures and records echoes from a transmitted electromagnetic pulse) has been chosen depending on the studied frequency bandwidth and the center frequency fc:
- •fc = 900 MHz, Time window = 10 ns, number of points in A-scan = 4241 pts
- •fc = 2.6 GHz, Time window = 3 ns, number of points in A-scan = 1273 pts
- •fc = 4.5 GHz, Time window = 2.6 ns, number of points in A-scan = 1104 pts
- •fc = 6 GHz, Time window = 2.1 ns, number of points in A-scan = 892 pts
Note: To expand the database, it is possible to vary parameters such as the thickness of each layer, their permittivity and conductivity, as well as the spatial resolution, central frequency, and more.
The datasets are referenced according to Table 2.Table 2datasets description.Table 2:
During the application of the tack coat, it has been observed that under conditions of high humidity, rainfall, or inadequate breaking, water droplets may become trapped at the interface. Given that the permittivity of water is significantly higher (compared to 2.4 at fc=2.6 GHz), the overall permittivity at the interface increases with dispersive effect. To simulate this application anomaly, the authors increased the permittivity range of the tack coat from [2,3] to [6,7]. This approach allowed them to demonstrate the relevance of incorporating dielectric permittivity a priori input in a machine learning model, specifically using Support Vector Machine/Support Vector Regression (SVM/SVR) algorithms.
Note: BBTM is French acronym for Very thin Bituminous Concrete, BBSG is French acronym for Semi-Coarse Bituminous Concrete and GB is French acronym for Bituminous Gravel, h_xx_ is the thickness and ε_xx_ the dielectric permittivity.
As part of research efforts focused on characterizing thin pavement subsurface layers, the databases generated in this study enabled a feasibility analysis and validation protocol of hybridization approach combining machine learning methods and Full-Waveform Inversion (FWI) [5]. These investigations showed that, by constructing representative databases and processing the data using machine learning techniques (Support Vector Machine), it is possible to classify the bonding state and estimate its thickness [6]. Furthermore, by incorporating an a priori physical knowledge (such as dielectric permittivity of the wearing course) into the model as an input parameter, the performance initially provided by the machine learning algorithm are optimized and enhanced [1].
To this end, two datasets has been considered from which wave in air (Fig. 4) has been subtracted:
- •The training dataset (Fig. 5). In this example, the dataset consists of BBTM on BBSG with tack coat permittivities varying in the range [2,3] with 4 significant digits at 2.6 GHz,Fig. 5. Thicknesses and permittivities of the wearing course, tack coat, and binder course for a BBTM on BBSG structure, with tack coat permittivities varying in the range [2,3] at 2.6 GHz; a) Example of A-scans with various tack coat thicknesses; b) Random distribution of parameters in the training database.Fig 5:
- •The testing dataset (Fig. 6). In this example, the dataset consists of BBTM on BBSG with tack coat permittivities varying in the range [2,3] with 4 significant digits at 2.6 GHz.Fig. 6. Thicknesses and Permittivities of the wearing course, tack coat, and binder course for a BBTM on BBSG structure, with tack coat permittivities varying in the range [6,7] at 2.6 GHz; a) Example of A-scans with various tack coat thicknesses; b) Random distribution of parameters in the testing database.Fig 6: Fig. 4wave propagated in the air at 2.6 GHz.Fig 4:
Thanks to the software Matab (2021a and higher), a bivariate normal distribution is used to modeled an electromagnetic thickness map of the tack coat in two-layered pavement structures, comprising a BBTM (wearing course) over a BBSG (binder course), at a frequency of 2.6 GHz, corresponding to a commercially available GPR system. The mapped section covered 50 meters in length and 5 meters in width, requiring 4,160 A-scans for testing dataset (Fig. 7). To maintain a 70/30 ratio between the training and testing datasets, 9707 A-scans were selected from the 10,000 randomly generated A-scans within the thickness and permittivity ranges defined for each line of the Table 2.Fig. 7a) Grid of the synthetic pavement structure; b) EM class mapping of the ground truth with various tack coat thicknesses [6].Fig 7:
To evaluate hybridization method developed in this research, we followed the methodology described in the flowchart below (Fig. 8). The raw dataset, consisting of full A-scans without any preprocessing, is referred to as the global approach.Fig. 8. Flowchart illustrating the A–scans processing hybrid method.Fig 8:
In this application, with Matlab, a white Gaussian noise to increase dispersion and evaluate robustness is introduced with SNR (Signal-Noise Ratio) = 20 dB. A windowing process is applied around the region of interest (interface), allowing less relevant parts of the signal to be discarded and consequently reducing computation time. Currently, windowing is performed manually. The authors base it on the global minimum (around 0.8 ns) of the first echo, which results from the combination of the surface echo and the direct wave. An additional 271 points (for BBTM) or 381 points (for BBSG) are added to define the window width. The window used is either the Blackman-Harris or the Tukey window. Additionally, windowing helps reduce the size of input signals by retaining only the region of interest (around the interface), thereby improving the performance of the Support Vector Machine models. Knowing the dielectric permittivity (e.g. using Full-Waveform Inversion as in Fig. 9) of the wearing course, its observable is appended at the end of each A-scan. The newly preprocessed dataset then serves as input to the SVM/SVR machine learning model and related Gaussian kernel method, known as the hybrid model. Using a Bayesian approach, the optimal values of the hyperparameters (kernel, C, γ) are determined, leading to a reduction of the objective function. By examining the resulting confusion matrices (Fig. 10) and electromagnetic maps (Fig. 11), the impact of a hybrid data processing approach on both performance metrics and cartographic representation can be observed. The full details of the methodology and choices are provided in the reference article [1].Fig. 9. Flowchart describing GPR data processing using FWI to extract the wearing course permittivity for hybridization [1].Fig 9:Fig. 10. Confusion matrices of the tested database (SNR = 20 dB): a) Two-Class SVM, global approach, without hybridization; b) Two-Class SVM with hybridization; c) Multi-Class SVM, global approach, without hybridization; d) Multi-Class SVM with hybridization.Fig 10:Fig. 11. Results of inverse radar data (SNR = 20 dB); b) Two-Class SVM, global approach, without hybridization; c) Two-Class SVM with hybridization; d) Multi-Class SVM, global approach, without hybridization; e) Multi-Class SVM with hybridization; f) SVR, global approach, without hybridization; g) SVR with hybridization.Fig 11:
By examining the confusion matrices corresponding to the datasets shown in Fig. 5, Fig. 6, an improvement in true positive classification can be observed. For the Two-Class SVM, 3078 A-scans were correctly classified using the hybrid model (Fig. 10a), compared to only 964 with the global approach (Fig. 10b). A similar improvement is seen with the Multi-Class SVM method, with 1748 A-scans correctly classified using the hybrid model (Fig. 10c), versus 653 with the global approach (Fig. 10d).
By comparing the different methods and approaches (Table 3), the performance improvement from the global approach to the hybrid model can be observed, thanks to the windowing technique combined with the integration of the wearing course permittivity.Table 3. Performance and methods with different approach (SNR = 20 dB).Table 3:MethodApproachRecallPrecisionF1-ScoreRMSER²Two-Class SVMGlobal0.520.770.62--Hybrid0.880.930.91--Multi-Class SVMGlobal0.240.330.12--Hybrid0.50.50.43--SVRGlobal---1.9315Hybrid---0.8****72
By mapping the output of the machine learning model for each A-scan, an electromagnetic representation of the tack coat geometry can be generated (Fig. 11). By comparing with the ground truth (Fig. 11a), the classification results for each method (Two-class SVM, Multi-Class SVM, and SVR) can be observed, providing a global approach in contrast to the hybrid model.
The methodology developed around these databases highlights the feasibility of characterizing very thin pavement layers using machine learning techniques. Furthermore, incorporating the physical parameters describing each signal as input features enables a hybrid data processing approach, enhancing algorithm performance across distinct datasets. For further details, we encourage the reader to refer to article [1].
Limitations
This database is generated in 2 dimensions (length and depth) for reasons of computing time. The use of more powerful computing machines could enable the creation of a 3-dimensional database. The use of other antennas could also be envisaged.
Ethics Statement
The present work did not involve the use of human subjects, animal experiments, or data collected from social media platforms.
CRediT Author Statement
Grégory Andreoli: Conceptualization, Methodology, Investigation, write original draft; Amine Ihamouten: Conceptualization, Supervision, Writing and Review & Editing, Validation; Xavier Dérobert: Supervision, Review & Editing, Validation.
Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Andreoli G.Ihamouten A.Dérobert X.Hybridization of machine learning /GPR full-waveform inversion for pavement tack coat characterization: numerical validation Transport. Eng.2025202510031310.1016/j.treng.2025.100313 · doi ↗
- 2Giannopoulos A.Modelling ground penetrating radar by gprmax Constr. Build. Mater.19102005755762
- 3Araujo S.Delbreilh L.Laguerre L.Dumont H.Dargent E.Fauchard C.Rock permittivity characterization and application of electromagnetic mixing models for density/compactness assessment of HMA by means of step-frequency radar Near Surf. Geophys.1462016551562
- 4Andreoli G.Ihamouten A.Fauchard C.Jaufer R.Todkar S.S.Guilbert D.Buliuk V.Dérobert X.Numerical modeling using gpr Max to identify a subsurface tack coat for SVM classification NSG 2021 2nd Conference on Geophysics for Infrastructure Planning, Monitoring and BIM Bordeaux, France 120211510.3997/2214-4609.202120040 · doi ↗
- 5Andreoli G.Ihamouten A.Buliuk V.Fauchard C.Jaufer R.Dérobert X.Numerical parametric study to classify and estimate pavement characteristics using GPR and machine learning methods 1st Data Science and Pavement Symposium 2022
