Engineering-Oriented Ultrasonic Decoding: An End-to-End Deep Learning Framework for Metal Grain Size Distribution Characterization

Le Dai; Shiyuan Zhou; Yuhan Cheng; Lin Wang; Yuxuan Zhang; Heng Zhi

PMC · DOI:10.3390/s26030958·February 2, 2026

Open Access

TL;DR

A deep learning framework uses ultrasonic data to accurately predict metal grain size distribution, offering a scalable and adaptable solution for non-destructive evaluation.

Contribution

A novel deep learning model with elliptic spatial fusion and transfer learning for ultrasonic-based grain size prediction in GH4099.

Findings

01

The model achieves MAEs of 1.08 μm (mean) and 0.84 μm (standard deviation) in grain size prediction.

02

Transfer learning calibration rapidly restores accuracy under new input conditions.

03

The framework outperforms traditional attenuation- and velocity-based methods.

Abstract

What are the main findings? Multimodal ultrasonic features with time–frequency encoding and an encoder–decoder model, aided by elliptic spatial fusion, enable grain size distribution prediction for GH4099. The method achieves MAEs of 1.08 μm (mean) and 0.84 μm (standard deviation) with a KL divergence of 0.0031, outperforming attenuation- and velocity-based approaches. What are the implications of the main findings? Transfer learning calibration rapidly restores accuracy under new input conditions, improving adaptability for practical ultrasonic…

Keywords

ultrasonic characterization; grain size distribution; deep learning; nickel-based superalloy; transfer learning

Taxonomy

Topics: Ultrasonics and Acoustic Wave Propagation · Machine Learning in Materials Science · Generative Adversarial Networks and Image Synthesis

Full text

1. Introduction

Grain size is a key microstructural parameter governing the properties of metallic materials [1]. It directly influences mechanical behavior (strength, toughness, and ductility) as well as electrical and thermal properties and corrosion resistance [2,3]. Therefore, accurate measurement and control of grain size are crucial for materials science, engineering applications, and industrial production [4].

Traditional techniques such as metallography [5] and electron backscatter diffraction (EBSD) [6] are highly accurate but often time-consuming, inefficient, and destructive [7,8]. Consequently, they do not meet industrial requirements such as rapid grain inspection in metallurgy and precise localization of size defects.

Ultrasonic techniques are widely used for material characterization because ultrasonic waves can penetrate materials and are sensitive to microstructural changes [9]. Their non-destructive nature, high spatial resolution, real-time capability, and operational simplicity address the limitations of conventional methods, making ultrasound an increasingly attractive approach [10]. As ultrasonic waves propagate through a material, microstructural parameters such as grain size, morphology, and orientation affect signal transmission [11,12]. By extracting and analyzing ultrasonic features, grain size can be measured rapidly and non-destructively [13]. Current approaches can be broadly grouped into physics-based models and data-driven machine learning methods.

Physics-based methods include the attenuation method [14], ultrasonic velocity method [15,16], and center-frequency method [17]. These approaches derive grain size from propagation characteristics by analyzing attenuation, phase velocity, and spectral features. Although theoretically accurate, their results depend strongly on model assumptions and material specificity, limiting general applicability across materials and process conditions. Moreover, existing theory cannot fully exploit the rich information in ultrasonic signals, and no general analytical formula is available for grain size distribution characterization [18].

Data-driven approaches learn a mapping between grain size and ultrasonic features to enable prediction [19]. Compared with physics-based methods, such models can transfer across tasks and support multi-task characterization [20]. Liu et al. [21] used multi-level wavelet decomposition and a multi-channel one-dimensional CNN to characterize grain size distributions. Zhang et al. [22] combined laser ultrasonics with random forest regression using longitudinal-wave velocity and multi-frequency attenuation. Yu et al. [23] employed GA-optimized BP networks with multi-frequency attenuation features, and Viana et al. [24] extracted time-series features from backscattered signals to classify grain size in ASTM A36 steel. These studies demonstrate the potential of machine learning for grain size prediction, yet their performance is often specific to material type, specimen geometry, and signal conditions. For instance, specimen thickness strongly affects ultrasonic attenuation and velocity [25]. Physical methods typically treat thickness as prior knowledge, whereas many data-driven studies ignore this factor. In automated inspection, the thickness at the measurement point is generally unknown. Therefore, it should be incorporated into prediction targets and constraints to improve generalization across scenarios. In addition, the impact of varying experimental inputs and corresponding mitigation strategies remains underexplored.

This study proposes a deep learning approach for grain size distribution prediction based on multimodal ultrasonic features with spatial coding. First, signal and physical features relevant to ultrasonic grain measurement are extracted from raw data and represented as time–frequency maps. Next, an encoder–decoder model comprising a dual convolutional compression network and a fully connected network is designed for prediction. An elliptic spatial-fusion expectation strategy is then introduced based on the statistical characteristics of metallographic sections. Finally, the method is validated through comparison with traditional approaches.

To address material dependence, the proposed method introduces thickness prediction within the model to learn the latent relationship between grain size and material properties, thereby removing the need for prior thickness measurements in online applications. To study sensitivity to input conditions, signals were collected using probes at three excitation frequencies. Different dataset combinations were constructed to evaluate input-specific generalization, and a transfer learning strategy was proposed to provide practical calibration for new scenarios.

2. Materials and Methods

2.1. Experiment

The experimental material was a nickel-based superalloy (GH4099) commonly used in aerospace additive manufacturing. Its chemical composition is listed in Table 1.

To obtain ultrasonic signals and corresponding grain size data for each propagation region, we leveraged the fact that ultrasonic acquisition is non-destructive, simple, and low-cost relative to metallographic measurements [26]. The experiment first collected high-spatial-resolution C-scan signals on the sample surface using a customized water-immersion ultrasonic scanning system. The inspection was conducted in pulse-echo mode using a high-frequency focused transducer. The focal point was set at the sample surface to ensure optimal spatial resolution for grain boundary interaction. Several grid points were then selected as centers of metallographic sections, and the metallographic results were used as grain size labels for the nearby ultrasonic signals. The signal acquisition and metallographic procedures are described below.

2.1.1. Ultrasonic Signal Acquisition

The signal acquisition platform was a self-developed ultrasonic C-scan detection system. As shown in Figure 1, it includes a three-degree-of-freedom motion stage, a Panametrics 5900PR pulser/receiver (Olympus Panametrics, Waltham, MA, USA), and a Galil DMC-21x3 motion controller (Galil Motion Control, Inc., Rocklin, CA, USA). Piezoelectric probes with center frequencies of 5 MHz, 10 MHz, and 20 MHz were used. Notably, while the 20 MHz probe was utilized to enhance sensitivity to smaller grains, the ultrasonic waves in the nickel-based superalloy (GH4099) undergo significant frequency-dependent attenuation and Rayleigh scattering. This physical phenomenon leads to a distinct downward shift in the center frequency of the back-wall echoes, so that the effective signal energy is concentrated in a lower frequency band than the nominal probe frequency.

To collect multiple back-wall echoes in a single test and reduce interference from reflections at the tank bottom, a spacer of a parallel material was placed under the specimen so that most of the sample center had a water path above the tank bottom; thus, bottom reflections did not interfere with the effective echoes during a single excitation. The scan step size was 0.5 mm, the acquisition window per pulse was 25 μs, and the sampling rate was 100 MHz.

2.1.2. Material Grain Size Distribution Measurement

As shown in Figure 2a, four test blocks from different processing batches were used: two of size 50 mm × 45 mm × 5 mm and two of size 50 mm × 38 mm × 5 mm. A total of 30 checkpoints were selected for metallography. As shown in Figure 2b, the metallographic observation surface was parallel to the ultrasonic incidence direction, and each metallographic block measured 5 mm × 5 mm × 5 mm. Sample preparation included grinding, polishing, and etching. The surface was sequentially polished with 180#, 600#, 1000#, 1500#, 2000#, and 3000# sandpapers, followed by diamond paste to a mirror finish. A mixed solution of 10% HCl and 2% FeCl₂ was used for etching, and an optical microscope was used for observation. Figure 2c shows the micrograph of checkpoint #1. Grain size from the micrographs was quantified using Image-Pro-Plus for comparison with the prediction model. Specifically, the software quantifies grain size through a series of steps: initial image enhancement and thresholding to isolate grain boundaries, followed by automated measurements using the intercept method or planimetric method in accordance with ASTM E112 standards [27] to calculate the average grain diameter and area distribution.

2.1.3. Grain Size Distribution Measurement Results

This study examines how both the average grain size and its distribution relate to the ultrasonic signals. Therefore, assuming that grain size follows a lognormal distribution [28], the mean $[eqn]$ and standard deviation $[eqn]$ of the grain size, together with the mean and standard deviation of the log grain size, were calculated for each region. The relationships among them are as follows:

[eqn]

[eqn]

[eqn]

where $[eqn]$ represents the grain size, $[eqn]$ is the mean in the logarithmic normal distribution, $[eqn]$ is the standard deviation in the logarithmic normal distribution, $[eqn]$ is the mean of grain size, and $[eqn]$ is the standard deviation of grain size.

Using these formulas, the statistical parameters of grain size in each region were calculated (Table 2).
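The conversion between log-domain and linear-domain statistics can be checked numerically. The sketch below assumes the standard lognormal moment identities; the function names and the example values (65 μm and 12 μm) are illustrative and are not taken from Table 2.

```python
import math

def lognormal_to_linear(mu, sigma):
    """Mean and standard deviation of D when ln(D) ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, std

def linear_to_lognormal(mean, std):
    """Inverse mapping: log-domain (mu, sigma) from the linear mean and std."""
    sigma_sq = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)

# Illustrative round trip (values are hypothetical, in micrometres).
mu, sigma = linear_to_lognormal(65.0, 12.0)
mean, std = lognormal_to_linear(mu, sigma)
```

The same two functions can be used to move between the network's log-domain outputs and the grain size statistics reported in micrometres.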

The results in Table 2 are visualized in Figure 3 as a histogram of the log grain size (a) and a box plot of the grain size distribution (b), in which circles mark possible outliers. Figure 3a shows that the grain size of the materials in this study approximately follows a lognormal distribution and that the distributions differ markedly between checkpoints. The box plot in Figure 3b shows that the average grain size $[eqn]$ ranges from 56 to 79 μm and is concentrated near 65 μm. Unusually large grains within a region are marked as outliers (shown as circles in the figure), and the standard deviation of the grain size varies considerably between regions. The dataset is therefore diverse enough to support numerical grain size prediction.

2.2. Model and Method

2.2.1. Data Preprocessing

We employ a deep learning approach for grain size prediction. Unlike traditional machine learning pipelines that emphasize handcrafted feature extraction, deep learning prioritizes information preservation and flexible feature representation (e.g., images or sequences). Theoretically, deep neural networks can approximate arbitrary nonlinear functions [29,30], enabling feature learning directly from raw time-domain signals [31].

However, this conclusion assumes sufficiently large datasets and long training times. In small-sample industrial settings, incorporating prior knowledge through signal preprocessing and model design can compensate for limited data, accelerate convergence, and improve prediction stability [32].

Accordingly, we preprocessed the input signals following three principles: information completeness, noise suppression, and learning adaptability. Effective signal content was preserved, noise was minimized, and features were transformed into representations suitable for deep learning to facilitate efficient feature extraction and pattern recognition.

Figure 4 illustrates the full preprocessing pipeline. The raw signal shows periodic attenuation; the first pulse is the longitudinal wave reflected from the surface, and subsequent pulses are reflections from the back wall after propagation through the material. The pipeline aligned echoes, applied band-pass filtering and discrete wavelet denoising, and then used a continuous wavelet transform to extract time–frequency amplitude and phase features. The steps are detailed below.

First, the signal was aligned by shifting the first back-wall echo to the start of the record and normalizing its amplitude to the maximum value, thereby removing variability in input energy.

Next, denoising was performed. A fourth-order Butterworth band-pass filter (1–20 MHz) was employed to eliminate out-of-band noise. The upper cutoff of 20 MHz was chosen to preserve the full spectrum of the downshifted 20 MHz probe signals while suppressing high-frequency electronic noise. The filtered signal was then denoised by soft thresholding with a db4 wavelet at a threshold of 0.15. As shown in Figure 5a, components above level 0 were treated as noise: levels 1–2 corresponded to shock/echo noise, while levels 3–4 represented random noise. The reconstructed signal in Figure 5b shows a markedly improved signal-to-noise ratio, indicating effective separation of signal and noise.
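Soft thresholding shrinks every wavelet detail coefficient toward zero and zeroes those whose magnitude falls below the threshold. The following plain-Python sketch assumes the 0.15 threshold is applied directly to coefficients of the amplitude-normalized signal, which the text does not state explicitly; the coefficient values are made up for illustration, and the wavelet decomposition itself would come from a wavelet library.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (soft thresholding)."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0.0:
            shrunk.append(0.0)          # small coefficients are treated as noise
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk

# Hypothetical detail coefficients from one decomposition level.
detail = [0.05, -0.10, 0.40, -0.90, 0.15]
denoised_detail = soft_threshold(detail, 0.15)
```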

Finally, because grain size characterization depends on time–frequency features such as attenuation, center frequency, and ultrasonic velocity, we used a time–frequency map to integrate these features. The short-time Fourier transform (STFT) is common but requires window selection, adapts poorly to varying signals, and provides low resolution for high-frequency components [33].

The wavelet transform provides multi-resolution time–frequency analysis with adaptive resolution, offering better high-frequency resolution. In this study, the complex Gaussian wavelet (cgau8) was selected because its complex-valued nature allows amplitude and phase information to be extracted simultaneously. Physically, the amplitude channel directly represents the frequency-dependent attenuation of the ultrasonic energy, while the phase channel provides precise information about the propagation time and velocity shift. Unlike real-valued wavelets (e.g., Daubechies or Haar), which collapse these features, the dual-channel encoding of cgau8 enables the CNN to decouple and learn these two dominant physical metrics independently.

[eqn]

where $[eqn]$ is the original signal, $[eqn]$ represents the complex conjugate of the wavelet function, $[eqn]$ is the scale parameter controlling dilation, and $[eqn]$ is the translation parameter controlling position.

The scale parameter satisfies the following relationship:

[eqn]

where $[eqn]$ is the sampling frequency and $[eqn]$ is the wavelet center frequency, determined by the wavelet family and independent of the data itself; 1024 frequency points from 1 to 15 MHz were sampled uniformly. The 1–15 MHz range for continuous wavelet transform (CWT) was chosen because experimental analysis revealed that the signal-to-noise ratio (SNR) for back-wall echoes across all three probes was highest within this interval. Frequencies above 15 MHz for the 20 MHz probe were found to contain predominantly scattering noise and negligible coherent echo energy due to the material’s ‘low-pass’ filtering effect during propagation. The first 1024 time points were then used to form a 2 × 1024 × 1024 wavelet time–frequency diagram with amplitude and phase channels. The resulting map is shown in Figure 6. The amplitude diagram provides good time-domain and frequency-domain resolution and captures the center frequency and attenuation process. This image data can be combined with CNNs for training, and it satisfies the three principles of signal completeness, noise suppression, and learning adaptability.
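The frequency-to-scale conversion can be sketched as follows, assuming the common relation a = f_c · f_s / f between scale and analysis frequency. The sampling rate and frequency grid come from the text; the center-frequency value is a placeholder assumption, not the actual cgau8 constant, which should be read from the wavelet library in use.

```python
FS_HZ = 100e6              # sampling rate stated in the acquisition setup
F_MIN_HZ, F_MAX_HZ = 1e6, 15e6
N_FREQS = 1024
CENTER_FREQ = 0.7          # hypothetical normalized wavelet center frequency

def cwt_scales(fs, fc, f_min, f_max, n):
    """Uniform frequency grid and the matching CWT scales a = fc * fs / f."""
    step = (f_max - f_min) / (n - 1)
    freqs = [f_min + i * step for i in range(n)]
    scales = [fc * fs / f for f in freqs]
    return freqs, scales

freqs, scales = cwt_scales(FS_HZ, CENTER_FREQ, F_MIN_HZ, F_MAX_HZ, N_FREQS)
```

Low frequencies map to large scales and high frequencies to small scales, so the scale list is monotonically decreasing.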

2.2.2. Dataset Generation

Thirty grain-sample points were measured. Because the metallographic area (5 mm × 5 mm) is much larger than the ultrasonic scan step (0.5 mm), we expanded the dataset by constructing an ellipse centered at each metallographic section, with major and minor axes spanning 7 and 5 sampling points (3 mm and 2 mm), respectively (Figure 7). In total, 19 scan points were collected per metallographic position, yielding 570 signals per probe frequency. To integrate the multi-frequency data, signals from the 5 MHz, 10 MHz, and 20 MHz probes were treated as independent observations within a unified dataset. Each A-scan signal, regardless of the source probe, was transformed into an identical $2 \times 1024 \times 1024$ time–frequency feature map, allowing the deep learning model to learn consistent cross-frequency mapping relationships between the ultrasonic backscatter/attenuation patterns and the grain size distribution, rather than relying on frequency-specific features. Combining the data from the three probe frequencies gave 1710 samples after time–frequency feature extraction. In this study, we assumed that the grain distribution does not change significantly near a single sampling point and that the overall grain distribution is relatively uniform.
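The sample counts above can be reproduced by enumerating grid points inside the ellipse. This sketch assumes a point is kept when it lies on or inside an ellipse with semi-axes of 3 and 2 grid steps (1.5 mm and 1.0 mm at the 0.5 mm scan step); the inclusion rule and function name are assumptions, chosen because they reproduce the 19 points per section stated in the text.

```python
def ellipse_offsets(semi_major_steps=3, semi_minor_steps=2):
    """Grid offsets (i, j) on or inside the ellipse, using integer arithmetic."""
    a, b = semi_major_steps, semi_minor_steps
    offsets = []
    for j in range(-b, b + 1):
        for i in range(-a, a + 1):
            # (i/a)^2 + (j/b)^2 <= 1, multiplied through by a^2 * b^2
            if i * i * b * b + j * j * a * a <= a * a * b * b:
                offsets.append((i, j))
    return offsets

points_per_section = len(ellipse_offsets())
signals_per_probe = 30 * points_per_section   # 30 metallographic sections
total_samples = 3 * signals_per_probe         # 5, 10 and 20 MHz probes
```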

2.2.3. Deep Learning Model for Grain Size Characterization

The proposed network has two modules: ultrasonic signal encoding and material-property prediction. Accordingly, we design a deep learning architecture consisting of a fully convolutional time–frequency encoder and a fully connected decoder. The network structure is described below.

Convolutional neural networks (CNNs) are widely used for image representation learning [34]. They capture local image features and, for time–frequency inputs, can extract center-frequency patterns. However, standard convolutions are spatially shift-invariant [35] and may not capture time-domain causality in ultrasonic signals (e.g., attenuation and velocity), which is critical for grain characterization. To address this, we adopt full spatial encoding inspired by U-Net [36], progressively compressing the pixel space, fusing information across scales, and representing signal features through channels. With full compression, the model attains a global receptive field and captures correlations across time–frequency regions. The resulting encoder architecture is shown in Figure 8.

Image feature extraction uses a double-convolution block that preserves spatial resolution while enriching channel-wise representations. The pixel space is then down-sampled using a pooling layer.

The convolution mapping is given by:

[eqn]

[eqn]

The pooling mapping is given by:

[eqn]

[eqn]

where $[eqn]$ is the pooling window size, $[eqn]$ is the stride, and $[eqn]$ is the padding size.

The convolution kernel size was set to (3, 3) with stride (1, 1), and the pooling kernel to (4, 4) with stride (4, 4), reducing each spatial dimension to one quarter. A double-convolution block followed by pooling constitutes one compression-sensing block. Stacking these blocks ultimately compresses the spatial map to a single feature scalar. The compression ratio per block can be adjusted (typically a power of two). Here it is set to 4 based on input size and sample count to avoid excessive training difficulty or insufficient encoding capacity.
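The effect of stacking compression blocks on the spatial size can be traced with the usual output-size formula, floor((n + 2p − k) / s) + 1. The sketch assumes a padding of 1 for the 3 × 3 convolutions (needed for them to preserve resolution, as described) and no padding for pooling; both padding values are assumptions.

```python
def output_size(n, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

def encoder_sizes(n=1024):
    """Spatial size after each compression block until the map is 1 x 1."""
    sizes = [n]
    while n > 1:
        n = output_size(n, kernel=3, stride=1, padding=1)  # first 3x3 conv
        n = output_size(n, kernel=3, stride=1, padding=1)  # second 3x3 conv
        n = output_size(n, kernel=4, stride=4, padding=0)  # 4x4 pooling
        sizes.append(n)
    return sizes

sizes = encoder_sizes(1024)
num_blocks = len(sizes) - 1
```

Under these assumptions a 1024 × 1024 input needs five blocks to reach a single spatial position.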

The current architecture compresses the $[eqn]$ time–frequency representation into a single feature scalar to ensure a global receptive field, allowing the model to capture the integral attenuation and scattering characteristics across the entire propagation history. This design effectively mitigates overfitting on the current industrial dataset by focusing on the most dominant physical features. For more complex material organizations with higher-order heterogeneity, the encoding capacity could be further enhanced in two ways: (1) Channel Expansion—Transitioning from a single scalar to a high-dimensional latent vector (e.g., $[eqn]$ or $[eqn]$ ) to preserve multi-scale structural information; and (2) Variational Constraints—Implementing a Variational Autoencoder (VAE) framework to impose statistical constraints (such as KL-divergence) on the encoding layer. This would ensure that the latent space follows a specific distribution, improving the robustness and physical interpretability of the extracted features under extreme microstructural conditions.

2.2.4. Grain Size Characterization Model

Based on the encoded features, we design a fully connected network to represent material properties. This stage captures the mapping from ultrasonic signals to material features and the correlations among features in the latent space. Incorporating physically meaningful constraints can improve generalization and convergence.

In this study, the direct prediction targets are the mean log grain size and its variance. Physical models indicate that ultrasonic grain characterization depends on material type and local thickness. Representing all material features is complex; therefore, we focus on thickness as a key factor. Because thickness is unknown during automated inspection, it should not be treated as prior knowledge but learned implicitly within the model.

Thickness can be modeled through network design, loss design, or prediction targets. Regardless of approach, thickness implicitly captures signal attenuation and propagation-time dynamics. Here, we adopt a simple strategy: thickness is predicted as an additional output, allowing the network to learn its relationship with grain size in the latent space.

The final network architecture is shown in Figure 9.

When a time–frequency image is input, it is first normalized using two-dimensional batch normalization. The encoder output is flattened and fed into the fully connected network. Batch normalization is applied again to reduce scale differences across channels and accelerate convergence, and a dropout layer is used to improve generalization. The network outputs three values: the mean log grain size μ, the log grain size standard deviation $[eqn]$ , and the material thickness h.

2.2.5. Model Training

Grain size distribution prediction is a regression task, so mean squared error is used as the loss. Because the sample distribution is imbalanced, we use a weighted MSE to balance sample contributions. The weighted MSE (WMSE) is defined as follows:

[eqn]

where $[eqn]$ is the batch size and M is the number of predicted features; $[eqn]$ and $[eqn]$ represent the predicted value and the true value of the j-th feature of the i-th sample, respectively; and $[eqn]$ is the weight of the i-th sample, which is applied to all features of that sample. In this study, under-represented samples were assigned a weight of 100, and all other samples kept the default weight of 1.
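A per-sample weighted MSE can be sketched in plain Python as below. The normalization by the batch size times the number of features is an assumption about the exact form of the loss; the numbers are made up to show how a weight of 100 amplifies one sample's error.

```python
def weighted_mse(pred, target, weights):
    """Weighted MSE: each sample's squared error is scaled by its weight."""
    n_samples = len(pred)
    n_features = len(pred[0])
    total = 0.0
    for p_row, t_row, w in zip(pred, target, weights):
        total += w * sum((p - t) ** 2 for p, t in zip(p_row, t_row))
    return total / (n_samples * n_features)

# Two hypothetical samples with three outputs each (mu, sigma, thickness).
pred = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
target = [[1.0, 2.0, 4.0], [2.0, 2.0, 3.0]]
loss_weighted = weighted_mse(pred, target, [1.0, 100.0])
loss_uniform = weighted_mse(pred, target, [1.0, 1.0])
```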

Training used the Adam optimizer with an initial learning rate of 0.001. A cosine-annealing learning-rate schedule was applied with its schedule-length parameter set to 32, the maximum number of epochs was 1000, and early stopping was used. Training stopped if the validation loss did not decrease for 200 consecutive epochs, and the model with the minimum validation loss was saved.
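The schedule and stopping rule can be illustrated without a deep learning framework. The closed form below is the standard cosine-annealing expression; reading the value 32 as a half-period `t_max` and a minimum rate of zero are assumptions, and the `EarlyStopping` class and loss values are hypothetical.

```python
import math

def cosine_lr(epoch, base_lr=1e-3, eta_min=0.0, t_max=32):
    """Cosine-annealed learning rate (t_max = 32 is an assumed reading)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))

class EarlyStopping:
    """Track the best validation loss and signal when patience is exhausted."""
    def __init__(self, patience=200):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1

    def update(self, epoch, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch      # a checkpoint would be saved here
        return epoch - self.best_epoch >= self.patience

# Toy run with a short patience so the behaviour is visible.
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate([1.0, 0.5, 0.4, 0.45, 0.41, 0.42, 0.43]):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```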

3. Results

Before training, the dataset was partitioned into training, validation, and test sets strictly at the metallographic region level, rather than the individual signal level. To evaluate the model’s generalization, five specific regions (#5, #9, #17, #25, and #28) were designated as the test set, while the remaining 25 regions formed the training pool. Importantly, probe frequency was not used as a partitioning factor. Signals of all excitation frequencies (5, 10, and 20 MHz) originating from a specific region were kept together within the same data split to ensure the model was tested on entirely ‘unseen’ microstructures. A validation set was constructed by randomly sampling 10% from the training pool and 20% from the test set solely for monitoring training performance. While these samples were visible during validation, they did not participate in weight updates (backpropagation), and no significant hyperparameter tuning was performed based on this set, thereby maintaining the validity of the reported generalization performance.
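A region-level split of this kind can be sketched as follows. The record layout (`region`, `freq_mhz`, `point`) is a hypothetical stand-in for the real dataset index; only the test-region numbers and the counts of regions, probes, and scan points come from the text. The validation sampling is omitted.

```python
TEST_REGIONS = {5, 9, 17, 25, 28}

def split_by_region(samples, test_regions=TEST_REGIONS):
    """Keep every signal of a region (all probe frequencies) in one split."""
    train = [s for s in samples if s["region"] not in test_regions]
    test = [s for s in samples if s["region"] in test_regions]
    return train, test

# Hypothetical index: 30 regions x 3 probe frequencies x 19 scan points.
samples = [
    {"region": r, "freq_mhz": f, "point": p}
    for r in range(1, 31)
    for f in (5, 10, 20)
    for p in range(19)
]
train_pool, test_set = split_by_region(samples)
```

Splitting on the region key rather than on individual signals prevents neighbouring scan points of the same microstructure from appearing on both sides of the split.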

As shown in Figure 10, the initial loss and WMSE were large. After rapid convergence, the validation loss oscillated briefly and then stabilized after about 800 epochs. At epoch 1225, the validation MSE and loss reached minima of 0.0009 and 0.0026, respectively, and training loss and MSE were 0.0017 and 0.0006. With early stopping, training ended at epoch 1375.

Using this optimal model, the test-set MSE was 0.0010, and the loss was 0.0030, indicating good generalization with limited accuracy degradation on the test set.

To better visualize predicted grain characteristics and thickness effects, the log-normal relationship for log grain size was exponentiated. Mean and variance were computed from Equations (2) and (3), and correlations between predicted and true values are shown in Figure 11a,b. The predicted mean and standard deviation oscillate around $[eqn]$ (dashed line), with MSEs of 2.41 and 3.46, respectively.

Further, based on the elliptical sampling strategy, let x be the distance between any point in the elliptical region and the central point, and let $[eqn]$ be a spatial distribution function of x. The spatial-fusion expectation is then given by Equation (11), where $[eqn]$ represents the fusion result, $[eqn]$ represents the i-th prediction result of the detection model, and $[eqn]$ represents the number of discrete ultrasonic signal detection points in the elliptical region.

[eqn]

$[eqn]$ can weight the prediction results according to regional correlation. For example, the closer to the central region, the higher the weighting value. In this study, $[eqn]$ was set as a constant term, reducing the expression to a simple average. Calculation showed improved accuracy (as shown in Figure 11c,d): the MSE of the mean prediction becomes 1.70, and the MSE of the standard deviation becomes 1.96, indicating that this spatial fusion strategy helps to improve the prediction accuracy of the model. This result is reasonable because, compared with the spatial resolution of ultrasonic measurement, the grain measurement area is relatively large, and there is not a strict one-to-one correspondence between a single ultrasonic signal and the average grain size. Only by combining ultrasonic multipoint results or refined metallographic measurements can a better numerical relationship between the ultrasonic signal and the material grain be constructed. Based on the assumption that sampling points are uniformly distributed, we adopted a single-point mapping relationship during training and eliminated noise through simple averaging to obtain the final prediction results.
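The fusion step can be illustrated as a normalized weighted average that reduces to a simple mean when the weights are constant. The normalization by the weight sum and the example numbers are assumptions for illustration; they are not the reported predictions.

```python
def fuse_predictions(predictions, weights=None):
    """Weighted average of per-point predictions within one elliptical region."""
    if weights is None:
        weights = [1.0] * len(predictions)   # constant weight -> simple average
    total_weight = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total_weight

# Hypothetical per-point mean grain size predictions (micrometres).
point_predictions = [64.0, 66.0, 68.0]
fused_mean = fuse_predictions(point_predictions)
fused_center_heavy = fuse_predictions(point_predictions, weights=[1.0, 2.0, 1.0])
```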

Next, the thickness prediction is evaluated via prediction bias. As shown in Figure 12, the predicted thickness closely matches the labels, with errors below 1 × 10⁻⁶. This suggests the model learns thickness-related constraints that aid characterization.

To quantify differences between predicted and target distributions, we use the Kullback–Leibler (KL) divergence [37]. Equation (12) gives the KL divergence, where values closer to zero indicate greater similarity. Equation (13) provides the KL divergence when the target distribution is normal, expressed in terms of the mean and standard deviation of the target and predicted distributions.

[eqn]

[eqn]

Due to the asymmetry of the KL divergence, in the equation, we define $[eqn]$ as the target distribution and $[eqn]$ as the predicted distribution. Therefore, the expectation and standard deviation corresponding to $[eqn]$ are $[eqn]$ and $[eqn]$ , respectively, and the expectation and standard deviation corresponding to $[eqn]$ are $[eqn]$ and $[eqn]$ , respectively.
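For two normal distributions the KL divergence has a well-known closed form, sketched below with P as the target and Q as the prediction. The function name and the example parameters are illustrative, not values from Table 3.

```python
import math

def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )

# Hypothetical target (P) and predicted (Q) grain size statistics.
kl_forward = kl_normal(65.0, 12.0, 66.0, 13.0)
kl_reverse = kl_normal(66.0, 13.0, 65.0, 12.0)
kl_identical = kl_normal(65.0, 12.0, 65.0, 12.0)
```

Swapping the arguments changes the result, which is why the roles of target and prediction are fixed in advance.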

The KL divergence was calculated using Equation (13), and the results are listed together with the grain size statistics in Table 3. The following observations can be made:

(1) The maximum KL divergence among the 30 samples is 0.0134. Because values below 0.1 indicate high similarity, the results show that all predicted distributions closely match the targets.
(2) The model performs better on the training set than on the test set, which reflects the expected generalization error. The overall error is within ±2 μm: the mean MAE is 1.08 μm (MRE 1.63%), and the standard-deviation MAE is 0.84 μm (MRE 6.77%), with an average KL divergence of 0.0031. Samples with larger relative errors in standard deviation tend to exhibit larger KL divergence.
(3) Samples #12 and #15, although used as training samples, show large KL divergence values, likely due to large relative errors in the predicted standard deviation. In particular, sample #12 shows a large absolute error in distribution prediction. This may arise from a large standard deviation in grain size, which increases local heterogeneity and makes characterization by a single grain-size type insufficient. To address these limitations, future work could replace the simple spatial averaging strategy (Equation (11)) with an attention-weighted fusion mechanism. By assigning higher weights to ultrasonic signals that exhibit higher local entropy or distinct scattering patterns, the model could better focus on anomalous grain structures. Additionally, adopting a finer-grained scanning grid (e.g., reducing the 0.5 mm step) or using multi-scale convolutional kernels in the encoder could help extract localized microstructural gradients that are currently smoothed out by global compression.

Further, histograms and kernel density estimation (KDE) curves of the measured grain sizes were plotted, together with normal probability density functions parameterized by the true and predicted means and standard deviations. The results are shown in Figure 13. For all samples, the three curves generally overlap well; in samples with large KL divergence, however, the green and red lines are visibly misaligned. This agreement indicates that the KL divergence is an effective measure of distributional similarity.

4. Discussion

To further evaluate applicability, we compared the proposed model with conventional methods, examined generalization under input specificity, and discussed transfer learning as a strategy to improve generalization.

4.1. Comparison with Other Methods

We compared against the attenuation method and the ultrasonic velocity method. Because these physics-based methods estimate only the mean grain size, we compared mean predictions only.

For both methods, signal preprocessing and denoising were identical to those used for the deep learning approach, after which the data were fitted using the respective physical models.

First, for the attenuation method, three scattering mechanisms are commonly distinguished according to the relationship between the wavelength $\lambda$ and the grain diameter $D$ [38,39]:

$$
\alpha =
\begin{cases}
C_r D^3 f^4, & \lambda \gg D \quad \text{(Rayleigh scattering)} \\
C_s D f^2, & \lambda \approx D \quad \text{(stochastic scattering)} \\
C_d / D, & \lambda \ll D \quad \text{(diffusion scattering)}
\end{cases}
\tag{14}
$$

In Equation (14), $\alpha$ is the attenuation coefficient, $f$ is the frequency, and $C_r$, $C_s$ and $C_d$ are material constants. According to Equation (14) and the experimental data of this study, Rayleigh scattering dominates. The characteristic frequency band of the signal is identified by FFT, and the attenuation at each main frequency is calculated. Grain size is then fitted against attenuation at 4 MHz, 5 MHz, and 7 MHz, where the trends are clear; the fitted curves and formulas are shown in Figure 14a. Applying the fitted formulas to all sample signals gives the mean grain size predictions shown in Figure 14b. The 5 MHz fit performs best, with an MSE of 30.52.
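As an illustration of the fitting step, the sketch below assumes the classical Rayleigh form, in which attenuation at a fixed frequency is proportional to the cube of the grain diameter, so that the material constant and the frequency term collapse into a single coefficient `k`. The paper's actual fitted formula is shown only in Figure 14a and may differ; the function names and the synthetic calibration data are hypothetical.

```python
def fit_rayleigh_coefficient(diameters, attenuations):
    """Least-squares fit of alpha = k * D**3 through the origin."""
    cubes = [d**3 for d in diameters]
    return sum(a * c for a, c in zip(attenuations, cubes)) / sum(c * c for c in cubes)

def diameter_from_attenuation(attenuation, k):
    """Invert alpha = k * D**3 to estimate the mean grain diameter."""
    return (attenuation / k) ** (1.0 / 3.0)

def mse(true, pred):
    """Mean squared error between measured and estimated grain sizes."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

# Synthetic, noise-free calibration data (not from the paper).
true_k = 2e-6
sizes = [50.0, 60.0, 70.0, 80.0]                 # micrometres
alphas = [true_k * d**3 for d in sizes]          # arbitrary attenuation units

k = fit_rayleigh_coefficient(sizes, alphas)
estimates = [diameter_from_attenuation(a, k) for a in alphas]
print(k, mse(sizes, estimates))
```

With real data, measurement noise in the attenuation is amplified by the cube-root inversion only weakly, but any departure from the pure Rayleigh regime biases `k`, which is one reason a single-frequency fit yields only a mean grain size.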

We then applied the ultrasonic velocity method. As shown in Figure 15a, the arrival time of each pulse was identified using autoregressive analysis; the propagation time was averaged and thickness was computed. Most studies assume a linear relationship between sound speed and mean grain size [15,16]; we fit the data accordingly to obtain the relationship in Figure 15b. Applying this mapping to all signals yields the correlation in Figure 15c, with an MSE of 24.70, slightly better than the attenuation method.
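The linear mapping described above can be sketched with an ordinary least-squares line fit and its inverse. The numbers are synthetic and chosen only so that the fit can be checked by hand; the function names are hypothetical, and the paper's fitted coefficients are those of Figure 15b.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def grain_size_from_velocity(velocity, slope, intercept):
    """Invert the fitted line to estimate mean grain size from sound speed."""
    return (velocity - intercept) / slope

# Synthetic calibration data (not from the paper): v = 6000 - 2 * D.
grain_sizes = [50.0, 60.0, 70.0, 80.0]           # micrometres
velocities = [5900.0, 5880.0, 5860.0, 5840.0]    # m/s

slope, intercept = fit_line(grain_sizes, velocities)
print(grain_size_from_velocity(5870.0, slope, intercept))
```

Because the slope of velocity against grain size is typically small relative to the velocity itself, small timing errors translate into large grain size errors after inversion, so the quality of the arrival-time estimate largely determines the accuracy of this method.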

Model performance was evaluated using MSE; the numerical results are summarized in Table 4.

These results show that the deep learning approach offers clear advantages over physics-based methods in characterization capability, prediction accuracy, and scalability.

4.2. Input Specificity Influence and Transfer Adaptation Method

Signal acquisition for the same physical process can vary under different experimental conditions. Deep learning models are sensitive to such input specificity. With sufficient computation and model capacity, a model may fit data from a fixed condition, but in practice input conditions are difficult to replicate. Therefore, exploring generalization under varying inputs is both academically and economically valuable.

To investigate this issue, we designed a minimal ablation experiment. The model was trained with 20 MHz and 5 MHz probe signals and tested on 10 MHz signals. Based on the original model, a transfer-training set was constructed by combining 80% of the original test set with 5% of the training set. After about 30 training epochs, the loss converged and training stopped. Table 5 compares performance before and after transfer, showing poor generalization without transfer but rapid convergence and improved accuracy after fine-tuning.
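The construction of the transfer-training set can be sketched as follows. The 80% and 5% fractions come from the text; the random sampling, the fixed seed, the function name, and the use of the remaining 20% of the test set for evaluation are assumptions.

```python
import random

def build_transfer_set(train_set, test_set, test_frac=0.8, train_frac=0.05, seed=0):
    """Combine part of the new-condition data with a small replay of old data.

    Returns (transfer_set, held_out), where held_out is the part of the
    test set that is not used for fine-tuning.
    """
    rng = random.Random(seed)
    order = list(range(len(test_set)))
    rng.shuffle(order)
    n_new = int(round(test_frac * len(test_set)))
    new_part = [test_set[i] for i in order[:n_new]]
    held_out = [test_set[i] for i in order[n_new:]]
    # A small replay of the original training data helps retain what the
    # model learned under the old probe conditions.
    n_replay = int(round(train_frac * len(train_set)))
    replay = rng.sample(train_set, n_replay)
    return new_part + replay, held_out

# Placeholder sample identifiers standing in for ultrasonic signals.
train_ids = list(range(1000))          # e.g. 20 MHz and 5 MHz signals
test_ids = list(range(1000, 1100))     # e.g. 10 MHz signals
transfer_set, held_out = build_transfer_set(train_ids, test_ids)
print(len(transfer_set), len(held_out))
```

The pretrained model would then be fine-tuned on `transfer_set` until the loss converges (about 30 epochs in the experiment above) and evaluated on data that took no part in fine-tuning.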

For further analysis, Figure 16 visualizes the mean and standard deviation predictions before and after transfer. Before transfer, the predictions were highly concentrated around 67 μm, and the model outputs were nearly identical across signal types. From the perspective of the initial model, the 10 MHz ultrasonic signals therefore appear highly similar, and the model cannot resolve the subtle differences between them. To some extent, this suggests that the model has not yet fully captured the mapping between ultrasonic physical properties and grain size distribution.

During transfer, however, the model converges quickly, and after transfer its accuracy improves significantly and it distinguishes different signals effectively. This observation points in the opposite direction: the model does have an effective capacity to learn the mapping. Taking these two seemingly contradictory observations together, we tentatively infer that the model has learned the mapping effectively, but that limited data and the distinct characteristics of the new signals restrict its generalization before transfer. The transfer learning method used in this experiment mitigates this limitation. In real industrial scenarios, data shortages and equipment differences are common and difficult to avoid, so calibration on a small amount of new data through transfer learning is a promising route for applying deep learning models in industry.

5. Conclusions

This study proposes an end-to-end deep learning method for grain size distribution prediction using multimodal ultrasonic features with spatial coding. By integrating physical-model parameters with deep learning, accurate prediction on GH4099 is achieved. The main findings are as follows:

High-Precision End-to-End Prediction: The model encodes material and specimen characteristics within the architecture, enabling end-to-end prediction using only ultrasonic signals. Mean grain size and standard deviation are predicted without prior information on material type or thickness. Across test specimens, errors are within ±2 μm; the mean MAE/MRE are 1.08 μm and 1.63%, and the standard-deviation MAE/MRE are 0.84 μm and 6.77%. A KL divergence-based metric assesses distribution prediction; assuming log-normality, the maximum KL divergence is 0.0167 and the average is 0.0031, indicating high fidelity. Compared with physics-based methods, the proposed approach achieves an MSE of 1.695, substantially lower than 30.518 for the best attenuation model and 24.699 for the velocity method.
Multimodal Feature Fusion and Network Transferability: The method integrates multiple ultrasonic features (attenuation, center frequency, and acoustic velocity) and applies a spatial fusion strategy aligned with the relationship between grain measurement and ultrasonic resolution. This preserves characterization-relevant information. The encoder–decoder architecture decouples feature extraction from task-specific decoding; the encoder learns robust, multi-frequency features, while the decoder can be adapted through network structure, parameters, and training data to different scenarios, enabling efficient cross-domain transfer.
Transfer Learning-Based Model Generalization Analysis: Probe-variation experiments show that, without transfer learning, the original model cannot reliably distinguish grain size distributions for different inputs. With brief transfer training, the model rapidly converges and achieves improved prediction, demonstrating practical applicability through transfer learning calibration and adaptability to scenario-specific conditions.

In summary, the proposed end-to-end approach supports online inspection by combining multimodal feature fusion and a flexible network architecture. The transfer learning strategy improves adaptability to diverse scenarios, providing a practical solution for fast, flexible, and low-cost industrial inspection and highlighting the application potential of deep learning in industrial non-destructive evaluation.

Bibliography


1. Yuan X., Chen L., Zhao Y., Di H., Zhu F. Dependence of Grain Size on Mechanical Properties and Microstructures of High Manganese Austenitic Steel. Procedia Eng. 2014, 81, 143–148. doi:10.1016/j.proeng.2014.09.141
2. Savaedi Z., Mirzadeh H., Aghdam R.M., Mahmudi R. Effect of grain size on the mechanical properties and bio-corrosion resistance of pure magnesium. J. Mater. Res. Technol. 2022, 19, 3100–3109. doi:10.1016/j.jmrt.2022.06.048
3. Armstrong R.W. The influence of polycrystal grain size on several mechanical properties of materials. Metall. Trans. 1970, 1, 1169–1176. doi:10.1007/BF02900227
4. Choi S., Ryu J., Kim J.-S., Jhang K.-Y. Comparison of Linear and Nonlinear Ultrasonic Parameters in Characterizing Grain Size and Mechanical Properties of 304L Stainless Steel. Metals 2019, 9, 1279. doi:10.3390/met9121279
5. Li X., Cui L., Li J., Chen Y., Han W., Shonkwiler S., McMains S. Automation of intercept method for grain size measurement: A topological skeleton approach. Mater. Des. 2022, 224, 111358. doi:10.1016/j.matdes.2022.111358
6. Mingard K.P., Roebuck B., Bennett E.G., Gee M.G., Nordenstrom H., Sweetman G., Chan P. Comparison of EBSD and conventional methods of grain size measurement of hardmetals. Int. J. Refract. Met. Hard Mater. 2009, 27, 213–223. doi:10.1016/j.ijrmhm.2008.06.009
7. Hongbo N., Qisen Z., Jianping Z., Xiao W., Yang Y. The preparation, preparation mechanism and properties of extra coarse-grained WC–Co hardmetals. Met. Powder Rep. 2017, 72, 188–194. doi:10.1016/j.mprp.2017.01.001
8. Ryde L. Application of EBSD to analysis of microstructures in commercial steels. Mater. Sci. Technol. 2006, 22, 1297–1306. doi:10.1179/174328406X130948