Practical Physical Layer Authentication for Mobile Scenarios Using a Synthetic Dataset Enhanced Deep Learning Approach

Yijia Guo; Junqing Zhang; Y.-W. Peter Hong

arXiv:2508.20861·cs.LG·March 24, 2026

TL;DR

This paper presents a deep learning-based physical layer authentication method for mobile IoT scenarios that uses synthetic datasets to effectively identify devices based on wireless channel characteristics, demonstrating high accuracy in real-world tests.

Contribution

The paper introduces a novel CNN-based Siamese network approach utilizing synthetic CSI datasets for practical physical layer authentication in dynamic mobile environments.

Findings

01. Synthetic dataset generation reduces data collection overhead.

02. The proposed CNN Siamese model outperforms traditional methods.

03. Experimental results show high generalization and authentication accuracy.

Tables

Table I: WLAN TGn Channel Models [31].

Model	RMS Delay Spread (ns)	No. of Clusters	Mapped Environment
Model B	15	2	Residential apartment
Model C	30	2	Small office
Model D	50	3	Typical office
Model E	100	4	Large office
Model F	150	6	Large space

Table II: Comparison of Computational Requirements.

Model	Parameter Count	FLOPs
FCN-based Siamese network in [28]	282,641	1,435,645
CNN-based Siamese network	29,972	1,175,697

Equations

$$x^{[k]}[n]=\sum_{m=0}^{M-1}X^{[k]}[m]e^{j2\pi mn/M^{\prime}},\quad n=0,1,\dots,N-1. \tag{1}$$

$$y^{[k]}[n]=\sum_{l=0}^{L-1}x^{[k]}[n-l]h_{\rm ba}^{[k]}[l]+z[n], \tag{2}$$

$$Y^{[k]}[m]=X^{[k]}[m]H_{\rm ba}^{[k]}[m]+Z[m], \tag{3}$$

$$H_{\rm ba}^{[k]}[m]=\sum_{l=0}^{L-1}h_{\rm ba}^{[k]}[l]e^{-j2\pi ml/M^{\prime}}. \tag{4}$$

$$\widehat{H}_{\rm ba}^{[k]}[m]=\frac{Y^{[k]}[m]}{X^{[k]}[m]}=H_{\rm ba}^{[k]}[m]+\frac{Z[m]}{X^{[k]}[m]}. \tag{5}$$

$$r({\bm{X}}_{1},{\bm{X}}_{2})=\frac{({\bm{X}}_{1}-\bar{{\bm{X}}}_{1})^{T}({\bm{X}}_{2}-\bar{{\bm{X}}}_{2})}{\|{\bm{X}}_{1}-\bar{{\bm{X}}}_{1}\|\,\|{\bm{X}}_{2}-\bar{{\bm{X}}}_{2}\|}, \tag{6}$$

$$D=\begin{cases}1,&\text{when }r\leq\epsilon_{\rm c};\\0,&\text{when }r>\epsilon_{\rm c},\end{cases} \tag{7}$$

$${\bm{U}}_{i}\triangleq\begin{cases}(\widehat{\bm{H}}_{\rm ba}^{[k]},\widehat{\bm{H}}_{\rm ba}^{[k+1]}),&\text{when }V_{i}=0;\\(\widehat{\bm{H}}_{\rm ba}^{[k]},\widehat{\bm{H}}_{\rm ma}^{[k+1]}),&\text{when }V_{i}=1.\end{cases} \tag{8}$$

$$D=\begin{cases}1,&\text{when }s>\epsilon_{\rm s};\\0,&\text{when }s\leq\epsilon_{\rm s},\end{cases} \tag{9}$$

$$S(f)=\frac{\sqrt{A}}{\pi f_{\rm d}}\cdot\frac{1}{1+A\left(\frac{f}{f_{\rm d}}\right)^{2}}, \tag{10}$$

$$f_{\rm d}=\frac{v_{0}}{\lambda}, \tag{11}$$

$$R(\Delta t)=e^{-\frac{2\pi f_{\rm d}}{\sqrt{A}}\Delta t}, \tag{12}$$

$$h_{\rm ba}^{[k+1]}[l]=R(\Delta t_{k})h_{\rm ba}^{[k]}[l]+\sqrt{1-R^{2}(\Delta t_{k})}\,\omega_{1}[l], \tag{13}$$

$$\begin{aligned}H_{\rm ba}^{[k+1]}[m]&=\sum_{l=0}^{L-1}h_{\rm ba}^{[k+1]}[l]e^{-j2\pi ml/M}\\&=R(\Delta t_{k})H_{\rm ba}^{[k]}[m]+\sqrt{1-R^{2}(\Delta t_{k})}\sum_{l=0}^{L-1}\omega_{1}[l]e^{-j2\pi ml/M}\\&=R(\Delta t_{k})H_{\rm ba}^{[k]}[m]+\sqrt{1-R^{2}(\Delta t_{k})}\,\Omega_{1}[m].\end{aligned} \tag{14}$$

$$\Delta t_{\rm bm}=\frac{d_{\rm bm}}{v_{0}}=\frac{d_{\rm bm}}{f_{\rm d}\lambda}. \tag{15}$$

$$h_{\rm ma}^{[k+1]}[l]=\frac{1}{\sqrt{\Theta}}\left(\rho(d_{\rm bm})h_{\rm ba}^{[k+1]}[l]+\sqrt{1-\rho^{2}(d_{\rm bm})}\,\omega_{2}[l]\right), \tag{16}$$

$$\begin{aligned}H_{\rm ma}^{[k+1]}[m]&=\sum_{l=0}^{L-1}h_{\rm ma}^{[k+1]}[l]e^{-j2\pi ml/M}\\&=\frac{\rho(d_{\rm bm})}{\sqrt{\Theta}}H_{\rm ba}^{[k+1]}[m]+\frac{\sqrt{1-\rho^{2}(d_{\rm bm})}}{\sqrt{\Theta}}\sum_{l=0}^{L-1}\omega_{2}[l]e^{-j2\pi ml/M}\\&=\frac{\rho(d_{\rm bm})}{\sqrt{\Theta}}H_{\rm ba}^{[k+1]}[m]+\frac{\sqrt{1-\rho^{2}(d_{\rm bm})}}{\sqrt{\Theta}}\,\Omega_{2}[m].\end{aligned} \tag{17}$$

$$L=V_{i}\max(0,\eta-s)^{2}+(1-V_{i})s^{2}, \tag{18}$$

Taxonomy

Topics: Multimedia Communication and Technology

Full text

Practical Physical Layer Authentication for Mobile Scenarios Using a Synthetic Dataset Enhanced Deep Learning Approach

Yijia Guo, Junqing Zhang, and Y.-W. Peter Hong

Manuscript received xxx; revised xxx; accepted xxx. Date of publication xxx; date of current version xxx. The work of J. Zhang was supported in part by the UK EPSRC under grant ID EP/V027697/1 and EP/Y037197/1, and in part by Royal Society Research Grants under grant ID RGS/R1/231435. The work of Y.-W. P. Hong was supported in part by the National Science and Technology Council (NSTC) of Taiwan under grant NSTC 111-2221-E-007-042-MY3. The review of this paper was coordinated by xxx. *(Corresponding author: Junqing Zhang.)*

Y. Guo is with the Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, L69 3GJ, United Kingdom. She is also with the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan 300044. (email: [email protected])

J. Zhang is with the Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, L69 3GJ, United Kingdom. (email: [email protected])

Y.-W. P. Hong is with the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan 300044. (email: [email protected])

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier xxx

Abstract

The Internet of Things (IoT) is ubiquitous thanks to the rapid development of wireless technologies. However, the broadcast nature of wireless transmissions makes device authentication highly vulnerable. Physical layer authentication has emerged as a promising approach that exploits unique channel characteristics. However, a practical scheme that can cope with dynamic channel variations is still missing. In this paper, we propose a deep learning-based physical layer authentication scheme using channel state information (CSI) for mobile scenarios and carry out comprehensive simulation and experimental evaluation using IEEE 802.11n. Specifically, a synthetic training dataset was generated based on the WLAN TGn channel model and the autocorrelation and distance correlation of the channel, which can significantly reduce the overhead of manually collecting experimental datasets. A convolutional neural network (CNN)-based Siamese network was exploited to learn the temporal and spatial correlation between the CSI pair and output a score measuring their similarity. We adopted a synergistic methodology involving both simulation and experimental evaluation. The experimental testbed consisted of WiFi IoT development kits, and several typical scenarios were considered. Both simulation and experimental evaluation demonstrated the excellent generalization and authentication performance of the proposed deep learning-based approach. In our practical measurements, the proposed scheme improved the area under the curve (AUC) by 0.03 compared to the fully connected network-based (FCN-based) Siamese model and by 0.06 compared to the correlation-based benchmark algorithm.

Index Terms:

Internet of Things, physical layer authentication, channel state information, synthetic dataset, Siamese network.

I Introduction

The rapid development of wireless technologies has enabled ubiquitous Internet of Things (IoT) connectivity and triggered numerous transformative applications in our everyday life, e.g., smart homes, smart cities, connected healthcare, and industrial IoT [1]. Such a revolution is enabled by massively connected devices, whose number is predicted by International Data Corporation (IDC) to reach 55.7 billion by 2025 (footnote 1: https://blogs.idc.com/2021/01/06/future-of-industry-ecosystems-shared-data-and-insights/). These devices are typically connected through wireless technologies such as WiFi, Bluetooth, ZigBee, and LoRa. However, due to the broadcast nature of the wireless medium, any device within communication range can access the signal. This makes spoofing attacks, in which an attacker pretends to be a legitimate user, easy to carry out. Device authentication serves as the countermeasure: it verifies device identity to grant network access to legitimate devices and deny it to malicious users. Existing authentication schemes rely on the media access control (MAC) address as the identifier, which can, however, be easily spoofed.

There has been growing interest in physical layer authentication which identifies a device using its channel characteristics [2, 3, 4]. It can be categorized into received signal strength (RSS)-based and channel state information (CSI)-based schemes. RSS is utilized to detect the existence of rogue devices [5], determine the number of attackers [6], and locate the adversaries [7]. However, RSS is a coarse-grained estimate of the channel, hence the detection accuracy is limited. In contrast, CSI is finer-grained, which can provide more detailed information of the channel. Therefore, CSI, including channel frequency response (CFR) [8, 9, 10, 11] and channel impulse response (CIR) [12, 13, 14, 15, 16, 17, 18], has been widely used to enhance the authentication performance.

Most of the existing work focuses on stationary scenarios, where all the devices remain at fixed locations and the environment causes no other channel variation. In this case, a receiver always obtains constant channel characteristics from the devices. A CFR-based scheme is designed in [8] and is extended with multiple-input multiple-output (MIMO) to obtain a security gain in [9]. A CIR-based scheme is proposed for single-carrier wireless networks in [12]. Since the channel can be used as a unique fingerprint of devices, many learning-based methods have been proposed to classify the channel patterns and identify device identity. The $K$-means algorithm is applied in [19], where the number of CSI clusters is used to determine whether there is an attacker. A support vector machine (SVM) is used to obtain the similarity between the unknown CSI and the local user profile for device authentication [20]. Deep learning techniques, such as convolutional neural networks (CNNs), are also used, e.g., in [21].

However, dynamic channels are not considered in the above work, which significantly limits its application in practical IoT scenarios. Many devices, e.g., smartphones, are mobile. In other cases, IoT devices, e.g., smart meters, may remain fixed, but the surrounding wireless environment varies due to passing pedestrians and/or vehicles. When the channel is dynamic due to device movement and/or environmental variations, the channel characteristics measured at each device are time-varying. The varying channel characteristics can then no longer be treated as a unique pattern, which invalidates the approaches designed for stationary scenarios.

Early research on physical layer authentication in mobile scenarios was primarily based on hypothesis testing. A CFR-based scheme is proposed in [10] that considers the channel correlations in the time, frequency, and spatial domains. Its application in frequency-selective Rayleigh channels is studied in [11]. A CIR-based scheme is proposed in [13] for wireless communications over a time-varying multipath channel. The CIR-based scheme is then integrated with the multipath delay to achieve reliable authentication performance under low signal-to-noise ratio (SNR) conditions [14]. Subsequently, in [15], the CIR-based scheme is further combined with a two-dimensional quantization method to simplify the decision rule for authentication. In recent years, several multi-dimensional authentication mechanisms have been proposed. A physical layer authentication scheme based on multiple CIRs is proposed to reduce the performance loss due to quantization error [16]. Location-specific channel gain and transmitter-specific phase noise are exploited for physical layer authentication in massive MIMO systems [18]. The above works mainly provide hypothesis testing-based modeling and analysis. In addition to the hypothesis testing-based studies, the temporal correlation of the channel is leveraged in [22], where the Pearson correlation is computed and authentication is performed based on an empirically determined threshold.

With the development of data-driven technologies, many learning-based methods have been proposed, which can be categorized into three approaches. The first approach involves authentication based on channel prediction. In [23], Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) are investigated for MIMO systems to predict future channel states using previous channel measurements, followed by threshold-based mean square error (MSE) detection. Similarly, in [24], the legitimate CSI is predicted based on historical CSI and the transmitter’s geographical information for authentication. The work assumes the trajectory of the device is either coordinated or even controlled, which is unrealistic in practical scenarios. Moreover, these studies lack experimental validation with real-world systems. The second approach focuses on authentication using classification algorithms, where legitimate and rogue devices move in distinct regions, ensuring non-overlapping CSI measurements to establish a clear classification boundary. Decision Trees (DT), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and ensemble learning are explored in [25] to classify CSI measurements in mobile scenarios. Furthermore, a weighted voting scheme based on the SVM classifier from [25] is introduced in [26] to improve classification accuracy. A ResNet model is implemented to extract features from mobile CSI measurements in [27]. However, in real-world scenarios, if the paths of legitimate and rogue devices overlap and their channels are similar, classification-based methods are no longer applicable. Another classification-based algorithm was proposed in [17], where fuzzy learning is used to indicate the likelihood that CSI and other physical layer information belong to each class. The third approach is similarity-based authentication. In [28], a method is proposed that combines a sliding window with a Siamese network, with a fully connected network (FCN) as the embedding network.

In this paper, we propose a practical and robust deep learning-based physical layer authentication scheme and carry out comprehensive simulation and experimental evaluation. In particular, we carefully tuned a synthetic training dataset to cover the test scenarios, which largely eliminates the overhead of collecting experimental datasets. In addition, we adopted a CNN-based Siamese deep learning model to learn the similarity between a pair of CSI estimations. The simulation and experimental evaluation demonstrated the robustness and generalization capability of the proposed scheme.

To the best of our knowledge, the authentication methods proposed in [22] and [28] are the most practical ones; hence, they are used as the benchmark algorithms in this paper. Our main contributions are summarized as follows.

• A synthetic training dataset was generated for indoor mobile scenarios. Specifically, the synthetic channels were created based on the WLAN TGn channel model, along with the autocorrelation and distance correlation of the channel. To ensure the generalization of the training dataset, a comprehensive set of simulation parameters was carefully optimized through simulation evaluation. In the experimental scenarios, the designed synthetic dataset was further used as the training dataset. Validation with experimental test dataset demonstrated that, in various typical indoor scenarios, the synthetic training dataset achieved performance comparable to the experimental training dataset and had the potential to outperform it. This suggested that the offline-generated synthetic training dataset could replace the experimental training dataset for model training, eliminating the need for extensive manual CSI collection.

• A Siamese model utilizing a CNN as the embedding network was employed to capture the temporal and spatial similarity between CSI measurements, and output a score to measure the similarity. The device authentication was achieved by comparing the score with a threshold obtained empirically. It was demonstrated that our scheme could achieve better authentication performance with lower computational overhead compared to the FCN-based Siamese network in [28].

• We carried out a comprehensive simulation evaluation of the proposed scheme on different WLAN channel models in MATLAB. We studied the effect of SNR, the distance between legitimate and rogue devices, and the transmission interval on authentication performance in simulation. The simulation evaluation allowed us to optimize the parameters of the synthetic training dataset quickly and efficiently.

• We also performed an extensive experimental evaluation using WiFi in various indoor environments. A testbed using the ESP32 kit and two LoPy4 boards was created. Typical indoor environments involving both line-of-sight (LOS) and non-line-of-sight (NLOS) as well as various SNR were considered. We explored the reliability of the synthetic dataset and the effectiveness of the CNN-based Siamese network. The generalization performance of the proposed scheme in different typical test scenarios was also evaluated. Demonstrated by our experimental results, the proposed scheme improved the area under the curve (AUC) by 0.06 compared to the correlation-based benchmark algorithm in [22] and by 0.03 compared to the FCN-based Siamese network in [28].

The datasets are available online (footnote 2: https://ieee-dataport.org/documents/wi-fi-channel-state-information-dataset-mobile-physical-layer-authentication). In our previous work [30], we proposed a CNN model for device authentication in mobile scenarios. In this work, we significantly extend it by designing a synthetic training dataset and a Siamese-based model. Moreover, the performance is evaluated both in simulation and on a real experimental WiFi testbed.

The rest of this paper is organized as follows. Section II introduces the system model and problem statement. Section III briefly describes the proposed method for device authentication. Section IV elaborates on the generation of the synthetic dataset. Section V describes the CNN-based Siamese network. Sections VI and VII present and discuss the simulation results and the experimental results, respectively. Section VIII provides the scalability analysis and the advantages of the synthetic dataset, and Section IX concludes the paper.

II System Model and Problem Statement

II-A System Model

As shown in Fig. 1, two legitimate users, Alice and Bob, aim to communicate securely over a time-variant channel. There is also an attacker, Mallory, who intends to carry out spoofing attacks by injecting packets into the open wireless channel. All users operate in the IEEE 802.11n legacy OFDM mode with a $20$ MHz channel bandwidth and $M^{\prime}=64$ subcarriers. Long training symbols, which occupy $M=52$ subcarriers, are used for channel estimation. Assuming that Alice receives the $k$-th packet from Bob and $X^{[k]}[m]$ represents the long training symbol modulated onto the $m$-th subcarrier in the $k$-th packet, the transmitted time-domain signal can be written as

$$x^{[k]}[n]=\sum_{m=0}^{M-1}X^{[k]}[m]e^{j2\pi mn/M^{\prime}},\quad n=0,1,\dots,N-1. \tag{1}$$

The dynamic channel between Bob and Alice at time $t$ has a CIR ${\bm{h}}_{\rm ba}(t)\triangleq[h_{\rm ba}(0,t),h_{\rm ba}(1,t),\dots,h_{\rm ba}(L-1,t)]$, where $L$ is the number of channel taps. Supposing the $k$-th packet is transmitted at time $t_{k}$, we denote $h_{\rm ba}^{[k]}[l]=h_{\rm ba}(l,t_{k})$ as the corresponding channel. The $n$-th received signal sample of the long training symbol in the $k$-th packet can be written as

$$y^{[k]}[n]=\sum_{l=0}^{L-1}x^{[k]}[n-l]h_{\rm ba}^{[k]}[l]+z[n], \tag{2}$$

where $z[n]\sim\mathcal{CN}(0,\sigma_{z}^{2})$ is the additive white Gaussian noise (AWGN). The equivalent frequency-domain signal can be written as

$$Y^{[k]}[m]=X^{[k]}[m]H_{\rm ba}^{[k]}[m]+Z[m], \tag{3}$$

where $H_{\rm ba}^{[k]}[m]$ is the channel coefficient on the $m$-th subcarrier and given as

$$H_{\rm ba}^{[k]}[m]=\sum_{l=0}^{L-1}h_{\rm ba}^{[k]}[l]e^{-j2\pi ml/M^{\prime}}. \tag{4}$$

The estimated channel coefficients over the $M$ subcarriers can be written as $\widehat{\bm{H}}_{\rm ba}^{[k]}=[\widehat{H}_{\rm ba}^{[k]}[0],\widehat{H}_{\rm ba}^{[k]}[1],\dots,\widehat{H}_{\rm ba}^{[k]}[M-1]]$, which we refer to as the CSI. It can be obtained using the widely used least square (LS) channel estimation, with

$$\widehat{H}_{\rm ba}^{[k]}[m]=\frac{Y^{[k]}[m]}{X^{[k]}[m]}=H_{\rm ba}^{[k]}[m]+\frac{Z[m]}{X^{[k]}[m]}. \tag{5}$$

The magnitude of the CSI estimation $\widehat{\bm{H}}_{\rm ba}^{[k]}$ can be defined as $|\widehat{\bm{H}}_{\rm ba}^{[k]}|=[|\widehat{H}_{\rm ba}^{[k]}[0]|,|\widehat{H}_{\rm ba}^{[k]}[1]|,\dots,|\widehat{H}_{\rm ba}^{[k]}[M-1]|]$, where $|\widehat{H}_{\rm ba}^{[k]}[m]|$ is the magnitude of the estimated channel coefficient on the $m$-th subcarrier.
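The frequency-domain model and the LS estimate above can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the 4-tap CIR, the random BPSK training symbols (a stand-in for the actual 802.11 long training sequence), and the function names are all hypothetical, and all $M^{\prime}=64$ subcarriers are estimated rather than only the $M=52$ occupied ones.

```python
import cmath
import random

M_FFT = 64  # M' in the text: FFT size of the legacy OFDM mode


def cfr_from_cir(h, m_fft=M_FFT):
    """DFT of the channel taps: H[m] = sum_l h[l] * exp(-j*2*pi*m*l/M')."""
    return [
        sum(tap * cmath.exp(-2j * cmath.pi * m * l / m_fft) for l, tap in enumerate(h))
        for m in range(m_fft)
    ]


def received_symbols(X, H, noise_std, rng):
    """Frequency-domain model Y[m] = X[m] * H[m] + Z[m], Z[m] ~ CN(0, noise_std**2)."""
    s = noise_std / 2 ** 0.5  # standard deviation of each real component
    return [x * h + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for x, h in zip(X, H)]


def ls_estimate(Y, X):
    """Least-squares CSI estimate per subcarrier: H_hat[m] = Y[m] / X[m]."""
    return [y / x for y, x in zip(Y, X)]


def csi_magnitude(H_hat):
    """Magnitude vector |H_hat| that is later fed to the detector."""
    return [abs(v) for v in H_hat]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 4-tap CIR and placeholder BPSK training symbols (assumptions).
    h = [complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(4)]
    X = [rng.choice([-1.0, 1.0]) for _ in range(M_FFT)]
    H_hat = ls_estimate(received_symbols(X, cfr_from_cir(h), 0.1, rng), X)
    print(csi_magnitude(H_hat)[:4])
```

With `noise_std` set to zero the estimate reduces to the true channel coefficient, which matches the noise-free case of the LS expression.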

Threat Model: As indicated in [4], passive eavesdroppers are very rare in physical layer authentication. In this paper, we consider an active attacker, Mallory, who is located at a distance $d_{\rm bm}$ from Bob and impersonates Bob by transmitting packets to Alice in a burst mode. The CIR between Mallory and Alice at time $t$ is given as ${\bm{h}}_{\rm ma}(t)\triangleq[h_{\rm ma}(0,t),h_{\rm ma}(1,t),\dots,h_{\rm ma}(L-1,t)]$. We assume that ${\bm{h}}_{\rm ma}(t)$ and ${\bm{h}}_{\rm ba}(t)$ have the same number of channel taps, which is the worst case since Mallory is least likely to be authenticated. Mallory knows the wireless protocol that Alice and Bob are using and has access to its configuration, such as the bandwidth and carrier frequency. Mallory aims to spoof Bob by transmitting signals to Alice. We assume that Mallory only performs burst signal attacks, i.e., Mallory does not transmit consecutive packets to Alice.

When Alice receives the $(k+1)$-th packet, she estimates the channel coefficients from the received packet, which can be either from Bob or from Mallory. In other words, $\widehat{\bm{H}}^{[k+1]}$ can be $\widehat{\bm{H}}_{\rm ba}^{[k+1]}$ or $\widehat{\bm{H}}_{\rm ma}^{[k+1]}$. If the packet is from Bob (i.e., $\widehat{\bm{H}}^{[k+1]}=\widehat{\bm{H}}_{\rm ba}^{[k+1]}$), the similarity between $\widehat{\bm{H}}^{[k+1]}$ and $\widehat{\bm{H}}_{\rm ba}^{[k]}$ will be high, because Bob will not move far for practical transmission intervals and typical terminal speeds. Fig. 2 exemplifies three CSI measurements collected with IEEE 802.11n. The collection sites of CSI $2$ and CSI $3$ are spaced $0.25$ cm apart, and the two CSI estimations are highly similar. In contrast, if the packet is transmitted by the attacker Mallory (i.e., $\widehat{\bm{H}}^{[k+1]}=\widehat{\bm{H}}_{\rm ma}^{[k+1]}$), $\widehat{\bm{H}}^{[k+1]}$ will be dissimilar from $\widehat{\bm{H}}_{\rm ba}^{[k]}$, as exemplified in Fig. 2, where the distance between the collection sites of CSI $1$ and CSI $2$ is about $0.5$ m. Therefore, by comparing the similarity of the CSI measurements of two adjacent packets, i.e., $\widehat{\bm{H}}_{\rm ba}^{[k]}$ and $\widehat{\bm{H}}^{[k+1]}$, Alice will be able to authenticate the identity of the transmitter.

II-B Problem Statement

Correlation is a straightforward approach to quantify the CSI similarity. The fluctuation of CSI is related to the wireless environment and the mobile speed of the device, which means that the time-variant CSI is temporally and spatially correlated. Indeed, the work in [22] designs a correlation-based method for device authentication in mobile scenarios, which is used as the benchmark algorithm in this paper.

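The Pearson correlation coefficient of the two CSI measurements, $\widehat{\bm{H}}_{\rm ba}^{[k]}$ and $\widehat{\bm{H}}^{[k+1]}$, can be defined as

$$r({\bm{X}}_{1},{\bm{X}}_{2})=\frac{({\bm{X}}_{1}-\bar{{\bm{X}}}_{1})^{T}({\bm{X}}_{2}-\bar{{\bm{X}}}_{2})}{\|{\bm{X}}_{1}-\bar{{\bm{X}}}_{1}\|\,\|{\bm{X}}_{2}-\bar{{\bm{X}}}_{2}\|}, \tag{6}$$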

where ${\bm{X}_{1}}=|\widehat{\bm{H}}_{\rm ba}^{[k]}|$, ${\bm{X}_{2}}=|\widehat{\bm{H}}^{[k+1]}|$, and $\bar{{\bm{X}}}=1/M\cdot({\bm{1}}_{1\times M}\cdot{\bm{X}})\cdot{\bm{1}}_{M\times 1}$. After obtaining the correlation between adjacent CSI measurements, a detection mechanism is given as

$$D=\begin{cases}1,&\text{when }r\leq\epsilon_{\rm c};\\0,&\text{when }r>\epsilon_{\rm c},\end{cases} \tag{7}$$

where $D=1$ denotes that there is a rogue device, $D=0$ denotes that there is no rogue device, and the threshold $\epsilon_{\rm c}$ is obtained empirically through experiments.
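As a minimal sketch of this correlation-based benchmark, the following Python code computes the Pearson coefficient of two CSI magnitude vectors and applies the threshold rule. The function names and the example threshold of 0.8 are hypothetical; the text only states that $\epsilon_{\rm c}$ is chosen empirically.

```python
import math


def pearson(x1, x2):
    """Pearson correlation coefficient between two equal-length real vectors."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    d1 = [a - mean1 for a in x1]
    d2 = [b - mean2 for b in x2]
    denom = math.sqrt(sum(a * a for a in d1)) * math.sqrt(sum(b * b for b in d2))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return sum(a * b for a, b in zip(d1, d2)) / denom


def detect_rogue(prev_mag, next_mag, eps_c):
    """Return D = 1 (rogue device) when r <= eps_c, otherwise D = 0."""
    return 1 if pearson(prev_mag, next_mag) <= eps_c else 0


if __name__ == "__main__":
    eps_c = 0.8  # hypothetical threshold, not a value from the paper
    print(detect_rogue([1.0, 2.0, 3.0], [2.0, 4.0, 6.5], eps_c))
    print(detect_rogue([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], eps_c))
```

Because the decision uses the raw magnitudes directly, any noise on the CSI estimates feeds straight into $r$, which is the weakness discussed next.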

However, correlation-based authentication relies directly on CSI measurements. When the SNR is low, the quality of the CSI measurements degrades, which makes the method less robust to noise. Besides, when Mallory is close to Bob, ${\bm{h}}_{\rm ma}(t_{k+1})$ and ${\bm{h}}_{\rm ba}(t_{k})$ will be highly similar due to the spatial correlation. The relationship between the distance and the detection capability is important but has not yet been studied.

III Proposed Method

We propose a deep learning-based method to learn the similarity between a pair of CSI estimations. By incorporating a wide variety of channel models and SNRs into the training dataset, the prior information about the channel environment can be learned, which helps to improve the robustness of the deep learning model. Therefore, we design a deep learning Siamese network-based approach, enhanced by a synthetic training dataset, as shown in Fig. 3. The approach consists of the training and test stages.

III-A Training Stage

For deep learning-based methods, a training dataset $\mathcal{D}_{\text{train}}=\{({\bm{U}}_{i},V_{i})\}_{i=1}^{|\mathcal{D}|}$ is essential, where $|\mathcal{D}|$ is the cardinality of set $\mathcal{D}_{\rm train}$ and

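$${\bm{U}}_{i}\triangleq\begin{cases}(\widehat{\bm{H}}_{\rm ba}^{[k]},\widehat{\bm{H}}_{\rm ba}^{[k+1]}),&\text{when }V_{i}=0;\\(\widehat{\bm{H}}_{\rm ba}^{[k]},\widehat{\bm{H}}_{\rm ma}^{[k+1]}),&\text{when }V_{i}=1.\end{cases} \tag{8}$$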

Assuming the $k$-th packet is transmitted by Bob, we set $V_{i}=0$ when the $(k+1)$-th received packet is from Bob, treating the CSI measurements as highly correlated, and $V_{i}=1$ when the $(k+1)$-th received packet is from Mallory, treating them as uncorrelated. There are currently two ways to obtain a training dataset: experimental collection of CSI measurements or artificial synthesis of CSI estimations. However, collecting a comprehensive dataset from experiments is usually time-consuming and labor-intensive. Therefore, this paper proposes to generate abundant and accurate CSI estimations by simulation as a synthetic training dataset. The details of the synthetic data generation will be introduced in Section IV.

Furthermore, the similarity between CSI measurement pairs is the key to device authentication, and the Siamese network is well-suited for learning the similarity between two inputs. Therefore, a Siamese network is exploited. It uses two identical CNN-based embedding networks to work on two different input vectors and computes comparable output values. The goal is to make the Siamese-based model learn a similarity function that measures how similar the two input vectors are and returns a similarity value. In this work, a low score is returned when the input vectors are similar and a high score is returned when the input vectors are different. A contrastive loss function is then calculated. The details of the Siamese-based model will be introduced in Section V.
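The contrastive loss listed among the paper's equations has the form $L=V_{i}\max(0,\eta-s)^{2}+(1-V_{i})s^{2}$. The Python sketch below evaluates it for a given score. Two points are assumptions made for illustration only: the score $s$ is taken as the Euclidean distance between the two embeddings, and the margin $\eta$ defaults to 1.0.

```python
import math


def euclidean_score(e1, e2):
    """Score s as the Euclidean distance between two embeddings (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


def contrastive_loss(s, v, margin=1.0):
    """L = V * max(0, margin - s)^2 + (1 - V) * s^2 for one CSI pair.

    v = 0: both packets from Bob, so a large score is penalized.
    v = 1: second packet from Mallory, so a score below the margin is penalized.
    """
    return v * max(0.0, margin - s) ** 2 + (1 - v) * s ** 2


def batch_loss(scores, labels, margin=1.0):
    """Average contrastive loss over a batch of (score, label) pairs."""
    return sum(contrastive_loss(s, v, margin) for s, v in zip(scores, labels)) / len(scores)


if __name__ == "__main__":
    s = euclidean_score([0.1, 0.2], [0.1, 0.5])
    print(contrastive_loss(s, 0), contrastive_loss(s, 1))
```

The loss pulls legitimate pairs toward a score of zero and pushes rogue pairs beyond the margin, which is consistent with a low score meaning "similar" in the test stage.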

III-B Test Stage

In the test stage, the trained model is used to detect rogue users. When Alice obtains two CSI estimations, $\widehat{\bm{H}}_{\rm ba}^{[k]}$ and $\widehat{\bm{H}}^{[k+1]}$, she passes their magnitudes to the trained Siamese network, which returns a score $s$. The score $s$ is compared with a threshold to make the decision:

$$D=\begin{cases}1,&\text{when }s>\epsilon_{\rm s};\\0,&\text{when }s\leq\epsilon_{\rm s},\end{cases} \tag{9}$$

where $D=1$ denotes that there is a rogue device, and $D=0$ denotes that there is no rogue device. The threshold $\epsilon_{\rm s}$ is obtained empirically through experiments.

In the simulation and experimental evaluation, which will be elaborated in Section VI and Section VII, respectively, the true positive rate (TPR), false positive rate (FPR), receiver operating characteristics (ROC) and area under the curve (AUC) are used as the performance metrics. Specifically, TPR is defined as the proportion of correctly classified positive samples to the total positive samples, and FPR is defined as the proportion of misclassified negative samples to the total negative samples. Based on TPR and FPR, the ROC curve and the AUC can be obtained for performance evaluation. The ROC curve plots TPR versus FPR and is a common performance metric for classification problems under various threshold settings. AUC represents the area under the ROC curve, which tells how well the model is able to discriminate between categories. The higher the AUC, the better the performance for authentication.
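To make these metrics concrete, the following Python sketch sweeps the threshold $\epsilon_{\rm s}$ over a set of scores, records the resulting (FPR, TPR) points, and integrates them with the trapezoidal rule. It assumes that label 1 marks a packet from the rogue device (the positive class) and that a packet is flagged when its score is strictly above the threshold; the function names and example scores are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep the threshold and return (FPR, TPR) points.

    labels: 1 = packet from the rogue device (positive), 0 = legitimate.
    A packet is flagged (D = 1) when its score is strictly above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Thresholds at every observed score, plus -inf so that everything is flagged once.
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for eps in thresholds:
        tp = sum(1 for s, v in zip(scores, labels) if s > eps and v == 1)
        fp = sum(1 for s, v in zip(scores, labels) if s > eps and v == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical Siamese scores
    labels = [0, 0, 1, 1]
    print(auc(roc_points(scores, labels)))
```

An AUC of 1 corresponds to scores that separate the two classes perfectly, while 0.5 corresponds to chance-level discrimination.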

IV Synthetic Dataset Generation

In this section, IEEE 802.11n operating at a 2.4 GHz carrier frequency and a Rayleigh fading channel are considered. We first describe the discrete channel model of the WLAN TGn channel, then introduce the synthetic channel generation process, and finally elaborate on the generation of the synthetic training dataset.

IV-A WLAN TGn Channel Model

A set of WLAN channel models is proposed for different environments, as shown in Table I. The power delay profile (PDP) of each model is defined based on the cluster modeling approach, where multiple clusters are assigned to models and each cluster is outlined by exponential decay. The specific PDP value of each model can be found in [31].

In indoor wireless systems, the Doppler spectrum is defined as the Bell-shaped spectrum, given in the linear scale as [31]

$$S(f)=\frac{\sqrt{A}}{\pi f_{\rm d}}\cdot\frac{1}{1+A\left(\frac{f}{f_{\rm d}}\right)^{2}}, \tag{10}$$

where $A=9$ is a constant and the Doppler spread $f_{\rm d}$ is

$$f_{\rm d}=\frac{v_{0}}{\lambda}, \tag{11}$$

where $v_{0}$ is the terminal moving speed and $\lambda$ is the wavelength. Therefore, the autocorrelation function can be calculated as

$$R(\Delta t)=e^{-\frac{2\pi f_{\rm d}}{\sqrt{A}}\Delta t}, \tag{12}$$

where $\Delta t$ represents the time interval.
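A small numerical sketch may help relate terminal speed to channel decorrelation. It assumes the autocorrelation takes the form $R(\Delta t)=e^{-2\pi f_{\rm d}\Delta t/\sqrt{A}}$, i.e., the normalized inverse Fourier transform of the Bell-shaped spectrum with $A=9$. The walking speed of 1 m/s and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_spread(v0, carrier_hz=2.4e9):
    """f_d = v0 / lambda, with lambda = c / carrier frequency."""
    return v0 * carrier_hz / SPEED_OF_LIGHT


def autocorrelation(dt, f_d, A=9.0):
    """R(dt) = exp(-2*pi*f_d*dt / sqrt(A)) for dt >= 0 (assumed normalized form)."""
    return math.exp(-2.0 * math.pi * f_d * dt / math.sqrt(A))


if __name__ == "__main__":
    f_d = doppler_spread(1.0)  # hypothetical walking speed of 1 m/s
    for dt_ms in (1, 10, 100):
        print(dt_ms, "ms ->", round(autocorrelation(dt_ms / 1000.0, f_d), 3))
```

At 2.4 GHz a speed of 1 m/s gives a Doppler spread of roughly 8 Hz, so packets sent a few milliseconds apart see an almost unchanged channel, whereas the correlation decays noticeably over tens of milliseconds.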

IV-B Synthetic Channel Generation

Assuming that $h_{\rm ba}[l]\sim\mathcal{CN}(0,\sigma_{\rm ba}^{2}(l)),l=0,1,\cdots,L-1$ , and the channel samples of all taps follow the same Doppler spectrum, the channel $h_{\rm ba}^{[k+1]}[l]$ can be written as [32]

[TABLE]

where $\Delta t_{k}=t_{k+1}-t_{k}$ denotes the transmission interval, $R(\Delta t_{k})$ denotes the autocorrelation coefficient of channel $h_{\rm ba}[l]$ and the random component $\omega_{1}[l]\sim\mathcal{CN}(0,\sigma_{\rm ba}^{2}(l))$ . The corresponding CFR can be expressed as

[TABLE]

Given the distance between Bob and Mallory $d_{\rm bm}$ , it can be derived from (11) that the equivalent time interval between Bob and Mallory is

[TABLE]

The channel $h_{\rm ma}^{[k+1]}[l]$ can be written as [33]

[TABLE]

where $\rho(d_{\rm bm})=R(\Delta t_{\rm bm})$ denotes the correlation coefficient between $h_{\rm ba}^{[k+1]}[l]$ and $h_{\rm ma}^{[k+1]}[l]$ , which are spaced by distance $d_{\rm bm}$ , the random component is $\omega_{2}[l]\sim\mathcal{CN}(0,\sigma_{\rm ba}^{2}(l))$ , and $\Theta=\sigma_{\rm ba}^{2}(l)/\sigma_{\rm ma}^{2}(l)$ is assumed to be constant across channel taps. The corresponding CFR can be expressed as

[TABLE]
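The generation step can be sketched as follows. The sketch assumes a first-order Gauss-Markov update, $h^{[k+1]}[l]=\rho\,h^{[k]}[l]+\sqrt{1-\rho^{2}}\,\omega[l]$, which is a common way to produce a channel with a prescribed correlation coefficient $\rho$ and may differ in detail from the expressions used in this paper. The tap powers, the 64-point DFT size and all function names are hypothetical, and the power-ratio scaling $\Theta$ for Mallory's channel is omitted.

```python
import cmath
import math
import random


def complex_gaussian(variance, rng):
    """Draw one sample of CN(0, variance)."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def correlated_taps(h_prev, tap_powers, rho, rng):
    """Assumed first-order Gauss-Markov update per tap:
    h_next[l] = rho * h_prev[l] + sqrt(1 - rho^2) * w[l], w[l] ~ CN(0, tap_powers[l]).
    """
    scale = math.sqrt(1.0 - rho ** 2)
    return [rho * h + scale * complex_gaussian(p, rng)
            for h, p in zip(h_prev, tap_powers)]


def cfr(taps, n_fft=64):
    """Channel frequency response: n_fft-point DFT of the tap vector."""
    return [sum(h * cmath.exp(-2j * math.pi * n * l / n_fft)
                for l, h in enumerate(taps))
            for n in range(n_fft)]


rng = random.Random(0)
tap_powers = [0.6, 0.3, 0.1]  # hypothetical PDP, not taken from [31]
h_k = [complex_gaussian(p, rng) for p in tap_powers]
h_next = correlated_taps(h_k, tap_powers, rho=0.95, rng=rng)
H_k, H_next = cfr(h_k), cfr(h_next)
```

Under the same assumption, Mallory's channel would be drawn by calling `correlated_taps` with `rho` set to $\rho(d_{\rm bm})$ instead of $R(\Delta t_{k})$.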

IV-C Synthetic Training Dataset Generation Using MATLAB

The synthetic dataset $\mathcal{D}_{\rm train}^{\rm S}=\{({\bm{U}}_{i}^{\rm S},V_{i}^{\rm S})\}_{i=1}^{|\mathcal{D}_{\rm S}|}$ contains synthetic CSI pairs, where $|\mathcal{D}_{\rm S}|$ is the cardinality of set $\mathcal{D}_{\rm train}^{\rm S}$ . Specifically, the channel ${\bm{h}}_{\rm ba}(t_{k})$ is generated based on the IEEE 802.11 TGn channel models using the MATLAB WLAN Toolbox (https://www.mathworks.com/help/wlan/ref/wlantgnchannel-system-object.html). The channels ${\bm{h}}_{\rm ba}(t_{k+1})$ and ${\bm{h}}_{\rm ma}(t_{k+1})$ are generated based on (13) and (16), respectively. A WLAN Non-HT format waveform is generated and filtered by each of the channels ${\bm{h}}_{\rm ba}(t_{k})$ , ${\bm{h}}_{\rm ba}(t_{k+1})$ and ${\bm{h}}_{\rm ma}(t_{k+1})$ . Additive white Gaussian noise (AWGN) is then added. At the receiver, the LTF is extracted and the CSI estimations, i.e., $\widehat{\bm{H}}_{\rm ba}^{[k]}$ , $\widehat{\bm{H}}_{\rm ba}^{[k+1]}$ , and $\widehat{\bm{H}}_{\rm ma}^{[k+1]}$ , are obtained. We can then construct the synthetic dataset based on (8).

In order to enable the synthetic training dataset to align with the practical scenarios and be generalizable, we carefully tuned the following parameters.

• Different multipath environments: WLAN TGn channel models B-F. Fig. 4 exemplifies the CSI of WLAN TGn channel models B-F without noise using MATLAB simulation. It can be observed that a large RMS delay spread causes severe fluctuations in the CSI magnitudes, i.e., strong frequency selectivity.

• SNR: 5, 8, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 50 dB.

• Transmission interval $\Delta t_{k}$ : 3 ms.

• The moving speed $v_{0}$ : 1 m/s.

• Distance between Bob and Mallory $d_{\rm bm}$ : 0.25, 0.5, 0.75, 1, 1.5, 2 and 3 wavelengths.

The synthetic training dataset includes every combination of the above parameter values.
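The parameter sweep listed above can be enumerated as in the following sketch. The dictionary keys are hypothetical names chosen for illustration; only the parameter values come from the list.

```python
from itertools import product

channel_models = ["B", "C", "D", "E", "F"]
snr_db = [5, 8, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 50]
interval_ms = [3]
speed_mps = [1.0]
distance_wavelengths = [0.25, 0.5, 0.75, 1, 1.5, 2, 3]

# One configuration per combination of the parameter values.
configs = [
    {"model": m, "snr_db": s, "interval_ms": t, "speed_mps": v, "d_bm": d}
    for m, s, t, v, d in product(channel_models, snr_db, interval_ms,
                                 speed_mps, distance_wavelengths)
]
print(len(configs))  # 5 * 13 * 1 * 1 * 7 = 455
```

Each configuration would then drive one run of the waveform generation, channel filtering and CSI estimation chain described in this subsection.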

V Siamese-based CSI Authentication

The CNN-based Siamese network is employed to learn the similarity between channel estimations. The architecture of the proposed Siamese model is illustrated in Fig. 5. The inputs are the magnitudes of two channel estimations, i.e., $|\widehat{\bm{H}}_{\rm ba}^{[k]}|$ and $|\widehat{\bm{H}}_{\rm ba}^{[k+1]}|$ , or $|\widehat{\bm{H}}_{\rm ba}^{[k]}|$ and $|\widehat{\bm{H}}_{\rm ma}^{[k+1]}|$ . The Siamese-based model consists of twin embedding networks and a similarity calculation module.

The CNN-based embedding network can be regarded as a feature extractor, which is composed of a normalization layer, two convolution layers, a flatten layer, and a dense layer. The input CSI estimations are normalized by min-max normalization. The convolution layers use 16 $M\times 7$ and 32 $M\times 7$ filters, respectively. Both convolution layers use the ReLU activation function and padding. The two identical embedding networks extract features from the two input CSI estimations, respectively. The initial parameters of the twin embedding networks are identical, so that features of the two input CSI estimations are extracted equivalently.

Each embedding network returns a $16\times 1$ feature vector, and the Euclidean distance between the two vectors is calculated. The last dense layer, with the sigmoid activation function, maps this distance to a score $s$ between 0 and 1 that measures the similarity between the two CSI estimations.
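The normalization and similarity calculation can be sketched in plain Python as follows. This only illustrates the data flow, not the trained model: the convolutional feature extractor is omitted, and `weight` and `bias` are hypothetical stand-ins for the parameters of the last dense layer.

```python
import math


def min_max_normalize(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_score(feat_a, feat_b, weight=1.0, bias=0.0):
    """Sigmoid of an affine function of the distance, giving a score in (0, 1).

    weight and bias are placeholders for the trained dense-layer parameters.
    """
    d = euclidean_distance(feat_a, feat_b)
    return 1.0 / (1.0 + math.exp(-(weight * d + bias)))


print(min_max_normalize([2.0, 4.0, 6.0]))
print(similarity_score([0.1, 0.2], [0.4, 0.6]))
```

With a positive weight, a larger distance between the two feature vectors gives a larger score, in line with the convention that dissimilar pairs should score close to 1.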

The contrastive loss function is adopted to train the Siamese model, defined as

[TABLE]

where $V_{i}$ denotes the label of the input CSI pair and $\eta=1$ is a margin parameter. It indicates that a pair of similar CSI estimations will make $s$ approach 0, while a pair of dissimilar CSI estimations will make $s$ approach 1.
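A minimal sketch of a contrastive loss with this behaviour is given below. It assumes the commonly used form $(1-V)s^{2}+V\max(\eta-s,0)^{2}$ averaged over the batch; the exact scaling in the paper's definition may differ.

```python
def contrastive_loss(v, s, margin=1.0):
    """Per-pair contrastive loss (assumed common form).

    v = 0: similar pair, a large score s is penalised.
    v = 1: dissimilar pair, a score s below the margin is penalised.
    """
    return (1 - v) * s ** 2 + v * max(margin - s, 0.0) ** 2


def mean_contrastive_loss(labels, scores, margin=1.0):
    """Average the per-pair loss over a batch."""
    return sum(contrastive_loss(v, s, margin)
               for v, s in zip(labels, scores)) / len(labels)


print(contrastive_loss(0, 0.0))  # similar pair scored 0: no loss
print(contrastive_loss(1, 1.0))  # dissimilar pair scored 1: no loss
print(mean_contrastive_loss([0, 1], [0.2, 0.9]))
```

Minimizing this loss pushes the scores of similar pairs toward 0 and those of dissimilar pairs toward the margin $\eta=1$.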

The Siamese-based model architecture is built with Python 3.8 in TensorFlow, and the network is trained on two NVIDIA Tesla V100 GPUs using the RMSprop optimizer with a learning rate of 0.001 and a batch size of 32.

VI Simulation Evaluation

In this section, we first describe the generation of the simulation test dataset and the benchmark algorithms, and then evaluate and compare the performance of the proposed CNN-based Siamese model, the FCN-based Siamese model and the correlation-based method in simulation.

VI-A Simulation Test Dataset Generation

The simulation test dataset is generated in the same way as the synthetic training dataset introduced in Section IV-C with parameter configurations as follows.

•

Different multipath environments: WLAN TGn channel models B-F.

•

SNR: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 and 22 dB.

•

Transmission interval $\Delta t_{k}$ : 3, 6, 9, 12, 15, 18, 21, 24 and 27 ms.

•

The moving speed $v_{0}$ : 1 m/s.

•

Distance between Bob and Mallory $d_{\rm bm}$ : 0.25, 0.5, 0.75, 1, 1.5, 2 and 3 wavelengths.

It is worth noting that unlike the training dataset, which contains all the different combinations of the parameter settings, the test datasets are generated based on each parameter setting independently, to represent a particular test scenario.

VI-B Benchmark Algorithms

The Pearson correlation-based method in [22] and the FCN-based Siamese method in [28] are used as benchmark algorithms. The Pearson correlation is calculated as in (6) and the authentication is performed based on an empirically determined threshold, as shown in (7). The FCN-based Siamese method employs an embedding network with four fully connected layers to extract features from the two input CSI estimations. These layers have 256, 512, 256, and 16 neurons, respectively, each with ReLU as the activation function.
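The correlation-based benchmark can be sketched as follows, using the standard Pearson correlation coefficient between two CSI magnitude vectors. The decision rule and the threshold value of 0.9 are placeholders for illustration, not the empirically determined threshold of the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def is_legitimate(csi_prev, csi_new, threshold=0.9):
    """Accept the new packet when the correlation reaches the threshold.

    The default 0.9 is a placeholder, not the paper's threshold.
    """
    return pearson(csi_prev, csi_new) >= threshold


# Hypothetical CSI magnitudes of two consecutive packets.
print(is_legitimate([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 2.9, 4.2]))
```

Unlike the Siamese models, this benchmark has no trainable parameters; only the threshold is tuned.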

VI-C Simulation Results

VI-C1 Comparison with Benchmark

Fig. 6 depicts the ROC curves of the CNN-based Siamese, FCN-based Siamese and correlation-based methods for the simulation test datasets with WLAN TGn channel models B and F.

Thanks to the prior knowledge of the channel learned in the training stage, the CNN-based Siamese and FCN-based Siamese models outperform the correlation-based method. Additionally, due to the stronger feature extraction performance of CNN compared to FCN, CNN-based Siamese network has an advantage over FCN-based Siamese network.

Table II compares the computational overhead of the CNN-based Siamese and FCN-based Siamese methods in terms of parameter count and floating point operations (FLOPs). The parameter count refers to the total number of trainable parameters, including weights and biases, across all layers of the model. These parameters are updated during training to minimize the loss function. The count can be obtained using the built-in model.summary() method in TensorFlow. FLOPs represent the total number of floating point operations required for a single forward pass through the network, and can be calculated using the tf.compat.v1.profiler tools in TensorFlow. It can be observed that the CNN-based Siamese network has a lower computational overhead than the FCN-based Siamese network introduced in [28].
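To show how a parameter count arises, the sketch below counts the weights and biases of the four fully connected layers of the FCN embedding network. The input dimension of 52 (one magnitude per used Non-HT subcarrier) is an assumption, so the result is illustrative and is not meant to reproduce the figures in Table II.

```python
def dense_param_count(input_dim, layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    total, fan_in = 0, input_dim
    for units in layer_sizes:
        total += (fan_in + 1) * units  # fan_in weights and one bias per unit
        fan_in = units
    return total


# 52 inputs is an assumption; the layer widths are those of the FCN benchmark.
print(dense_param_count(52, [256, 512, 256, 16]))  # 280592
```

Most of these parameters sit in the two 256-to-512 and 512-to-256 layers, which is why fully connected embeddings tend to be larger than convolutional ones with small kernels.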

VI-C2 Evaluation Under Different Settings

Fig. 7 shows the AUC of the CNN-based Siamese network and the correlation-based method versus SNR. The authentication performance of both methods improves with increasing SNR, since the LS channel estimation becomes more accurate at higher SNR. The Siamese-based method outperforms the correlation-based method at all SNRs and works well for all channel models, which indicates that the CNN-based Siamese network generalizes well across channel environments. The reason is that, through training, the feature extractor learns to extract feature information from noisy CSI estimations and becomes more robust to noise.

Fig. 8 shows the AUC of the CNN-based Siamese network and the correlation-based method versus the distance between Bob and Mallory normalized by wavelength $d_{\rm bm}/\lambda$ . It can be seen that, as the distance between Bob and Mallory increases, Alice detects rogue devices more reliably, i.e., with a higher AUC. For the correlation-based method, weak frequency selectivity makes it difficult for the receiver to detect rogue devices. In contrast, the CNN-based Siamese network works very well for all channel models, because through training the feature extractor learns to extract feature information even from channels with weak frequency selectivity.

Fig. 9 shows the AUC of the CNN-based Siamese network and the correlation-based method versus the transmission interval. It can be observed that the AUC decreases as the transmission interval $\Delta t_{k}$ increases: for a given moving speed, a larger transmission interval lowers the correlation between the CSI estimations obtained from two consecutive legitimate packets, which makes it more difficult for Alice to detect the rogue device. Moreover, with $\Delta t_{k}$ less than $15$ ms, the CNN-based Siamese network produces a higher AUC than the correlation-based method for all channel models, which means that the CNN-based Siamese network has a higher tolerance for different transmission intervals.

VII Experimental Evaluation

We have verified the effectiveness of the CNN-based Siamese network in the simulation environment in Section VI. However, the simulation environment may not fully represent the practical environment. In addition, the synthetic training dataset needs to be compared with the experimental training dataset to prove the reliability of using the synthetic dataset. Therefore, experimental evaluation is also conducted. In this section, the experiment setup is first elaborated, and then the performance of the proposed synthetic training dataset is evaluated and compared, and eventually the threshold selection is discussed.

VII-A Experiment Setup

VII-A1 Device Configuration

Alice is configured as a WiFi access point (AP) while Bob and Mallory are treated as user stations, as shown in Fig. 10. An ESP32 kit with integrated WiFi connectivity is adopted as Alice. The collection of CSI from the ESP32 microcontroller is achieved by the ESP32 CSI Toolkit (https://github.com/StevenMHernandez/ESP32-CSI-Tool), which can provide information about the working mode, MAC address of the transmitter, RSSI, noise floor, time stamp of the received signal, and CSI. The ESP32 board transfers the collected data to a PC via a USB cable. In addition, two LoPy4 development boards (https://development.pycom.io/tutorials/networks/wlan/) operating in the WiFi station mode are used as Bob and Mallory. All boards support IEEE 802.11n with a configuration of $20$ MHz bandwidth and $2.4$ GHz carrier frequency.

VII-A2 Experiment Scenarios

To evaluate the performance of the proposed scheme, we consider four different scenarios:

• Scenario I: movement in a corridor A, as shown in Fig. 11(a).

• Scenario II: movement in an office, as shown in Fig. 11(b) (moving route 1).

• Scenario III: movement in a corridor B, as shown in Fig. 11(b) (moving route 2).

• Scenario IV: movement in a residential apartment (floor plan not shown).

The experiments of scenario I and scenarios II & III were carried out on the second floor and the sixth floor of the Department of Electrical Engineering and Electronics, the University of Liverpool, UK, respectively.

It is worth noting that the datasets used for training and testing were independently collected in all scenarios.

VII-A3 Experimental Training Dataset Collection

Although we have generated the synthetic training dataset, we deliberately collected experimental training datasets for comparison. Training datasets were collected between Alice and Bob in all scenarios. In the experiments, Bob sent packets continuously from a fixed position at a time interval of $0.01$ s. Alice moved at a speed of approximately $0.25$ m/s while collecting measurement data using the ESP32 CSI toolkit.

Because Alice moves, measurements taken at widely separated time instants correspond to widely separated locations, and can therefore be treated as if they came from different devices. Accordingly, in our experimental training dataset $\mathcal{D}_{\rm train}^{\rm E}=\{({\bm{U}}_{i}^{\rm E},V_{i}^{\rm E})\}_{i=1}^{|\mathcal{D}_{\rm E}|}$ , where each input ${\bm{U}}_{i}^{\rm E}\triangleq(\widehat{\bm{H}}^{[k]},\widehat{\bm{H}}^{[k+\Delta k]})$ corresponds to a pair of channel measurements separated by $\Delta k$ packets, we set $V_{i}^{\rm E}=0$ when $\Delta k=1$ , treating the CSI measurements as coming from the same device, and $V_{i}^{\rm E}=1$ when $\Delta k=100$ , treating them as coming from different devices. We collected $5145$ , $5037$ , $5077$ and $4240$ CSI measurements in scenario I, scenario II, scenario III and scenario IV, respectively.
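The labeling rule can be sketched as follows. The sketch assumes that every admissible index $k$ contributes one pair per gap, which may not match how the pairs were actually sampled; integers stand in for the CSI vectors, and `build_pairs` is a hypothetical helper.

```python
def build_pairs(measurements, similar_gap=1, dissimilar_gap=100):
    """Return (pair, label) tuples: label 0 for a gap of 1 packet,
    label 1 for a gap of 100 packets."""
    pairs = []
    n = len(measurements)
    for k in range(n - similar_gap):
        pairs.append(((measurements[k], measurements[k + similar_gap]), 0))
    for k in range(n - dissimilar_gap):
        pairs.append(((measurements[k], measurements[k + dissimilar_gap]), 1))
    return pairs


# Stand-in data: integers instead of CSI vectors, 5145 packets as in scenario I.
pairs = build_pairs(list(range(5145)))
n_similar = sum(1 for _, v in pairs if v == 0)
n_dissimilar = sum(1 for _, v in pairs if v == 1)
print(n_similar, n_dissimilar)

# Spatial separation implied by a gap of 100 packets:
# packets * packet interval (s) * moving speed (m/s).
separation_m = 100 * 0.01 * 0.25
print(separation_m)
```

A gap of 100 packets thus corresponds to about 0.25 m of movement, i.e., roughly two wavelengths at 2.4 GHz, which is why such pairs can stand in for measurements from different devices.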

VII-A4 Experimental Test Dataset Collection

In all scenarios, the positions of Bob and Mallory were fixed and separated by distance $d_{\rm bm}$ . Bob and Mallory continuously transmitted signals to Alice at time intervals of $0.01$ s and $0.1$ s, respectively. The distance between Bob and Mallory $d_{\rm bm}$ can take on the values of $3$ , $6$ , $9$ , $12$ , $18$ , $24$ and $36$ cm. At 2.4 GHz, where the signal wavelength is about $12$ cm, these distances correspond to $0.25$ , $0.5$ , $0.75$ , $1$ , $1.5$ , $2$ and $3$ wavelengths.
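The conversion from centimetres to wavelengths is simple arithmetic, shown here for both the rounded 12 cm wavelength used in the text and the wavelength computed from the speed of light. The 2.4 GHz carrier is taken from the device configuration; the variable names are illustrative.

```python
C = 299_792_458.0  # speed of light in m/s

distances_cm = [3, 6, 9, 12, 18, 24, 36]

# Using the rounded wavelength of 12 cm quoted in the text.
nominal = [d / 12 for d in distances_cm]
print(nominal)

# Using the wavelength computed at exactly 2.4 GHz (about 12.49 cm).
wavelength_cm = 100 * C / 2.4e9
computed = [round(d / wavelength_cm, 2) for d in distances_cm]
print(round(wavelength_cm, 2), computed)
```

The two sets of values differ by about 4%, which is small compared with the spacing between the tested distances.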

Alice moved at a speed of about $0.25$ m/s in all scenarios and received signals from both Bob and Mallory. Moreover, since Alice is able to extract the transmitter’s MAC address, we can utilize it to generate the ground truth identity of the received packets. It is worthwhile to note that, while the AP is usually fixed in practice, we consider the movement of the AP rather than the movement of user stations in order to control and adjust the distance between Bob and Mallory. Thanks to channel reciprocity, this setup is equivalent to fixed AP and mobile user stations.

VII-B Experiment Results

Fig. 12 shows the SNR distribution of the packets collected in scenarios I-IV for both training and test datasets. It can be seen that scenario II and scenario III correspond to relatively high and low SNR environments, respectively. As shown in Fig. 11, in scenario II, there were always LOS transmissions between Alice and Bob/Mallory, and their distance was short, up to $7$ m. In contrast, there were only NLOS transmissions available in scenario III and the distance was as large as $15$ m. Scenario I and scenario IV represent the medium SNR environment.

Fig. 13 shows the ROC curves of our proposed CNN-based Siamese, FCN-based Siamese and correlation-based methods for scenario I (corridor A). The CNN-based and FCN-based Siamese networks are trained on the synthetic training dataset and on the experimental training dataset collected in scenario I (corridor A). It can be observed that the CNN-based Siamese network produces an AUC that is 0.03 higher than that of the FCN-based Siamese network and 0.05 higher than that of the correlation-based method. That is, for each threshold, the CNN-based Siamese network has a higher TPR and a lower FPR than the FCN-based Siamese network and correlation-based method. More importantly, the Siamese model trained on the synthetic training dataset has the potential to yield a higher AUC than the one trained on the experimental training dataset, which demonstrates the advantages of the synthetic dataset.

In Fig. 14, we show the AUC of the CNN-based Siamese network and the correlation-based method versus the distance between Bob and Mallory normalized by wavelength $d_{\rm bm}/\lambda$ for all scenarios. The CNN-based Siamese models are trained on the synthetic training dataset and the experimental training datasets collected in scenarios I-IV, respectively. We then evaluated these trained models against test datasets collected in different scenarios. We observe that in all scenarios, our CNN-based Siamese network achieves an average AUC gain of 0.06 over the correlation-based method. Moreover, the synthetic training dataset potentially outperforms the experimental training dataset. Taking Fig. 14(a) as an example, when evaluated against the test dataset collected in scenario I, the model trained on data collected in scenario I performs better than the models trained on data collected in other scenarios. As can be observed in Fig. 12, the training and test datasets collected in scenario I have similar SNR distributions. The other datasets have different SNR distributions, which degrades generalization. For each test scenario, the synthetic training dataset has a comparable performance with the experimental training dataset collected in the matched scenario, which demonstrates good generalization performance across different scenarios.

In Fig. 15, we show the AUC of the CNN-based Siamese network and the correlation-based method versus SNR. Specifically, the CSI measurements collected in all scenarios were mixed together, and then divided into 13 groups with a 5 dB interval length based on the SNR of the collected packets (ranging from 0 dB to 65 dB). The SNR of each group of CSI measurements is represented by the median SNR of the interval and each group is tested independently. It can be seen that with SNR lower than $20$ dB, the model trained on the synthetic dataset consistently produces an AUC 0.05 higher than the correlation-based method. Moreover, the synthetic dataset performs well in all SNR groups, but the experimental datasets collected in scenarios I-IV perform well only in the corresponding SNR groups. Taking scenario I as an example, since the SNR of the packets collected in scenario I is mainly distributed between $10$ dB and $40$ dB, the experimental training dataset does not perform well with SNR lower than $10$ dB or higher than $40$ dB. This indicates that using a range of SNRs when generating the synthetic dataset improves its robustness to noise and makes it work better in different SNR environments.
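The grouping can be sketched as follows. It assumes half-open 5 dB intervals starting at 0 dB and takes the interval midpoint as the representative SNR of each group; `snr_group` is a hypothetical helper.

```python
def snr_group(snr_db, width=5.0, low=0.0, high=65.0):
    """Return (group index, centre SNR in dB), or None outside [low, high)."""
    if not low <= snr_db < high:
        return None
    index = int((snr_db - low) // width)
    return index, low + (index + 0.5) * width


n_groups = int((65.0 - 0.0) / 5.0)
print(n_groups)          # 13 groups of 5 dB each
print(snr_group(12.3))   # falls in the 10-15 dB group
print(snr_group(64.9))   # falls in the last group, 60-65 dB
```

Each packet pair would then be evaluated only within its own group, so that the AUC in Fig. 15 can be plotted against the group's representative SNR.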

VII-C Threshold Selection

In practical applications, the threshold should be adapted to specific requirements. In applications with high security requirements, e.g., financial transaction systems, the primary goal is to prevent attackers from impersonating legitimate users, which requires a higher TPR. According to Fig. 13, for the proposed CNN-based Siamese model trained on the synthetic dataset, achieving a TPR of 0.95 corresponds to a threshold of 0.64. On the other hand, in user experience-focused applications, e.g., streaming services, the priority is to ensure a smooth experience for legitimate users and reduce the rejection of valid devices, which necessitates a lower FPR. As shown in Fig. 13, when FPR is set to 0.1, the corresponding threshold is 0.94.

In summary, selecting an appropriate threshold is crucial for balancing security and usability in authentication systems.
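The two selection strategies can be sketched as follows, assuming a packet is flagged as rogue when its score is at or above the threshold and that label 1 marks a rogue pair. The scores and labels are toy values, and both helper functions are hypothetical.

```python
def threshold_for_tpr(scores, labels, target_tpr):
    """Largest threshold whose TPR reaches target_tpr (score >= thr -> rogue)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    for thr in sorted(set(scores), reverse=True):
        tpr = sum(1 for s in pos if s >= thr) / len(pos)
        if tpr >= target_tpr:
            return thr
    return None


def threshold_for_fpr(scores, labels, max_fpr):
    """Smallest threshold whose FPR stays at or below max_fpr."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for thr in sorted(set(scores), reverse=True):
        fpr = sum(1 for s in neg if s >= thr) / len(neg)
        if fpr <= max_fpr:
            best = thr
        else:
            break  # FPR only grows as the threshold is lowered further
    return best


# Toy validation scores (hypothetical): label 1 = rogue, label 0 = legitimate.
scores = [0.95, 0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(threshold_for_tpr(scores, labels, 0.75))  # 0.7
print(threshold_for_fpr(scores, labels, 0.25))  # 0.4
```

A security-focused deployment would use the first function with a high target TPR, while a usability-focused deployment would use the second with a small FPR budget.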

VIII Discussion

VIII-A Scalability Analysis

The proposed method can be efficiently scaled for large-scale deployment involving multiple devices. In scenarios with multiple attackers ( $\text{Mallory}_{1}$ , …, $\text{Mallory}_{n}$ ) and a legitimate device, as long as the distance between each attacker and the legitimate device exceeds half a wavelength, the packets sent by the attackers and the legitimate device will experience different channel fading. This allows Alice to detect the presence of the attackers based on the estimated CSI measurements. In cases with multiple legitimate users ( $\text{Bob}_{1}$ , …, $\text{Bob}_{n}$ ), Alice can recognize each legitimate user using the same model without the need for retraining.

Although we use the IEEE 802.11 legacy OFDM mode as the focus of our study, the proposed method is applicable to OFDM-based systems that support channel estimation. This makes the method scalable across a wide range of OFDM-based standards, including WiFi, LTE, and 5G.

VIII-B Advantage of Synthetic Dataset

The proposed synthetic training dataset offers two key advantages as follows.

•

Efficiency and cost-effectiveness: The synthetic dataset is generated through simulation, eliminating the need for extensive labor-intensive CSI collection. It significantly reduces time and resource requirements, making it more time-efficient and cost-effective.

•

Enhanced accuracy of channel generation: The synthetic dataset specifies channel conditions, including SNRs, attack distances, and mobility speeds, more accurately than manually collected datasets. This enables the synthetic dataset to thoroughly capture the characteristics of different channel environments, thereby enhancing the generalization capability of the proposed method across different scenarios. As demonstrated in Fig. 14, across a range of scenarios, the synthetic dataset achieves an AUC comparable to that of the experimental datasets collected in matched scenarios.

IX Conclusion

In this paper, we proposed a novel deep learning-based physical layer CSI authentication scheme for mobile scenarios, enhanced by a synthetic training dataset and a CNN-based Siamese network. Specifically, a synthetic training dataset was generated to significantly reduce the overhead of manually collecting experimental datasets. The autocorrelation and the distance correlation of the channel were first modelled and the dataset was generated based on the WLAN TGn channel model. A CNN-based Siamese network was exploited to learn the temporal and spatial similarity between pairs of CSI estimations and to produce a score that measures the difference between the input CSI measurements. Device authentication was achieved by comparing the score to an empirically obtained threshold. A unique feature of this paper is a synergistic methodology involving both simulation and experimental evaluation. In particular, the simulation evaluation allowed us to tailor the synthetic dataset parameters. We explored the effect of SNR, the distance between legitimate and rogue devices and the transmission interval on authentication performance. We then created a WiFi testbed consisting of an ESP32 kit and two LoPy4 boards and carried out extensive experiments in typical indoor environments. The experimental results demonstrate the reliability of the synthetic training dataset and the generalization of the proposed scheme. Compared with the FCN-based Siamese network and correlation-based benchmark algorithms, the proposed scheme achieves average AUC gains of 0.03 and 0.06 under practical test scenarios, respectively.

Bibliography


[1] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 2347–2376, 2015.
[2] N. Wang, W. Li, P. Wang, A. Alipour-Fanid, L. Jiao, and K. Zeng, “Physical layer authentication for 5G communications: Opportunities and road ahead,” IEEE Network, vol. 34, no. 6, pp. 198–204, 2020.
[3] N. Xie, Z. Li, and H. Tan, “A survey of physical-layer authentication in wireless communications,” IEEE Commun. Surveys Tuts., vol. 23, no. 1, pp. 282–310, 2020.
[4] T. M. Hoang, A. Vahid, H. D. Tuan, and L. Hanzo, “Physical layer authentication and security design in the machine learning era,” IEEE Commun. Surveys Tuts., vol. 26, no. 3, pp. 1830–1860, 2024.
[5] J. Yang, Y. Chen, and W. Trappe, “Detecting spoofing attacks in mobile wireless environments,” in Proc. IEEE Commun. Soc. Conf. Sensor, Mesh Ad Hoc Commun. Networks, Rome, Italy, 2009, pp. 1–9.
[6] J. Yang, Y. Chen, W. Trappe, and J. Cheng, “Detection and localization of multiple spoofing attackers in wireless networks,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 1, pp. 44–58, 2013.
[7] Y. Chen, J. Yang, W. Trappe, and R. P. Martin, “Detecting and localizing identity-based attacks in wireless and sensor networks,” IEEE Trans. Veh. Technol., vol. 59, no. 5, pp. 2418–2434, 2010.
[8] L. Xiao, L. Greenstein, N. Mandayam, and W. Trappe, “Fingerprints in the Ether: Using the physical layer for wireless authentication,” in Proc. IEEE Int. Conf. Commun. (ICC), Glasgow, UK, 2007, pp. 4646–4651.