Machine Learning and Location Verification in Vehicular Networks
Ullah Ihsan, Robert Malaney, Shihao Yan

Machine Learning and Location Verification in Vehicular Networks
Ullah Ihsan1, Robert Malaney1 and Shihao Yan2
1School of Electrical Engineering & Telecommunications, The University of New South Wales, Sydney, NSW 2052, Australia
2School of Engineering, Macquarie University, Sydney, NSW 2109, Australia
Abstract
Location information will play a very important role in emerging wireless networks such as Intelligent Transportation Systems, 5G, and the Internet of Things. However, wrong location information can result in poor network outcomes. It is therefore critical to verify all location information before further utilization in any network operation. In recent years, a number of information-theoretic Location Verification Systems (LVSs) have been formulated in attempts to optimally verify the location information supplied by network users. Such LVSs, however, are somewhat limited since they rely on knowledge of a number of channel parameters for their operation. To overcome such limitations, in this work we introduce a Machine Learning based LVS (ML-LVS). This new form of LVS can adapt itself to changing environments without knowing the channel parameters. Here, for the first time, we use real-world data to show how our ML-LVS can outperform information-theoretic LVSs. We demonstrate this improved performance within the context of vehicular networks using Received Signal Strength (RSS) measurements at multiple verifying base stations. We also demonstrate the validity of the ML-LVS even in scenarios where a sophisticated adversary optimizes her attack location.
I Introduction
Vehicular Ad-hoc Networks (VANETs) are a particular type of Intelligent Transportation System (ITS) which utilize communications to assist with various traffic problems. VANETs can function based on vehicle-to-vehicle communication and/or vehicle-to-Road Side Unit (RSU) communication[1]. RSUs are fixed base stations installed at certain locations with an aim to assist VANETs with their operations. An RSU (or a trusted vehicle whose location is a priori verified), can also function as a Processing Center (PC). The PC processes the communication data before issuing instructions to the vehicles under its coverage area.
Location information of vehicles is a key ingredient for VANETs. The vehicles usually obtain their location information through a Global Navigation Satellite System (GNSS) such as the Global Positioning System (GPS), and report this information to the PC for use in subsequent network operations. The location information supplied by a vehicle may, however, contain errors. This may be due to faulty hardware used in recording/forwarding the location information, or it may be due to a vehicle falsifying its location information (in order to gain an advantage over nearby vehicles or to simply disrupt the network). If the location information supplied by the vehicle is not verified, and the location error goes unnoticed, this may result in poor network outcomes such as traffic queues, traffic congestion, or poor tolling. In extreme cases, a lack of position verification may lead to catastrophic situations such as vehicle collisions.
In recent years, a number of Location Verification Systems (LVSs)[2, 3, 4, 5, 6, 7, 8, 9, 10, 11] have been devised to validate the vehicle’s supplied location information. These LVSs in general make use of the numerous physical layer properties of the signal (transmitted by the vehicle and measured at the verifying base stations) to verify the vehicle’s reported location information. The physical layer properties include Received Signal Strength (RSS), Time of Arrival (ToA) of the signal, and Angle of Arrival (AoA) of the signal. However, all LVSs have a serious limitation in their operation - they normally operate efficiently only for the channel conditions assumed at the time of their design[2]. That is, they normally only function well under the assumption that all a priori channel information provided to them remains accurate. Further, they are only able to efficiently address the threat-model scenarios they have been specifically designed for[12]. Such limitations make their real-world deployment suspect.
Machine Learning (ML) is an important technology which is now impacting many applications e.g.,[13, 14, 15, 16, 17, 18, 19], and it is possible that inclusion of ML techniques may help resolve some of the LVS limitations mentioned above. Indeed, this has been shown to be the case in theoretical simulations of LVSs in the context of ToA schemes[20], and in theoretical simulations of ‘in-region’ location verification[21]. What remains to be determined is whether these advances hold up under conditions where real-world data is input to the ML-LVS. In this work, which represents the first experimental deployment of any ML-LVS, we answer this question in the affirmative. We summarize below our main contributions.
- We carry out for the first time an ML-LVS analysis based on real-world data, namely RSS measurements.
- We show that our ML-LVS outperforms an information-theoretic LVS when a malicious vehicle sets its claimed (untrue) location at some random location.
- We also show that, unlike the information-theoretic LVS, the ML-LVS still performs efficiently even when the malicious vehicle formally minimizes spoofing detection by optimizing its claimed (untrue) location.
The remainder of this paper is organized as follows. Section II details the system model. Section III presents the performance analysis using information theory and ML techniques. Section IV provides numerical results and future prospects, and Section V concludes the paper.
II System Model
We consider the following system model in our work:
1. The true location of a legitimate or malicious vehicle is denoted by $\mathbf{x}_t$.
2. We refer to the reported location from a legitimate or malicious vehicle as the claimed location, which is denoted by $\mathbf{x}_c$. The claimed location for a legitimate vehicle is exactly the same as its true location. On the other hand, a malicious vehicle spoofs its location, i.e., its claimed location is not the same as its true location.
3. For a malicious vehicle, $\|\mathbf{x}_c - \mathbf{x}_t\| \geq r$, where $r$ is an a priori distance representing the minimum distance between its claimed and true locations.
4. The framework consists of $N$ RSUs as verifying base stations, with publicly known true locations. All RSUs are in the transmission range of the vehicles (whose claimed locations have to be verified). The true location of the i-th RSU is $\mathbf{x}_i$, where $i = 1, 2, \dots, N$.
5. We choose one of the RSUs as PC. The PC accumulates its own RSS measurements with the measurements collected by other RSUs for further processing. The PC decides on the integrity of a vehicle’s claimed location.
6. Under the null hypothesis $\mathcal{H}_0$, the vehicle is legitimate, i.e., we have
$$\mathcal{H}_0: \; \mathbf{x}_c = \mathbf{x}_t. \tag{1}$$
7. Under the alternative hypothesis $\mathcal{H}_1$, the vehicle is malicious, i.e., we have
$$\mathcal{H}_1: \; \|\mathbf{x}_c - \mathbf{x}_t\| \geq r. \tag{2}$$

Based on a log-normal pathloss model, under $\mathcal{H}_0$, the RSS (all RSS in dBm) measured by the i-th RSU from a legitimate vehicle, $y_i$, is given by
$$y_i = u_i + \omega_i, \tag{3}$$
where $\omega_i$ is a zero mean normal random variable with variance $\sigma^2$ representing the channel noise, and $u_i$ is the mean RSS at the i-th RSU. This latter quantity is given by
$$u_i = p_0 - 10\,\gamma \log_{10}\!\left(\frac{d_i^c}{d_0}\right), \tag{4}$$
where $p_0$ is a reference RSS at a reference distance $d_0$, $\gamma$ is the path loss exponent, and $d_i^c$ is the distance of a legitimate vehicle’s true location (which under $\mathcal{H}_0$ coincides with its claimed location $\mathbf{x}_c$) to the i-th RSU, given by
$$d_i^c = \|\mathbf{x}_c - \mathbf{x}_i\|.$$
The measurements made by the $N$ RSUs are independent of each other. Under $\mathcal{H}_0$, they collectively form a vector $\mathbf{y} = [y_1, \dots, y_N]^T$. Based on (3) the vector $\mathbf{y}$ follows a multi-variate normal distribution given as
$$\mathbf{y} \,|\, \mathcal{H}_0 \sim \mathcal{N}(\mathbf{u}, \mathbf{R}),$$
where $\mathbf{u} = [u_1, \dots, u_N]^T$ is the mean RSS vector under $\mathcal{H}_0$, and $\mathbf{R} = \sigma^2 \mathbf{I}$ is the covariance matrix with $\mathbf{I}$ as the identity matrix.
Under $\mathcal{H}_1$, a malicious vehicle spoofs its claimed location. It reports its claimed location to be at a minimum distance $r$ away from its true location. As an example scenario, we can think of the malicious vehicle pretending to be on the road while it is actually located off the road in a nearby street. The RSS value measured by the i-th RSU from a malicious vehicle, $y_i$, is given by
$$y_i = v_i + \omega_i, \tag{6}$$
where $v_i$ is given by
$$v_i = p_0 - 10\,\gamma \log_{10}\!\left(\frac{d_i^t}{d_0}\right), \tag{7}$$
and $d_i^t$ is the distance of its true location to the i-th RSU, given by
$$d_i^t = \|\mathbf{x}_t - \mathbf{x}_i\|.$$
The measurements made by the $N$ RSUs are independent of each other. Under $\mathcal{H}_1$, they collectively form a vector $\mathbf{y}$. From (6), the vector $\mathbf{y}$ follows a multi-variate normal distribution given as
$$\mathbf{y} \,|\, \mathcal{H}_1 \sim \mathcal{N}(\mathbf{v}, \mathbf{R}),$$
where $\mathbf{v} = [v_1, \dots, v_N]^T$ is the mean RSS vector under $\mathcal{H}_1$.
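The log-normal pathloss model of this section can be sketched in a few lines of Python. This is only an illustration: the function names, the RSU layout, and the channel values (reference RSS of -40 dBm at 1 m, path loss exponent 3, noise standard deviation 4 dB) are assumptions chosen for the example, not values from the experiment.

```python
import math
import random

def mean_rss(location, rsus, p0=-40.0, d0=1.0, gamma=3.0):
    """Mean RSS (dBm) at each RSU for a transmitter at `location`."""
    means = []
    for (xi, yi) in rsus:
        dist = math.hypot(location[0] - xi, location[1] - yi)
        dist = max(dist, d0)  # clamp to d0 so the logarithm stays finite
        means.append(p0 - 10.0 * gamma * math.log10(dist / d0))
    return means

def observe_rss(true_location, rsus, sigma=4.0, rng=random, **channel):
    """One noisy RSS vector: mean RSS at the TRUE location plus N(0, sigma^2) noise."""
    return [m + rng.gauss(0.0, sigma)
            for m in mean_rss(true_location, rsus, **channel)]

# Hypothetical geometry: three RSUs, one malicious vehicle.
rsus = [(0.0, 0.0), (150.0, 0.0), (0.0, 150.0)]
x_true = (60.0, 80.0)
x_claimed = (120.0, 20.0)

u = mean_rss(x_claimed, rsus)  # mean RSS implied by the claimed location
v = mean_rss(x_true, rsus)     # mean RSS actually produced by the true location
y = observe_rss(x_true, rsus, rng=random.Random(0))
```

For a legitimate vehicle the claimed and true locations coincide, so the two mean vectors are identical; for a malicious vehicle they differ, and that gap is what a verifier can exploit.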
III Performance Analysis
The outcome of an LVS is a binary result, i.e., legitimate or malicious. This is different from a localization system where the output is an estimated location. We measure the performance of our LVS using two methodologies: an information-theoretic analysis similar to [22], and the newly designed ML-LVS method, which makes use of machine-learning techniques. In both cases, a Bayes average cost function is chosen as the performance metric for the LVS in terms of ‘Total Error’. The Total Error is given by
$$\text{Total Error} = P(\mathcal{H}_0)\,\alpha + P(\mathcal{H}_1)\,(1 - \beta), \tag{9}$$
where $P(\mathcal{H}_0)$ and $P(\mathcal{H}_1)$ are the a priori probabilities of occurrences of $\mathcal{H}_0$ (i.e. legitimate vehicle) and $\mathcal{H}_1$ (i.e. malicious vehicle), respectively. In this work, we assume the legitimate and the malicious vehicles occur in equal proportions, so both $P(\mathcal{H}_0)$ and $P(\mathcal{H}_1)$ are equal to 0.5. $\alpha$ represents the False Positive Rate (the rate of legitimate vehicles being incorrectly flagged as malicious) and $\beta$ represents the Detection Rate (the rate of malicious vehicles being detected correctly). Equation (9) therefore takes the form
$$\text{Total Error} = 0.5\,\alpha + 0.5\,(1 - \beta). \tag{10}$$
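As a small worked illustration of the Total Error metric, the sketch below computes the false positive rate and detection rate from a list of ground-truth labels and LVS decisions. The function name and the label encoding (0 for legitimate, 1 for malicious) are assumptions made for this example.

```python
def total_error(labels, decisions, p_h0=0.5, p_h1=0.5):
    """Bayes average cost; labels and decisions use 0 = legitimate, 1 = malicious."""
    legit = [d for l, d in zip(labels, decisions) if l == 0]
    malicious = [d for l, d in zip(labels, decisions) if l == 1]
    alpha = sum(legit) / len(legit)          # false positive rate
    beta = sum(malicious) / len(malicious)   # detection rate
    return p_h0 * alpha + p_h1 * (1.0 - beta)

labels    = [0, 0, 0, 0, 1, 1, 1, 1]
decisions = [0, 0, 0, 1, 1, 1, 1, 0]
# One of four legitimate vehicles is flagged (alpha = 0.25) and three of four
# malicious vehicles are caught (beta = 0.75).
error = total_error(labels, decisions)
```

With equal priors, the metric is simply the average of the false positive rate and the missed-detection rate.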
III-A Information-theoretic LVS
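We will refer to the information-theoretic analysis as the Likelihood Ratio Test (LRT) method from now on. The LRT method requires some parameters and channel information to be available in advance. This information includes the pathloss exponent $\gamma$, the mean RSS vectors $\mathbf{u}$ and $\mathbf{v}$ as highlighted in the system model, and the LRT decision threshold $\lambda$. It has been proven elsewhere that the LRT method achieves the optimum detection results for a given false positive rate[23]. This leads to the conclusion that the LRT minimizes the Total Error and maximizes the mutual information between input and output of the LVS[24]. We follow the decision rule given below for the LRT method
$$\Lambda(\mathbf{y}) \triangleq \frac{p(\mathbf{y}\,|\,\mathcal{H}_1)}{p(\mathbf{y}\,|\,\mathcal{H}_0)} \;\underset{\mathcal{D}_0}{\overset{\mathcal{D}_1}{\gtrless}}\; \lambda, \tag{11}$$
where $\Lambda(\mathbf{y})$ is the likelihood ratio, and $\mathcal{D}_0$ and $\mathcal{D}_1$ are the binary decision values (i.e., whether the vehicle is legitimate or malicious), while $p(\mathbf{y}\,|\,\mathcal{H}_0)$ and $p(\mathbf{y}\,|\,\mathcal{H}_1)$ are given by
$$p(\mathbf{y}\,|\,\mathcal{H}_0) = \frac{1}{\sqrt{(2\pi)^N |\mathbf{R}|}} \exp\!\left(-\frac{1}{2}(\mathbf{y}-\mathbf{u})^T \mathbf{R}^{-1} (\mathbf{y}-\mathbf{u})\right), \tag{12}$$
$$p(\mathbf{y}\,|\,\mathcal{H}_1) = \frac{1}{\sqrt{(2\pi)^N |\mathbf{R}|}} \exp\!\left(-\frac{1}{2}(\mathbf{y}-\mathbf{v})^T \mathbf{R}^{-1} (\mathbf{y}-\mathbf{v})\right), \tag{13}$$
where $|\mathbf{R}|$ is the determinant of the covariance matrix $\mathbf{R}$. The decision rule given in (11) can be reformulated as
$$(\mathbf{y}-\mathbf{u})^T \mathbf{R}^{-1} (\mathbf{y}-\mathbf{u}) - (\mathbf{y}-\mathbf{v})^T \mathbf{R}^{-1} (\mathbf{y}-\mathbf{v}) \;\underset{\mathcal{D}_0}{\overset{\mathcal{D}_1}{\gtrless}}\; 2\ln\lambda. \tag{14}$$
We assume that the malicious vehicle optimizes its claimed location. That is, through an optimization strategy, it minimizes its probability of being detected by the LVS. We assume in this work that the malicious vehicle’s optimum claimed location is constrained to be within the transmission range of the RSUs. To optimize its claimed location under such a constraint, the malicious vehicle minimizes the KL divergence between $p(\mathbf{y}\,|\,\mathcal{H}_1)$ and $p(\mathbf{y}\,|\,\mathcal{H}_0)$[25]. Since both distributions are Gaussian with the same covariance, this divergence is as given below
$$D_{KL} = \frac{1}{2}(\mathbf{v}-\mathbf{u})^T \mathbf{R}^{-1} (\mathbf{v}-\mathbf{u}).$$
Then, the optimal claimed location $\mathbf{x}_c^{*}$ for the malicious vehicle can be obtained through
$$\mathbf{x}_c^{*} = \underset{\|\mathbf{x}_c - \mathbf{x}_t\| \,\geq\, r}{\arg\min}\; D_{KL}.$$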
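A minimal sketch of the LRT decision follows, assuming independent noise of equal variance at every RSU, so that the log-likelihood ratio reduces to a difference of squared distances between the observation and the two mean RSS vectors. The function names, the numeric vectors, and the default threshold of 1 are illustrative assumptions; in particular, the mean vector under the malicious hypothesis is treated as known here.

```python
import math

def log_likelihood_ratio(y, u, v, sigma):
    """ln of the likelihood ratio for two Gaussians with common covariance sigma^2 * I."""
    q0 = sum((yi - ui) ** 2 for yi, ui in zip(y, u))  # squared distance to the H0 mean
    q1 = sum((yi - vi) ** 2 for yi, vi in zip(y, v))  # squared distance to the H1 mean
    return (q0 - q1) / (2.0 * sigma ** 2)

def lrt_decide(y, u, v, sigma, threshold=1.0):
    """Return 1 (malicious) if the likelihood ratio reaches the threshold, else 0."""
    return 1 if log_likelihood_ratio(y, u, v, sigma) >= math.log(threshold) else 0

# Hypothetical mean RSS vectors (dBm) at three RSUs.
u = [-70.0, -75.0, -80.0]   # implied by the claimed location
v = [-72.0, -71.0, -83.0]   # implied by the true location of an attacker

decision_on_claim_mean = lrt_decide(u, u, v, sigma=4.0)  # observation matches the claim
decision_on_true_mean = lrt_decide(v, u, v, sigma=4.0)   # observation matches the attacker
```

An observation that sits exactly on the claimed-location mean is accepted, while one that sits on the attacker's mean is rejected; real observations fall in between and the threshold sets the trade-off between false positives and missed detections.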
III-B ML-LVS
This section highlights the novel approach used to design a classification framework for the verification of a vehicle’s claimed location through supervised ML techniques. Feed-forward neural networks are well known for their performance in classification problems. We use a multi-layer feed-forward neural network for the binary classification of a vehicle as either legitimate or malicious.
The framework considers $\mathbf{y}$ (the RSS observation vector measured in the field) and the vehicle’s claimed location as inputs. Based on a series of trials with different architectures for the ML-LVS, we decided upon a framework that has the raw inputs (RSS, claimed locations, and RSU locations), a 10-neuron hidden layer, and a 1-neuron binary output layer. We also experimented with different transfer functions in various layers of the ML-LVS. The results shown in the next section were obtained with the hyperbolic tangent-sigmoid transfer function in the hidden layer and the linear transfer function in the output layer. The ML-LVS was trained using the Levenberg-Marquardt backpropagation algorithm.
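The architecture described above (one 10-neuron tanh hidden layer feeding a single linear output neuron) can be sketched in plain Python as follows. Two departures from the text are deliberate and should be read as assumptions: training uses simple stochastic gradient descent on the squared error as a stand-in for Levenberg-Marquardt, and the two-dimensional toy inputs stand in for normalized RSS and claimed-location features. All function names and hyperparameters are hypothetical.

```python
import math
import random

def init_network(n_inputs, n_hidden=10, rng=None):
    rng = rng or random.Random(0)
    w = lambda: rng.uniform(-0.5, 0.5)
    return {
        "W1": [[w() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [w() for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """tanh hidden layer followed by a single linear output neuron."""
    hidden = [math.tanh(sum(wk * xk for wk, xk in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]
    return hidden, output

def mse(net, xs, ts):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

def train(net, xs, ts, epochs=200, lr=0.05):
    """Per-sample gradient descent on 0.5 * (output - target)^2."""
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            hidden, out = forward(net, x)
            err = out - t
            for j, h in enumerate(hidden):
                # Gradient at the hidden pre-activation, using W2 before its update.
                grad_z = err * net["W2"][j] * (1.0 - h * h)
                net["W2"][j] -= lr * err * h
                net["b1"][j] -= lr * grad_z
                for k, xk in enumerate(x):
                    net["W1"][j][k] -= lr * grad_z * xk
            net["b2"] -= lr * err
    return net

def classify(net, x):
    """0 = legitimate, 1 = malicious, thresholding the linear output at 0.5."""
    return 1 if forward(net, x)[1] >= 0.5 else 0

# Toy, linearly separable data standing in for real feature vectors.
rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
ts = [1.0 if x[0] + x[1] > 0 else 0.0 for x in xs]

net = init_network(n_inputs=2)
loss_before = mse(net, xs, ts)
train(net, xs, ts)
loss_after = mse(net, xs, ts)
```

In a real deployment the input vector would concatenate the RSS readings with the claimed coordinates (and, if they vary, the RSU coordinates), scaled to a comparable range before training.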
IV Numerical Results
RSS measurements from the vehicles were collected in a 150 m × 150 m area by 3 RSUs (an area that mimics a wide cross section of 2 highways). Three devices acted as the 3 RSUs and independently and simultaneously measured the RSS from the vehicles in the field at a frequency of 1 Hz, i.e., one RSS measurement per second per RSU. The origin of the area is set to the location of RSU-1 as shown in Fig. 1. Moving Wi-Fi modems, each with a single antenna and an attached GPS (used to record the vehicle’s location at a frequency of 1 Hz), were used to mimic slow-moving vehicles. The GPS locations of these ‘vehicles’ are reported to the RSUs every second. The RSS measurements by individual RSUs and the vehicles’ GPS locations were combined with the help of time stamps (available with both the measured RSS and the vehicles’ GPS locations).
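The time-stamp alignment step can be illustrated with a short sketch. The data layout (dictionaries keyed by an integer time stamp in seconds) and the function name are assumptions for the example; the actual log format used in the experiment is not described in the text.

```python
def merge_by_timestamp(gps_fixes, rss_logs):
    """Join GPS fixes with per-RSU RSS logs on their shared time stamps.

    gps_fixes: {timestamp: (x, y)}
    rss_logs:  list with one {timestamp: rss_dbm} dictionary per RSU
    """
    samples = []
    for t in sorted(gps_fixes):
        # Keep a sample only if every RSU recorded an RSS value at this second.
        if all(t in log for log in rss_logs):
            samples.append({
                "t": t,
                "claimed": gps_fixes[t],
                "rss": [log[t] for log in rss_logs],
            })
    return samples

gps = {0: (10.0, 20.0), 1: (11.0, 20.5), 2: (12.0, 21.0)}
rsu1 = {0: -61.0, 1: -60.5, 2: -60.0}
rsu2 = {0: -72.0, 2: -71.0}            # RSU-2 missed the reading at t = 1
rsu3 = {0: -68.0, 1: -68.5, 2: -69.0}

samples = merge_by_timestamp(gps, [rsu1, rsu2, rsu3])
```

Dropping incomplete seconds keeps every RSS vector the same length; interpolating the missing reading would be an alternative design choice.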
The pathloss exponent $\gamma$ is required for the LRT, and is determined directly from the field measurements via a linear fit of the measured RSS values against the logarithm of the distance to an RSU. The mean RSS vectors $\mathbf{u}$ and $\mathbf{v}$ are calculated using (4) and (7) under the corresponding hypothesis. The noise variance $\sigma^2$ is calculated using the mean RSS vector and the RSS measurements (made by each RSU).
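The linear fit described above can be sketched with an ordinary least-squares regression of RSS against the base-10 logarithm of distance: the slope equals minus ten times the path loss exponent. The function name is hypothetical, the intercept is interpreted as the RSS at 1 m, and the residual variance is offered as one plausible estimate of the noise variance rather than the authors' exact procedure.

```python
import math

def fit_pathloss(distances_m, rss_dbm):
    """Least-squares fit of rss = intercept + slope * log10(distance)."""
    xs = [math.log10(d) for d in distances_m]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rss_dbm) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rss_dbm))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x          # fitted RSS at 1 m
    gamma = -slope / 10.0                        # path loss exponent
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, rss_dbm)]
    noise_var = sum(e * e for e in residuals) / n
    return gamma, intercept, noise_var

# Noise-free synthetic data generated with exponent 3 and -40 dBm at 1 m.
gamma, p_ref, noise_var = fit_pathloss([1.0, 10.0, 100.0], [-40.0, -70.0, -100.0])
```

On real measurements the residuals would not vanish, and their spread gives a direct feel for how noisy the channel is around the fitted line.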
The RSS measurement data is randomized and divided into two equal halves, with one half representing the legitimate vehicles and the other half representing the malicious vehicles. To launch a location-spoofing attack, the malicious vehicles spoof their locations by a minimum distance of $r$ meters away from their true locations. Random claimed locations for the malicious vehicles are simulated by taking into account the distance constraint $r$. Fig. 1 highlights true and simulated random claimed locations for a sample of the malicious vehicles.
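One simple way to simulate random claimed locations under the minimum-distance constraint is rejection sampling, sketched below. The square region from 0 to 150 m on each axis, the function name, and the retry limit are assumptions made for the illustration.

```python
import math
import random

def random_claimed_location(true_loc, min_dist, area=150.0, rng=random, max_tries=10000):
    """Draw a uniform point in the square area that is at least min_dist from true_loc."""
    for _ in range(max_tries):
        cand = (rng.uniform(0.0, area), rng.uniform(0.0, area))
        if math.hypot(cand[0] - true_loc[0], cand[1] - true_loc[1]) >= min_dist:
            return cand
    raise ValueError("no admissible claimed location found")

rng = random.Random(0)
x_true = (75.0, 75.0)
claims = [random_claimed_location(x_true, 50.0, rng=rng) for _ in range(20)]
```

As the minimum distance grows toward the size of the area, fewer candidates are admissible and the sampler needs more tries, which is why the retry limit is kept explicit.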
We now present some numerical results based on our analysis from the LRT and ML-LVS. In Fig. 2, we assume that the malicious vehicles randomly forge their claimed locations at a minimum distance $r$ away from their true locations and within the transmission range of the RSUs. The Total Error is plotted against the number of training data used. For the LRT based LVS, we calculate the Total Error, the false positive rate, and the detection rate under different values of $r$ using (10) and (14). The Total Error for $r$ equal to 100m, 75m and 50m is 0.05, 0.22, and 0.29, respectively (different colored-dashed arrows).
The data considered for the LRT based LVS in Fig. 2 is also considered for the ML-LVS. Unlike the LRT method, where the LVS requires a priori information on the channel parameters, the ML-LVS only uses the measured RSS (at the RSUs) and the vehicles’ reported claimed locations. This data, which has genuine and malicious vehicles in equal proportions, is randomized and divided into two data sets: a training set with 80% of the entire data, and a test set with the remaining 20% of the data. The training set also has data labels (genuine or malicious). These data labels indicate whether a particular training sample represents a legitimate or malicious vehicle. Use of such data is required to set the weights and biases for the ML-LVS in the training phase. On the other hand, the data in the test set has no such labels, which means that we have no a priori information on whether a particular sample belongs to a legitimate or a malicious vehicle. Once trained, the ML-LVS can be used to test the data in the test set for classification of the vehicles.
In the training phase in Fig. 2, we supply the ML-LVS with training samples from the training data at a rate of one random training sample per unit time and plot the Total Error for the test set after each unit time. The ML-LVS’s backpropagation algorithm terminates the training phase once a threshold for any of its internally set parameters is met. We observe that in most cases the ‘maximum validation failures’ parameter of the backpropagation algorithm (the maximum number of sequential iterations in which the ML-LVS’s performance fails to improve) is reached, and this terminates the training phase. We set this parameter to 6. This trained ML-LVS is then used to classify vehicles in the test set as either legitimate or malicious. This procedure is repeated for each value of the training data shown in Fig. 2. As shown in Fig. 2, as expected, the Total Error for the test set improves as the training continues. The final Total Error for the test set (after 500 training samples) using the ML-LVS for $r$ equal to 100m, 75m and 50m is 0.01, 0.02, and 0.06, respectively. It is evident from Fig. 2 that the ML-LVS with no a priori channel information has much-improved performance relative to the LRT based LVS.
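The ‘maximum validation failures’ stopping criterion can be sketched as a small helper that scans a sequence of validation errors and reports where training would halt. This is a generic early-stopping illustration under the description given above (stop after 6 consecutive iterations without improvement); the function name and the exact tie-breaking are assumptions, not the internals of any particular toolbox.

```python
def stopping_iteration(validation_errors, max_fail=6):
    """Index at which training stops: after max_fail consecutive non-improving iterations."""
    best = float("inf")
    fails = 0
    for i, err in enumerate(validation_errors):
        if err < best:
            best, fails = err, 0      # new best error resets the failure counter
        else:
            fails += 1
            if fails >= max_fail:
                return i
    return len(validation_errors) - 1  # criterion never triggered

errors = [0.50, 0.40, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.20]
stop_at = stopping_iteration(errors)
```

In this example the best error occurs at index 2 and the six following iterations fail to improve on it, so training halts at index 8 and never reaches the later value of 0.20.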
We now assume that the malicious vehicles can overhear the communication between the legitimate vehicles and the RSUs. The malicious vehicles use this information to optimize their claimed locations ($\mathbf{x}_c$) prior to launching a location-spoofing attack. That is, they set their claimed location using (15) so as to minimize their probability of being marked malicious by the LVS.
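The attacker's optimization can be illustrated with a brute-force grid search: for each admissible candidate claimed location, compute the KL divergence between the RSS distributions implied by the candidate and by the true location, and keep the minimizer. The sketch assumes Gaussian noise of equal variance at every RSU (so the divergence is a scaled squared distance between mean RSS vectors), a square 150 m region, and illustrative channel values; all names and numbers are hypothetical.

```python
import math

def mean_rss(loc, rsus, p0=-40.0, d0=1.0, gamma=3.0):
    """Log-normal pathloss mean RSS (dBm) at each RSU, with distance clamped to d0."""
    return [p0 - 10.0 * gamma * math.log10(max(math.hypot(loc[0] - x, loc[1] - y), d0) / d0)
            for x, y in rsus]

def kl_divergence(u, v, sigma):
    """KL divergence between N(u, sigma^2 I) and N(v, sigma^2 I)."""
    return sum((vi - ui) ** 2 for ui, vi in zip(u, v)) / (2.0 * sigma ** 2)

def best_claimed_location(true_loc, rsus, min_dist, sigma=4.0, area=150.0, step=5.0):
    """Grid search for the admissible claimed location with the smallest divergence."""
    v = mean_rss(true_loc, rsus)
    best, best_kl = None, float("inf")
    n = int(area / step)
    for ix in range(n + 1):
        for iy in range(n + 1):
            cand = (ix * step, iy * step)
            if math.hypot(cand[0] - true_loc[0], cand[1] - true_loc[1]) < min_dist:
                continue  # violates the minimum spoofing distance
            kl = kl_divergence(mean_rss(cand, rsus), v, sigma)
            if kl < best_kl:
                best, best_kl = cand, kl
    return best, best_kl

rsus = [(0.0, 0.0), (150.0, 0.0), (0.0, 150.0)]
x_true = (60.0, 80.0)
x_star, kl_star = best_claimed_location(x_true, rsus, min_dist=50.0)
```

A finer grid or a continuous optimizer would refine the answer; the grid version is used here only because it makes the constraint handling explicit.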
In Fig. 3 we compare the performances of the ML-LVS and the LRT based LVS. We see again that the ML-LVS still outperforms the LRT based LVS. However, we notice a rather counter-intuitive finding where, compared to Fig. 2, the Total Error for the ML-LVS improves much faster. This counter-intuitive finding is a result of the geometry of the RSUs in this specific experiment. This geometry leads to a clustering in the malicious vehicles’ claimed location settings. In general (i.e. for more general RSU geometries), if the malicious vehicles optimize their claimed locations, the Total Error for the ML-LVS is expected to take longer to reach its asymptotic value.
In future work we plan to integrate Support Vector Machines (SVM) into the designed neural-network framework of our ML-LVS. We also plan to deploy this modified ML-LVS in more complex channel fading environments such as those possessing Rician fading channels. These additional studies are likely to provide for even more performance gains in ML-LVSs relative to LRT based LVSs.
V Conclusion
Information-theoretic LVS frameworks, due to their operating limitations, are not practical in many real-world scenarios. To address this gap, we have proposed the use of an ML approach to location verification. This new approach is particularly useful since, unlike an information-theoretic LVS, an ML-LVS does not require a priori information on the channel parameters. Additionally, an ML-LVS can adapt itself to any changing channel conditions.
Using real-world RSS data, we have shown for the first time how a deployed ML-LVS outperforms a state-of-the-art information-theoretic LVS. Further, we have shown how this result holds even when the adversary optimizes its attack location. Future work in this area will help us develop a fully robust state-of-the-art artificially intelligent LVS, an LVS which will be wholly practical in terms of its location verification performance in a wide range of future wireless networks beyond the networks we have studied here.
We believe the novel approach for enhancing the performance of real-world LVSs that we have developed here potentially forms the foundation for all future works in the important area of wireless location verification.
VI Acknowledgment
The authors acknowledge support by the University of New South Wales, Australia, and Macquarie University, Australia. Ullah Ihsan acknowledges financial support from the Australian Government through its Research Training Program.
- [1] H. Hartenstein and L. Laberteaux, “A tutorial survey on vehicular ad hoc networks,” IEEE Communications Magazine, vol. 46, no. 6, pp. 164–171, Jun. 2008.
- [2] S. Yan and R. Malaney, “Location verification systems in emerging wireless networks,” ZTE Comms., vol. 11, no. 3, pp. 03–10, Jul. 2013.
- [3] D. Sheet, O. Kaiwartya et al., “Location information verification using transferable belief model for geographic routing in vehicular ad hoc networks,” IET Intelligent Transportation Systems, vol. 11, no. 2, pp. 53–60, Mar. 2017.
- [4] S. Yan, R. Malaney, I. Nevat, and G. W. Peters, “Location verification systems for VANETs in Rician fading channels,” IEEE Transactions on Vehicular Technology, vol. 65, no. 7, pp. 5652–5664, Jul. 2016.
- [5] P. Monteiro, J. Rebelatto, and R. Souza, “Information-theoretic location verification system with directional antennas for vehicular networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 1, pp. 93–103, Jan. 2016.
- [6] S. Yan, I. Nevat, G. W. Peters, and R. Malaney, “Location verification systems under spatially correlated shadowing,” IEEE Transactions on Wireless Communications, vol. 15, no. 6, pp. 4132–4144, Jun. 2016.
- [7] F. Malandrino, C. Casetti, C. Chiasserini, M. Fiore, R. Yokoyama, and C. Borgiattino, “A-VIP: Anonymous verification and inference of positions in vehicular networks,” in Proceedings of the IEEE INFOCOM, Apr. 2013, pp. 105–109.
- [8] W. Jaballah, M. Conti, M. Mosbah, and C. Palazzi, “Secure verification of location claims on a vehicular safety application,” in Proceedings of the International Conference on Computer Communication and Networks (ICCCN), Aug. 2013, pp. 1–7.
