Indoor Localization for IoT Using Adaptive Feature Selection: A Cascaded   Machine Learning Approach

Mohamed I. AlHajri; Nazar T. Ali; Raed M. Shubair

arXiv:1905.01000·eess.SP·January 8, 2020

Indoor Localization for IoT Using Adaptive Feature Selection: A Cascaded Machine Learning Approach

Mohamed I. AlHajri, Nazar T. Ali, Raed M. Shubair

PDF

TL;DR

This paper introduces a cascaded machine learning method for indoor IoT localization that adaptively selects RF features, significantly improving accuracy by 50-70% through environment identification and optimal feature combination.

Contribution

The paper proposes a novel two-stage machine learning approach that adaptively identifies indoor environments and selects RF features for enhanced localization accuracy.

Findings

01

Localization accuracy improved by at least 50%.

02

Combining RF features enhances prediction performance.

03

Environment identification guides optimal feature selection.

Abstract

Evolving Internet-of-Things (IoT) applications often require the use of sensor-based indoor tracking and positioning, for which the performance is significantly improved by identifying the type of the surrounding indoor environment. This identification is of high importance since it leads to higher localization accuracy. This paper presents a novel method based on a cascaded two-stage machine learning approach for highly-accurate and robust localization in indoor environments using adaptive selection and combination of RF features. In the proposed method, machine learning is first used to identify the type of the surrounding indoor environment. Then, in the second stage, machine learning is employed to identify the most appropriate selection and combination of RF features that yield the highest localization accuracy. Analysis is based on k-Nearest Neighbor (k-NN) machine learning…

Figures7

Click any figure to enlarge with its caption.

Tables2

Table 1. TABLE I : Comparison of localization distance error, RMSE, for different types of indoor environments using k 𝑘 k -NN algorithm ( k 𝑘 k = 1) based on various primary and hybrid RF features.

	Primary RF Features			Hybrid RF Features
	RSS	CTF	FCF	RSS + CTF	RSS + FCF	CTF + FCF	RSS + CTF + FCF	$α$
A: Lab	109.19 cm	39.75 cm	49.56 cm	39.99 cm	100.25 cm	39.68 cm	39.9 cm	63.7%
B: Narrow Corridor	105.3 cm	30.53 cm	45.89 cm	31.16 cm	97.94 cm	30.49 cm	30.71 cm	71.0%
C: Lobby	106.25 cm	27.52 cm	38.9 cm	27.84 cm	94.98 cm	27.18 cm	27.82 cm	74.4%
D: Sports Hall	111.3 cm	63.33 cm	55.2 cm	63.91 cm	103.52 cm	63.32 cm	63.9 cm	50.4%

Table 2. TABLE II : Comparison of localization distance error, RMSE, using the proposed method and other methods in the literature.

[10]

[38]

[28]

[18]

Proposed

A: Lab

109.19 cm

108.76 cm

49.56 cm

39.75 cm

39.68 cm

B: Narrow Corridor

105.3 cm

104.3 cm

45.89 cm

30.53 cm

30.49 cm

C: Lobby

106.25 cm

105.1 cm

38.9 cm

27.52 cm

27.18 cm

D: Sports Hall

111.3 cm

110.9 cm

55.2 cm

63.33 cm

55.2 cm

Average Performance

108.0 cm

107.3 cm

47.4 cm

40.3 cm

38.1 cm

Equations8

H (f) = l = 1 \sum L a_{l} exp [- j (2 π f τ_{l} - θ_{l})]

H (f) = l = 1 \sum L a_{l} exp [- j (2 π f τ_{l} - θ_{l})]

R (f) = - \infty \int \infty H (\hat{f}) H^{*} (\hat{f} + f) d \hat{f}

R (f) = - \infty \int \infty H (\hat{f}) H^{*} (\hat{f} + f) d \hat{f}

d (s_{m}, s_{n}) = ∥ s_{m} - s_{n} ∥_{2} = i = 1 \sum K [s_{m} (i) - s_{n} (i)]^{2}

d (s_{m}, s_{n}) = ∥ s_{m} - s_{n} ∥_{2} = i = 1 \sum K [s_{m} (i) - s_{n} (i)]^{2}

α = (\frac{RMSE _{RSS} - RMSE _{β}}{RMSE _{RSS}}) \times 100%

α = (\frac{RMSE _{RSS} - RMSE _{β}}{RMSE _{RSS}}) \times 100%

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Indoor Localization for IoT Using Adaptive Feature Selection: A Cascaded Machine Learning Approach

Mohamed I. AlHajri∗,‡, Nazar T. Ali†, Raed M. Shubair‡

**

∗Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA

*†*Electrical and Computer Engineering, Khalifa University, UAE

*‡*Research Laboratory of Electronics, Massachusetts Institute of Technology, USA

Emails: $\{$ malhajri; rshubair $\}$ @mit.edu; [email protected]

Indoor Localization for IoT Using Adaptive Feature Selection:

A Cascaded Machine Learning Approach

Mohamed I. AlHajri∗,‡, Nazar T. Ali†, Raed M. Shubair‡

**

∗Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA

*†*Electrical and Computer Engineering, Khalifa University, UAE

*‡*Research Laboratory of Electronics, Massachusetts Institute of Technology, USA

Emails: $\{$ malhajri; rshubair $\}$ @mit.edu; [email protected]

Abstract

Evolving Internet-of-Things (IoT) applications often require the use of sensor-based indoor tracking and positioning, for which the performance is significantly improved by identifying the type of the surrounding indoor environment. This identification is of high importance since it leads to higher localization accuracy. This paper presents a novel method based on a cascaded two-stage machine learning approach for highly-accurate and robust localization in indoor environments using adaptive selection and combination of RF features. In the proposed method, machine learning is first used to identify the type of the surrounding indoor environment. Then, in the second stage, machine learning is employed to identify the most appropriate selection and combination of RF features that yield the highest localization accuracy. Analysis is based on $k$ -Nearest Neighbor ( $k$ -NN) machine learning algorithm applied on a real dataset generated from practical measurements of the RF signal in realistic indoor environments. Received Signal Strength, Channel Transfer Function, and Frequency Coherence Function are the primary RF features being explored and combined. Numerical investigations demonstrate that prediction based on the concatenation of primary RF features enhanced significantly as the localization accuracy improved by at least 50% to more than 70%.

Index Terms:

Indoor Localization, IoT, Machine Learning, $k$ -Nearest Neighbor, Received Signal Strength, Channel Transfer Function, Frequency Coherence Function.

I Introduction

The advent of Internet-of-Things (IoT) has revolutionized the field of telecommunications opening the door for unprecedented applications such as [1, 2, 3, 4, 5, 6, 7, 8, 9]. These applications are enabled through smart sensors that cooperate efficiently to provide the desired service. The efficient deployment of IoT-based sensors primarily depends on accurate localization and adjustment of sensor power consumption according to the radio frequency (RF) propagation channel which is dictated by the type of the surrounding indoor environment.

The complexity of the indoor wireless channel, due to the severe multipath effect and absence of Line-of-Sight (LOS) path, emphasizes the need for a sophisticated yet efficient approach for indoor localization. Location fingerprinting based on various machine learning algorithms has been reported in the literature using different RF features [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]. These have included Received Signal Strength (RSS), Channel Transfer Function (CTF), and Frequency Coherence Function (FCF). Developing a machine learning approach that is based on a hybrid combination of such primary RF features can greatly enhance localization accuracy, since more information will be available to the algorithm. Such an approach does not exist in the literature.

This paper presents a novel method based on a cascaded two-stage machine learning approach for highly-accurate and robust localization in indoor environments using adaptive selection and combination of RF features. In the proposed method, and based on the authors prior work [24, 25], machine learning is first used to identify the type of the indoor environment based on real data measurements of the RF signal in different indoor scenarios. Then, in the second stage, machine learning is employed to identify the most appropriate selection and combination of RF features that yield the highest localization accuracy. The most appropriate selection and combination of RF features of the second stage is dependent on the type of indoor environment, which is being identified from the first stage. Analysis is based on $k$ -Nearest Neighbor ( $k$ -NN) machine learning algorithm applied on a real dataset generated from practical measurements of the RF signal in realistic indoor environments. Received Signal Strength, Channel Transfer Function, and Frequency Coherence Function are the primary RF features being explored and combined.

II Signatures of Indoor Environment

II-A Primary RF Features

In a multipath rich indoor environment, the received signal $Y(f)$ contains replicas of the RF transmitted signal $X(f)$ . The ratio of $Y(f)$ to $X(f)$ defines the CTF, $H(f)$ , which contains the multipath effects of the wireless channel. Hence, $H(f)$ can be represented as the superposition of the gains associated with the multipath components as follows [26]:

[TABLE]

where $a_{l}$ , $\tau_{l}$ , and $\theta_{l}$ are the amplitude, delay, and phase of the $l^{th}$ multipath component; $L$ is the total number of multipath components; and $f$ is a frequency within bandwidth of operation of the channel. The analysis in this paper considers an indoor scenario with a high Signal-to-Noise Ratio (SNR)[27, 28]. The CTF, $H(f)$ , is considered as an RF feature that would be distinctively unique for every spatial position within the indoor environment. Under frequency selective fading, this RF signature becomes more sensitive to channel variations due to the rapid fluctuations of the gains associated with its multipath components.

The complex autocorrelation of CTF, $H(f)$ , defines another channel metric known as FCF, $R(f)$ , given by [18]:

[TABLE]

FCF given in (2) represents the frequency domain coherence of the radio channel. It is interpreted as the autocorrelation of the CTF, $H(f)$ , due to different frequency shifts. FCF is known for its slow varying nature in the spatial domain[18], which makes it a strong candidate as an RF signature.

II-B Hybrid RF Features

Characterizing the type of an indoor environment by a machine learning algorithm becomes a more accurate process when the algorithm makes decisions based on concatenation of primary RF features of the same environment. In such a way, the accuracy of prediction increases due to the wealth of information available to the algorithm from the multi-dimensional dataset constructed from the concatenation of the combined primary RF features of the same environment. Hence, the approach is based on hybrid features. The investigations were carried out using multi-dimensional information generated from various possible combinations of all or subset of RSS, CTF, and FCF which form the hybrid features. Results, presented later in section 5, based on these hybrid features show indeed a substantial improvement in the algorithm prediction resulting in a significant enhancement in environment separability and localization accuracy.

Uniform Manifold Approximation and Projection (UMAP) is a non-linear dimensionality reduction technique that is well-suited for embedding high-dimensional data for visualization in a low-dimensional space [29]. In this paper, we mapped the 5-D hybrid RF feature vector (RSS + CTF + FCF) into 2-D, with results visualized in Fig. 1. It should be noted that the hybrid RF feature vector (RSS + CTF + FCF) is 5-D since it is the concatenation of the real-valued RSS followed by the real and imaginary parts of CTF and FCF, respectively. Fig. 1 indicates that the resulting 2-D mapped hybrid RF features vector of the Sports Hall (open space) is closest to that of the Main Lobby (low cluttered). Fig. 1 also indicates that the 2-D mapped hybrid RF features vector of the Narrow Corridor (medium cluttered) is surrounded by that from the Main lobby (low cluttered) and Lab (highly cluttered). The 2-D embedding of the mapped RF features in Fig. 1 shows that further insights can be gained from the UMAP visualization to confirm the advantage of using the concept of hybrid RF features.

III Dataset Generation from Real Measurements

A dataset has been generated through a measurement campaign in which readings of the RF signal were taken in realistic indoor environments of four different types within Khalifa University campus, Sharjah, UAE. The frequency domain CTF, $H(f)$ , was obtained for four indoor environments: highly cluttered (Laboratory), medium cluttered (Narrow Corridor), low cluttered (Lobby), and open space (Sports Hall).

The measurement setup was similar to that used in [25] has been utilized. The dataset generated for each environment consists of 196 grid points $\times$ 10 iterations/(grid point). The dataset has been divided into 75% training dataset and 25% testing dataset. The dataset has been made available as open-access online so that it can be utilized for further explorations. It can be accessed online as per details in [30, 17].

IV Machine Learning Algorithm

The machine learning algorithm adopted in this paper is $k$ -Nearest Neighbor ( $k$ -NN) which is a non-parametric method used for classification [31, 32]. In this algorithm, the $k$ neighbors of each hybrid RF features vector will determine the type of environment. The $k$ -neighbors are associated with the shortest $k$ Euclidean distances, which are defined as follows [10]:

[TABLE]

where $K$ is the length of the RF features vector, $\mathbf{s}_{m}$ and $\mathbf{s}_{n}$ are the vectors of hybrid RF features at positions $m$ and $n$ , respectively.

V Cascaded Machine Learning Approach

V-A Machine Learning for Indoor Environment Identification

This section is based on prior work by the authors [25, 24] where a machine learning approach was used for indoor environment classification based on real measurements of the RF signal. Extensive numerical investigations [25, 24] showed that a machine learning algorithm using $k$ -NN method, utilizing a hybrid combination of CTF and FCF, outperforms other methods such as Decision Tree [33] and Support Vector Machine (SVM) [34] in identifying the type of the indoor environment with a classification accuracy of 99.3%. The required time was found to be less than 10 $\mu$ s, which verifies that the adopted $k$ -NN machine learning algorithm is a successful candidate for real-time IoT deployment scenarios. The accuracy of the adopted technique is represented in terms of the confusion matrix, shown in Fig. 2, for which the diagonal elements represent the percentage of accurate prediction for each type of environment. The off-diagonal elements of the confusion matrix represent the percentage of misclassification. The results in Fig. 2 have been obtained for 490 test cases per environment.

V-B Machine Learning for Localization Position Estimation

Studies in the literature indicate that a substantial gain in accuracy is achieved by improving the construction of the RF signature, rather than the use of more complex algorithms [13, 35, 36, 37]. As a result, in the second stage and after the type of indoor environment has been identified, a specific selection and combination of RF feature is utilized based on the identified indoor environment. We have adopted the $k$ -NN algorithm again in the second stage in order to estimate the sensor position by comparing the testing RF feature to the training database of RF features, with the purpose of finding the best matching entry.

Table I shows the localization distance error, RMSE, using various primary and hybrid RF features. It can be seen from that the choice of the best RF signature is dependent on the type of the indoor environment. In the case of the Lab, Narrow Corridor, and Lobby, a hybrid RF feature combining CTF + FCF produced the least error RMSE, whereas for the case of the Sports Hall, a primary RF feature based on FCF has the best performance. These findings clearly indicate the importance of incorporating information on the type of the indoor environment for improving location estimation.

In order to quantitatively assess the improvement in localization estimation when the proposed method is used, we define $\alpha$ (last column in Table I) to represent the percentage reduction in localization distance error, RMSE, when the proposed method is used compared to the baseline method which uses RSS as the RF feature:

[TABLE]

where $\text{RMSE}_{\text{RSS}}$ is the distance error due to RSS considered to be the baseline method, and $\text{RMSE}_{\beta}$ is the distance error due to the RF feature that produces the lowest RMSE. It is evident from the computed values of $\alpha$ as given in the last column in Table I, that the proposed method significantly reduces the percentage error in localization by at least 50% to more than 70%.

We have also studied the effect of varying the number of neighbors, $k$ , on the performance of different RF signatures in order to decide, for every type of environment, the best combination of primary features that yield the highest localization accuracy. The results in Fig. 3 - 5 indicate that increasing $k$ reduces the accuracy since the localization distance error, RMSE, increases. This is attributed to the fact that increasing the number of neighbors, $k$ , means that the algorithm attempts to find the precise location of the sensor node within a neighboring area that is large due to having many neighbors, $k$ . In such case, the local information available to the algorithm reduces since prediction is made based on a larger number of neighbors, causing a deterioration in the performance of the algorithm.

An interesting observation is that all RF features, both primary and hybrid, produce a steady performance when $k$ increases beyond the range $40\leq k\leq 50$ . Another interesting observation is that primary features produce the best performance when the indoor environment has a fewer number of multipath components, as the case for indoor open space of the Sports Hall in Fig. 6, where the best performance is obtained using only FCF. On the other hand, and as evident from Figs. 3 - 5 which correspond to indoor environments having a large number of multipath components, the best performance is obtained using hybrid features CTF + FCF. Such interesting finding reveals the fact that using the proposed hybrid RF features concept becomes more necessary in a cluttered environment having a large number of multipath components, a scenario that is encountered in many indoor environments that involve narrow corridors or closer boundaries.

The reduction in the localization distance error, RMSE, using the proposed method is verified by comparing it to other methods in the literature. It is evident from Table II that the the proposed method yields the lowest localization distance error, RMSE, compared to the other existing methods in the literature such as those by Bahl et al. [10], Li et al. [38], Alsindi et al. [28],[18].

VI Conclusion

This paper developed a cascaded two-stage machine learning approach for highly-accurate and robust indoor localization for IoT using the concept of hybrid RF features. In the first stage, $k$ -NN algorithm was used to identify the type of indoor environment. In the second stage, $k$ -NN algorithm was employed to identify the most appropriate selection and combination of RF features that yield the highest localization accuracy, based on the identified indoor environment. Investigations were carried out using a real dataset generated from practical measurements of the RF signal in realistic indoor environments. Results show that the prediction of the algorithm based on the concatenation of primary RF features enhanced significantly as the localization accuracy improved by at least 50% to more than 70%. The significant improvement in localization accuracy attained using the proposed two-stage machine learning method is attributed to the cascaded performance gains due to the identification of the type of environment and the concatenation of primary RF features.

Bibliography38

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Y. Ji, J. Hejselbæk, W. Fan, and G. F. Pedersen, “A Map-Free Indoor Localization Method Using Ultrawideband Large-Scale Array Systems,” IEEE Antennas and Wireless Propagation Letters , vol. 17, no. 9, pp. 1682–1686, Sep. 2018.
2[2] L. Benmessaoud, T. Vuong, M. C. E. Yagoub, and R. Touhami, “A Novel 3-D Tag With Improved Read Range for UHF RFID Localization Applications,” IEEE Antennas and Wireless Propagation Letters , vol. 16, pp. 161–164, 2017.
3[3] M. Hasani, J. Talvitie, L. Sydänheimo, E. Lohan, and L. Ukkonen, “Hybrid WLAN-RFID Indoor Localization Solution Utilizing Textile Tag,” IEEE Antennas and Wireless Propagation Letters , vol. 14, pp. 1358–1361, 2015.
4[4] C. Loyez, M. Bocquet, C. Lethien, and N. Rolland, “A Distributed Antenna System for Indoor Accurate Wi Fi Localization,” IEEE Antennas and Wireless Propagation Letters , vol. 14, pp. 1184–1187, 2015.
5[5] B. Hanssens, D. Plets, E. Tanghe, C. Oestges, D. P. Gaillot, M. Liénard, T. Li, H. Steendam, L. Martens, and W. Joseph, “An Indoor Variance-Based Localization Technique Utilizing the UWB Estimation of Geometrical Propagation Parameters,” IEEE Transactions on Antennas and Propagation , vol. 66, no. 5, pp. 2522–2533, May 2018.
6[6] A. Goian, M. Al Hajri, R. M. Shubair, L. Weruaga, A. R. Kulaib, R. Al Memari, and M. Darweesh, “Fast detection of coherent signals using pre-conditioned root-music based on toeplitz matrix reconstruction,” in 2015 IEEE 11th International Conference on Wireless and Mobile Computing, Networking and Communications (Wi Mob) . IEEE, 2015, pp. 168–174.
7[7] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications,” IEEE Communications Surveys Tutorials , vol. 17, no. 4, pp. 2347–2376, 2015.
8[8] M. Al Hajri, A. Goian, M. Darweesh, R. Al Memari, R. Shubair, L. Weruaga, and A. Al Tunaiji, “Accurate and robust localization techniques for wireless sensor networks,” June 2018, ar Xiv:1806.05765 [eess.SP].