Personal Dynamic Cost-Aware Sensing for Latent Context Detection

Saar Tal; Bracha Shapira; Lior Rokach

arXiv:1903.05376·cs.LG·March 14, 2019


TL;DR

This paper introduces a dynamic, cost-aware sensing approach for mobile devices that balances energy consumption and context accuracy using machine learning and optimization techniques.

Contribution

It presents a novel method that adaptively determines sensor sampling policies based on context, predicted information loss, and sampling costs, improving over static approaches.

Findings

01 Outperforms static sensing methods in energy efficiency and accuracy.

02 Balances information loss and energy consumption effectively.

03 Demonstrates superiority over state-of-the-art dynamic sensing methods.

Abstract

In the past decade, the usage of mobile devices has gone far beyond simple activities like calling and texting. Today, smartphones contain multiple embedded sensors and are able to collect useful sensing data about the user and infer the user's context. The more frequent the sensing, the more accurate the context. However, continuous sensing results in huge energy consumption, decreasing the battery's lifetime. We propose a novel approach for cost-aware sensing when performing continuous latent context detection. The suggested method dynamically determines user's sensors sampling policy based on three factors: (1) User's last known context; (2) Predicted information loss using KL-Divergence; and (3) Sensors' sampling costs. The objective function aims at minimizing both sampling cost and information loss. The method is based on various machine learning techniques including autoencoder…

Tables

Table 1. Mean rank of policy determination timing methods

Method	MIN	AVG	MAX	NEVER
Mean Rank	1.685	2.360	2.915	3.040

Equations

$$\int_{-\infty}^{\infty}p(x)\log_{2}\frac{p(x)}{q(x)}\,dx$$

$$infoLoss(C,D)=\sum_{i=1}^{n}b_{c_{i}}C_{i}+\sum_{j=1}^{m}b_{d_{j}}D_{j}+\sum_{i=1}^{n}b_{sc_{i}}C_{i}^{2}+\sum_{j=1}^{m}b_{sd_{j}}D_{j}^{2}+\sum_{i=1}^{n}\sum_{j=1}^{m}b_{c_{i}d_{j}}C_{i}D_{j}$$

$$cost(D)=\sum_{i=1}^{n}\frac{cost_{i}}{D_{i}}$$

$$\min\left(\sum_{i=1}^{n}\frac{cost_{i}}{D_{i}}+\alpha\times infoLoss(C,D)\right)$$


Taxonomy

Topics: Context-Aware Activity Recognition Systems · Human Mobility and Location-Based Analysis · Personal Information Management and User Behavior

Methods: Linear Regression

Full text

Personal Dynamic Cost-Aware Sensing for Latent Context Detection

Saar Tal, Bracha Shapira, and Lior Rokach

Ben-Gurion University of the Negev, Beer Sheva, Israel

Abstract.

In the past decade, the usage of mobile devices has gone far beyond simple activities like calling and texting. Today, smartphones contain multiple embedded sensors and are able to collect useful sensing data about the user and infer the user’s context. The more frequent the sensing, the more accurate the context. However, continuous sensing results in huge energy consumption, decreasing the battery’s lifetime. We propose a novel approach for cost-aware sensing when performing continuous latent context detection. The suggested method dynamically determines user’s sensors sampling policy based on three factors: (1) User’s last known context; (2) Predicted information loss using KL-Divergence; and (3) Sensors’ sampling costs. The objective function aims at minimizing both sampling cost and information loss. The method is based on various machine learning techniques including autoencoder neural networks for latent context detection, linear regression for information loss prediction, and convex optimization for determining the optimal sampling policy. To evaluate the suggested method, we performed a series of tests on real world data recorded at a high frequency rate; the data was collected from six mobile phone sensors of twenty users over the course of a week. Results show that by applying a dynamic sampling policy, our method naturally balances information loss and energy consumption and outperforms the static approach.

Keywords: Context-aware recommendation algorithms, Mobile interface, Cost-Aware Sensing

CCS Concepts: Information systems → Recommender systems; Computing methodologies → Neural networks; Computing methodologies → Supervised learning by classification

1. Introduction

In the past decade, context and context-aware computing have become the focus of much research (Chen00; Perera14; BALDAUF07). The use of mobile devices has gone far beyond simple activities like calling and texting. Today, smartphones contain multiple embedded sensors, such as GPS, accelerometer, and microphone, that enable the collection of data about the user and the inference of the user’s context (sarker16). Context-aware systems can therefore use this context to adapt their operations to the user without user intervention (BALDAUF07). Context-aware applications have been proposed in several domains, including recommendation (Setten04; Linas11; Alhamid16), health care (Bardram04; Bardram03; Mitchell00; Kang06; Solanas14; Munuz03; Kjeldskov04), smart homes (kabir15; Skubic09), and data security (Chakraborty13; Mirsky17; Muhtadi03).

The literature offers many different definitions and perceptions of the term context. A user’s context can be defined by his/her location, time of day, season, temperature, activity, environment, and even his/her emotions and mental state (ABOWD99).

Context inference can be divided into two categories: explicit and latent inference. Explicit context describes known user situations from a predefined set of contexts (e.g., “at work,” “running”) and hence is easier to explain. However, defining and training a set of explicit contexts large enough to cover the potentially large variety of user behaviors is a challenging and resource-demanding task. Latent contexts are comprised of an unlimited number of hidden context patterns which are modeled as numeric vectors. They can be obtained automatically by applying unsupervised learning techniques to available raw data (e.g., mobile sensors) (unger16).

Context detection in mobile devices is done by analyzing data collected from the device’s sensors. The more frequent the sampling (also referred to as sensing), the more accurate the context (yuror13). However, continuous sensing results in huge energy consumption, decreasing the battery’s lifetime. Therefore, one of the main challenges in mobile context-aware applications is cost-aware accurate context detection, where cost refers to the power consumption of the device (sarker16). Facing this challenge requires a trade-off between sensing accuracy and energy efficiency.

We propose a novel approach for cost-aware sensing while performing continuous latent context detection. The suggested approach dynamically determines the user’s sensors sampling policy (i.e., the number of time intervals between samples for each sensor) based on three main factors: (1) the user’s last known context, a latent context vector which is a reduced dimensional representation of the user’s features; (2) a supervised machine learning model which predicts information loss based on the KL-Divergence between the actual context and the estimated context of candidate policies; and (3) the sensors’ sampling costs in terms of energy consumption. The main objective of the approach is to dynamically manage the trade-off between energy consumption and information loss, while minimizing both. The method is based on various machine learning techniques including autoencoder neural networks for latent context detection, linear regression for information loss prediction, and convex optimization for determining the optimal sampling policy. The objective function is nonlinear and takes into consideration both the weighted predicted information loss and the sampling costs. The models we use are user-personalized: they are created and trained for each user separately. This choice is supported by Lockhart et al. (lockhart14), who compared different learning algorithms for activity recognition on impersonal, personal, and hybrid models; their results show that the personal models outperform the impersonal models.

While multiple studies have suggested various approaches to deal with the energy-accuracy trade-off (paek10; yuror11; yuror13; sarker16; zhang13; ben09; rachuri10a; rachuri10b; rachuri11; Nath12; kang08; Wang09), none of them provides a solution that is designated for latent contexts. Furthermore, none of those methods exploits latent context and information loss when applying dynamic sampling for an unlimited number of sensors. Moreover, we are the first to present KL-Divergence as a measure of information loss in the task of context detection.

The main contributions of this paper are as follows:

• We suggest a novel energy efficient framework that utilizes dynamic sensing while taking both the user’s last known latent context and predicted information loss into account.

• The method we propose is generic and applicable to any number or type of sensors.

• The proposed method is designated for latent context and is not limited to a finite set of contexts.

• Since context is latent and comes in the shape of a numeric vector, we suggest KL-Divergence as a new measure of information loss in the task of latent context detection.

2. Related Work

Many studies have been conducted in the area of energy efficient context sensing. SenseLess (ben09), SensTrack (zhang13), and RAPS (paek10) are location detection applications that reduce the use of expensive sensors by using less expensive sensors more frequently. SenseLess (ben09) uses the accelerometer to trigger the more expensive GPS sampling when motion is detected. SensTrack (zhang13) selectively executes GPS sampling based on data from the acceleration and orientation sensors. In addition, when the user moves indoors and GPS is unavailable, it switches to a Wi-Fi sensing method. RAPS (paek10) uses the accelerometer to avoid GPS sampling when the user is stationary. Moreover, it uses the location-time history of the user to estimate the user’s velocity and adaptively turns on the GPS only if the estimated uncertainty about the position exceeds the accuracy threshold. It also avoids turning on the GPS when it is not available and utilizes Bluetooth communication to reduce position uncertainty among neighboring devices. These methods address the energy-accuracy trade-off well; however, their approach is limited to location detection and focuses on specific motion and location sensors without using machine learning techniques. In contrast, our approach is applicable to any sensor or latent context and automatically determines the sampling policy using machine learning techniques.

Yurur et al. (yuror11; yuror13) and Sarker et al. (sarker16) adjust the sampling frequency and duty cycle by measuring the stability of the sensors’ values. Sampling frequency refers to the number of samples within a cycle, and duty cycle refers to the portion of an operational cycle that is spent on sampling. Duty cycles and sampling frequencies are chosen from a Cartesian product of two sets. While the sensing options in their approach are discrete, our method utilizes a continuous optimization function. Furthermore, while their technique cannot be adapted to sensors that do not support setting different sampling periods (GPS, Wi-Fi, etc.), our method is applicable to all sensors. Moreover, while their methods only take the sensor stability into account and do not consider the context, our method utilizes the context itself, as well as the predicted information loss, which is more relevant to context-aware applications.

Another approach for handling the accuracy-energy trade-off is presented by Rachuri et al. (rachuri10a; rachuri10b). They use adaptive sensor sampling which relies on the dynamic selection of predefined back-off/advance functions based on event history. The sampling interval decreases when an interesting event is observed and increases when a non-interesting event is observed. Furthermore, depending on event stability, the method switches from the least to the most “aggressive” function. While this approach sets the sampling interval to any interval that the predefined functions supply, our approach determines sampling intervals which are multiples of a minimal applicable sampling interval. Moreover, while the authors use the number of consecutive samples of the same state (interesting or not) to switch between functions in a conditional form, our approach considers the context itself and uses machine learning to learn the predicted information loss when using different policies.

In SociableSense (rachuri11), Rachuri et al. suggest a different approach that adjusts the sensor’s duty cycle according to its sensing probability. The sensing probability is dynamically calculated and defined as the proportion of previous sensing actions that were successful. A success indicates that the sensing action resulted in capturing an interesting event (domain dependent). The sensors are sampled at a high rate when interesting events are observed and at a low rate when there are no events of interest. While their approach utilizes a dynamic duty cycle for each sensor separately, our approach takes into consideration the combination of all sensors together when determining the sampling policy.

3. Method

We propose a novel approach for cost-aware sensing when performing continuous latent context detection. The proposed method is based on dynamic sampling policies. The sampling policy, which specifies when to resample each sensor, is determined dynamically according to the last known latent context, and the objective function aims at minimizing both sampling cost and information loss. Information loss is measured as the KL-Divergence between the actual user’s context and his/her last known context, which are detected based on the actual and last known sensors’ values respectively. The actual ground truth was created by frequently recording all sensors. We learned the difference in information loss between contexts that are inferred from the ground truth sensing and simulated scenarios. To create a simulated dataset that reflects the no-sensing scenario for different time periods, we added synthetic records with data from previously sensed records to the user’s records. Thus, we were able to investigate the trade-off between energy efficient sensing and accurate context detection.

When new data is received, we predict the information loss based on the learned differences and last known context.

Our method consists of the following four steps (that will be explained in greater detail in the following sections):

(1) Data Extension: Multiple synthetic records are created from each existing record, such that sensors’ values are taken from older records. Each record (synthetic or actual) and its corresponding distances between the actual record and the older records are maintained in the dataset.

(2) Latent Context Detection: An autoencoder is trained to detect latent context on the raw sensor data by reducing the dimensionality of the sensor data and representing the latent context as a compact vector. When training is complete, latent context is detected for each record (synthetic or actual) in the extended dataset.

(3) Information Loss Prediction: Information loss is calculated between each pair consisting of the synthetic record’s context and the corresponding actual record’s context. Afterwards, based on actual context, distances between samples, and corresponding information loss, a linear regression model for information loss prediction is trained.

(4) Sampling Policy Determination: When new data is sensed, the algorithm computes the current (last known) latent context and chooses the best sampling policy given that context. The best policy is the one that minimizes the objective function, which considers both sampling cost and predicted information loss.

The suggested method is based on several hypotheses. The first is that different users have different behaviors which are reflected in sensors’ data values and their derived contexts. For example, some users tend to hold their phone in their hands while others carry it in their pockets; some people perform activities such as eating while holding or touching their phone; and others seldom touch their phone (hoober13). Thus, building a personal model for context detection will result in a more accurate context than using a single model for all users. The second hypothesis is that the sampling policy should be determined dynamically and take the user’s most recent known context into consideration rather than an outdated or predefined set of known contexts. We believe that different contexts require different sampling policies. Contexts may vary in their duration and in the list of sensors that are needed for their detection. For example, when the user enters an office, there is no point in sensing the GPS frequently, and when the user sleeps, the accelerometer data may be useless. Since our method can handle latent contexts that are not predefined or outdated, the dynamic nature of the detection is crucial. Finally, we hypothesize that Kullback-Leibler Divergence (KL-Divergence) is an applicable loss function for calculating the difference between the actual and the last known context vectors. In contrast to other difference metrics, KL-Divergence is a non-symmetric measure of the difference between two probability vectors: the first vector represents the “true” distribution, and the second represents an approximation of that distribution. Therefore, when the first context vector is created from the actual sensors’ data and the second context vector from the last known sensors’ data, the KL-Divergence between them reflects the information loss incurred by reducing sensing cost.

3.1. Data Extension

In this step, we wish to simulate a situation of non-sampling sensors for a time period by extending each record to multiple synthetic records. A synthetic record contains previous sensors’ values instead of actual values, as if they weren’t sensed in that time interval.

In the extension algorithm, Algorithm 1, we iterate through all of the user’s records (line 2). From each record we create multiple synthetic records by completing sensors’ values from older records. The choice of which older record to use is derived from a specific distance $dist$, which defines the record’s distance from the older record and varies from 1 to a predetermined configurable maximum limit $maxDist$. The distance is determined for each sensor separately and is maintained in a dedicated distances vector $\overrightarrow{D}$.

First, we add the actual record to the extended records list (line 6); its distance vector $\overrightarrow{D}$ is therefore all zeros (lines 4-5). Second, we create synthetic records either systematically or randomly. When extending data systematically, we iterate over all possible distances in a range (line 7) and collect all sensors’ values from a record which is located $dist$ records away (lines 12-14). In that case, every entry of the distance vector $\overrightarrow{D}$ is $dist$ (lines 10-11). When extending data randomly, for each sensor (line 17) we choose a random distance from a range, add it to the distances vector $\overrightarrow{D}$ (lines 18-19), and collect the sensor’s values from the corresponding older record (lines 20-22). In other words, the values of features for different sensors are taken from different records. The process of creating a random record is repeated $k$ times (line 15).
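The extension procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' Algorithm 1: the function name `extend_records`, the representation of a record as one value per sensor, and the decision to skip the first `max_dist` records (which lack enough history) are all assumptions made for the example.

```python
import random

def extend_records(records, max_dist, k, seed=0):
    """Create synthetic records by filling sensor values from older records.

    records: list of dicts mapping sensor name -> value, oldest first
             (hypothetical layout; real records hold several features per sensor).
    Returns a list of (values, distances) pairs.
    """
    rng = random.Random(seed)
    sensors = list(records[0].keys())
    extended = []
    # Assumption: records without max_dist predecessors are skipped.
    for idx in range(max_dist, len(records)):
        # Actual record: every sensor is fresh, so all distances are zero.
        extended.append((dict(records[idx]), {s: 0 for s in sensors}))
        # Systematic extension: all sensors share the same distance.
        for dist in range(1, max_dist + 1):
            older = records[idx - dist]
            extended.append((dict(older), {s: dist for s in sensors}))
        # Random extension: each sensor gets its own random distance.
        for _ in range(k):
            dists = {s: rng.randint(1, max_dist) for s in sensors}
            values = {s: records[idx - dists[s]][s] for s in sensors}
            extended.append((values, dists))
    return extended
```

Each actual record yields `1 + max_dist + k` entries, which makes it possible to locate the actual record of any synthetic record from its index alone.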

3.2. Latent Context Detection

In the Latent Context Detection step, we build a personal model for latent context detection for each user.

Latent contexts are hidden context patterns modeled as numeric vectors. They can be obtained automatically by applying unsupervised learning techniques to available raw data (unger16). Our work is the first to create an energy efficient framework for latent context detection, and therefore it is not limited to a predefined set of contexts.

We chose to use Unger’s method (unger16) for latent context detection using an autoencoder. The autoencoder is a neural network model that aims to reconstruct its input after reducing the input’s dimension, by adapting the autoencoder’s weights. When defining the input as sensors’ values (as can be seen in Figure 1), we consider the innermost hidden layer as the user’s latent context. In other words, the latent context vector is a reduced dimensional representation of the features’ vector. Since our goal is to obtain the values of the hidden layer, the output layer is only used for training the model and can be discarded afterwards.

Algorithm 2 describes the process of latent context detection. After creating a personal context detection model (line 1), we iterate all of the user’s records (line 3) and detect latent context for each record (line 4). The context $\overrightarrow{C}$ is saved, along with the distances $\overrightarrow{D}$ from the previous step (line 6). The sensors’ values are no longer required; only the context and the distances between samples are needed.

3.3. Information Loss Prediction

During the Information Loss Prediction step, our goal is to learn the information loss incurred when sampling sensors at different frequencies (i.e., distances between samples). We consider the information loss to be the distance between the synthetic record’s context and its actual record’s context. In other words, the change in sensors’ values results in a different context, and the distance between the two contexts is the information loss.

We chose to calculate this distance using KL-Divergence, which is a non-symmetric measure of the difference between two probability distributions $P$ and $Q$. $P$ typically represents the “true” distribution of data, while $Q$ typically represents an approximation of $P$ (Burnham03). Therefore, in our algorithm, $P$ is the actual record’s context (the “true” context), while $Q$ is the synthetic record’s context. The formula for calculating KL-Divergence is:

$$D_{KL}(P\,\|\,Q)=\int_{-\infty}^{\infty}p(x)\log_{2}\frac{p(x)}{q(x)}\,dx\qquad(1)$$
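For context vectors, which are finite, the integral becomes a sum over vector entries. The sketch below shows this discrete form. The paper does not state how a latent context vector is turned into a probability vector, so the `normalize` helper (clipping negatives and rescaling to sum to one) is purely an assumption for illustration, as are both function names.

```python
import math

def normalize(vec, eps=1e-12):
    """Assumed preprocessing: clip negatives, add eps, rescale to sum to 1."""
    shifted = [max(v, 0.0) + eps for v in vec]
    total = sum(shifted)
    return [v / total for v in shifted]

def kl_divergence(p, q):
    """Discrete KL-Divergence in bits: sum_i p_i * log2(p_i / q_i).

    p is the "true" distribution (actual context) and q its approximation
    (synthetic context). Terms with p_i == 0 contribute nothing.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Swapping the arguments generally changes the result, which is why the roles of the actual and synthetic contexts must be kept fixed.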

Algorithm 3, which is used to calculate and predict information loss, is described next. Given the output from Algorithm 2, for each record (synthetic or not), we extract the context and distances (lines 3-4) and retrieve the actual record’s context (line 5). The actual record can be found by a simple calculation based on the current record’s index (since we know how many synthetic records have been created for each record). Afterwards, using Eq. 1, we calculate the information loss (line 6) and create a new record that contains the actual context, the corresponding distances, and the derived information loss (line 7).

After calculating the KL-Divergence between each pair of the synthetic record’s context and its actual record’s context, we build a personal model for predicting information loss given the last known context and a set of distances (one distance for each sensor). We chose to use a linear regression model with KL-Divergence as its dependent variable. The independent variables are the last known context $\overrightarrow{C}$ , the distances from last sample $\overrightarrow{D}$ , their squared values, and the interaction variables between them, so that the decision will be dependent on the context. These variables form the features of the regression model.

Since the context is set when determining a sampling policy, without interaction variables it is treated as a constant and is ignored. Therefore, in line 9 we transform $contextDistInfoLoss$ so it also contains the squared values and interaction variables. The information loss is predicted according to the following regression formula:

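$$infoLoss(C,D)=\sum_{i=1}^{n}b_{c_{i}}C_{i}+\sum_{j=1}^{m}b_{d_{j}}D_{j}+\sum_{i=1}^{n}b_{sc_{i}}C_{i}^{2}+\sum_{j=1}^{m}b_{sd_{j}}D_{j}^{2}+\sum_{i=1}^{n}\sum_{j=1}^{m}b_{c_{i}d_{j}}C_{i}D_{j}\qquad(2)$$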

where $1\leqslant i\leqslant|C|=n$ , $1\leqslant j\leqslant|D|=m$ , $b_{c}$ and $b_{d}$ are context and distance coefficients, $b_{sc}$ and $b_{sd}$ are squared value coefficients, and $b_{cd}$ are interaction variable coefficients.
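To make the feature construction concrete, the following sketch builds the regression inputs from a context vector and a distances vector and evaluates the prediction as a weighted sum. The function names and the ordering of the features (context, distances, squared context, squared distances, interactions) are assumptions for the example; the formula as given has no intercept, so none is used here.

```python
def regression_features(context, distances):
    """Features: C_i, D_j, C_i^2, D_j^2 and all interactions C_i * D_j."""
    squared_context = [c * c for c in context]
    squared_distances = [d * d for d in distances]
    interactions = [c * d for c in context for d in distances]
    return (list(context) + list(distances)
            + squared_context + squared_distances + interactions)

def predict_info_loss(coefficients, context, distances):
    """Weighted sum of the features; coefficients follow the same ordering."""
    features = regression_features(context, distances)
    if len(coefficients) != len(features):
        raise ValueError("expected %d coefficients" % len(features))
    return sum(b * x for b, x in zip(coefficients, features))
```

With $n$ context entries and $m$ distances, the feature vector has $2n+2m+nm$ entries. Once the context is fixed, only the distance terms and the interaction terms vary with the policy, which is how the context influences the chosen distances.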

After creating the features, we train a Lasso model (line 10) and return it (line 11). Our first choice for the information loss prediction model was the state-of-the-art $XGBoost$ model for regression. When comparing it to a simple linear regression model, we found that although the $XGBoost$ error is lower, the difference in MSE between the models is not statistically significant. We therefore preferred the linear regression model, since it is much simpler. Furthermore, in order to force positive coefficients, we used a Lasso model, which also reduced the model’s complexity by setting some of the coefficients to zero.

3.4. Policy Determination

For this step, which is the final step in the process, we use the results of all of the prior steps. When new data is sensed, we wish to determine when to resample each sensor. A decision about the sampling policy is made based on the sensors’ sampling cost, predicted information loss, and the last known context.

Therefore, when new data is sensed, the following process is employed. First, we detect the current context using the personal model we built in the Latent Context Detection step. Then, given that context, we choose the best policy (i.e., the best distance) for each sensor. The best policy is the one that minimizes the sampling cost while incurring minimal information loss. Thus, the determination of the sampling policy is based on the following functions:

• Cost function: This function considers the sampling cost of each sensor and the sampling frequency (distance from last sample). It puts the cost in direct proportion to the sensors’ costs and in inverse proportion to the distances between samples. Where $n$ is the number of sensors and $\overrightarrow{D}$ is the distances vector, the cost is computed using the following formula:

$$cost(D)=\sum_{i=1}^{n}\frac{cost_{i}}{D_{i}}\qquad(3)$$

• Information loss function: A Lasso regression model, as seen in Eq. 2, with KL-Divergence as its dependent variable. Independent variables are derived from the last known context $\overrightarrow{C}$ and distances between samples $\overrightarrow{D}$.

The objective function will include both sampling cost and information loss:

$$\min\left(\sum_{i=1}^{n}\frac{cost_{i}}{D_{i}}+\alpha\times infoLoss(C,D)\right)\qquad(4)$$

where $n$ is the number of sensors, $|\overrightarrow{D}|=n$ , and $\alpha$ is a tuning parameter for the weight of the information loss.
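The trade-off expressed by the objective can be illustrated with a small sketch. Note that this example swaps in an exhaustive search over integer distances instead of the convex solver used in the paper, and the `info_loss` argument is a hypothetical callable standing in for the trained regression model with the context already fixed; all names are assumptions.

```python
from itertools import product

def sampling_cost(sensor_costs, distances):
    """Sum of cost_i / D_i: rarer sampling (larger D_i) is cheaper."""
    return sum(c / d for c, d in zip(sensor_costs, distances))

def objective(sensor_costs, distances, info_loss, alpha):
    """Sampling cost plus alpha-weighted predicted information loss."""
    return sampling_cost(sensor_costs, distances) + alpha * info_loss(distances)

def best_policy_grid(sensor_costs, info_loss, alpha, max_dist):
    """Brute-force stand-in for the solver: try every integer policy."""
    candidates = product(range(1, max_dist + 1), repeat=len(sensor_costs))
    return min(candidates,
               key=lambda d: objective(sensor_costs, d, info_loss, alpha))
```

For instance, with costs `[4.0, 1.0]`, an information loss that grows linearly with each distance, and `alpha = 1`, the expensive sensor is sampled every second interval while the cheap one is sampled every interval. The search grows exponentially with the number of sensors, so it is only suitable for illustrating the objective.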

After determining the sampling policy, we set a countdown timer which repeatedly triggers the policy determination. This raises the question of when the sampling policy should be determined again, or equivalently, what value the timer should be initialized with. We considered the following options: (1) MAX: the maximal distance in the last policy; (2) MIN: the minimal distance in the last policy; (3) AVG: the average distance in the last policy; (4) NEVER: determine the policy once and never determine it again. For example, if we have three sensors and the last policy was 1,5,6, then according to MAX the next policy will be determined after six time intervals, according to AVG after four intervals, and according to MIN after one interval, while NEVER initializes the timer with infinity. In the next section, we evaluate the difference between those methods with respect to information loss and cost trade-offs. Switching between them can be handled by changing a configurable parameter.
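The four timing options amount to a small function of the last policy, sketched below. The function name `policy_timer` and the string mode labels are assumptions; the text does not say how a non-integer average is rounded, so the average is returned as is.

```python
import math

def policy_timer(policy, mode):
    """Number of time intervals until the policy is determined again."""
    if mode == "MAX":
        return max(policy)
    if mode == "MIN":
        return min(policy)
    if mode == "AVG":
        return sum(policy) / len(policy)  # rounding is left unspecified
    if mode == "NEVER":
        return math.inf
    raise ValueError("unknown mode: %r" % mode)
```

For the policy 1,5,6 from the example, this gives six intervals for MAX, four for AVG, and one for MIN.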

The complete algorithm is described in Algorithm 4. It receives as input a list of sensors, the policy determination mode, and the user’s personal models for information loss prediction and context detection. First, we initialize $timeToSample$ and $policyTimer$ with zeros (lines 1-4). The first indicates how many time intervals are left until each sensor is resampled, and the second indicates how many time intervals are left until the policy is switched. We initialize them with zeros in order to sample all sensors and determine a sampling policy as soon as the process starts.

After initialization, we enter an infinite loop and repeat the following process. First, for each sensor we check whether its next-sampling countdown has reached zero (line 8). If it has, we sense it and save its new values in the $lastSampleValues$ dictionary (line 9). Otherwise, we do nothing and the last sensed values are kept. Either way, we append the sensor’s values (new or old) to $record$, which represents the user’s current record (line 10). Second, we detect the current context using the user’s context model (line 11). Third, if the time to a new policy has reached zero (line 12), we save the current policy as $prevPolicy$ (line 13), determine a new policy using Algorithm 5 (line 14), recalculate $timeToSample$ for all sensors using Algorithm 6 (line 15), and update $policyTimer$ according to the given mode (line 16). Otherwise, we simply reduce $timeToSample$ of all sensors by one (lines 18-19), and $timeToPolicy$ as well (line 22). If $timeToSample$ of one of the sensors is negative, we initialize it with the value from the current policy (lines 20-21).

Algorithm 5 describes the use of $cvxsolver$ (from Python’s $cvxopt$ package) to minimize the objective function. In lines 1-2 we create $G$ and $h$, which together represent a system of constraints such that $Gx\leqslant h$. $G$ is a sparse matrix of size $2n\times n$, where value $(i,j)$ represents the coefficient of variable $j$ in inequality $i$. $h$ is a dense matrix of size $2n\times 1$, where value $(i,1)$ represents the right-hand side constant of inequality $i$. Since we would like to achieve $1\leqslant x\leqslant maxDist$ for every distance $x$, we initialize $G$ and $h$ as follows: the diagonal of the first $n$ rows of $G$ and the first $n$ rows of $h$ are initialized with -1 and -1 respectively (encoding $-x\leqslant -1$), while the diagonal of the last $n$ rows of $G$ and the last $n$ rows of $h$ are initialized with 1 and $maxDist$ respectively (encoding $x\leqslant maxDist$). After defining the constraints, we define the target function $F$ with Eq. 4 (line 3) and call the optimization solver with $F$, $G$ and $h$ (line 4). Finally, the best policy is returned (line 5).
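The structure of the constraint system can be sketched with plain Python lists; a real implementation would build the solver's own sparse and dense matrix types instead. The function names are assumptions. The right-hand side of the lower-bound rows is taken to be -1 here, since $-x\leqslant -1$ is the inequality that encodes $x\geqslant 1$.

```python
def box_constraints(n, max_dist):
    """Build G (2n x n) and h (2n) so that G x <= h means 1 <= x_i <= max_dist."""
    G = [[0.0] * n for _ in range(2 * n)]
    h = [0.0] * (2 * n)
    for i in range(n):
        G[i][i] = -1.0              # -x_i <= -1, i.e. x_i >= 1
        h[i] = -1.0
        G[n + i][i] = 1.0           #  x_i <= max_dist
        h[n + i] = float(max_dist)
    return G, h

def satisfies(G, h, x):
    """Check every row of G x <= h for a candidate policy x."""
    return all(sum(g * xj for g, xj in zip(row, x)) <= bound
               for row, bound in zip(G, h))
```

A candidate policy is feasible exactly when every distance lies in the range from 1 to `max_dist`.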

Algorithm 6 describes the way we recalculate $timeLeftToSample$ for each sensor according to the new policy. If the previous policy is $null$, this is the first time we determine a sampling policy, and therefore $timeToSample$ will be equal to the current policy (line 11). Otherwise, we need to take the previous policy into consideration. If the previous policy is equal to the new policy, we do nothing and $timeToSample$ stays the same. If not (line 2), we do the following for each sensor. If $timeToSample$ of a sensor is zero, we simply initialize it with its new policy value (lines 4-5). Otherwise, we first calculate the number of time intervals that have passed since the last sample, $timeSinceSampled$, by subtracting $timeToSample$ of the sensor from its previous policy value (line 7). Second, we calculate the remaining time according to the new policy by subtracting $timeSinceSampled$ from its new policy value (line 8). If the result is negative, $timeLeftToSample$ is set to zero (line 9).
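The recalculation rules can be written as a short function. This is a sketch of the described logic under assumed names (`recalc_time_to_sample`, with policies and timers as plain lists ordered by sensor), not the authors' code.

```python
def recalc_time_to_sample(time_to_sample, prev_policy, new_policy):
    """Return the updated countdown per sensor after a policy change."""
    if prev_policy is None:
        # First policy ever determined: countdowns equal the policy itself.
        return list(new_policy)
    if prev_policy == new_policy:
        # Unchanged policy: keep the countdowns as they are.
        return list(time_to_sample)
    updated = []
    for left, prev, new in zip(time_to_sample, prev_policy, new_policy):
        if left == 0:
            updated.append(new)
        else:
            time_since_sampled = prev - left
            # Never wait a negative amount of time: sample immediately.
            updated.append(max(new - time_since_sampled, 0))
    return updated
```

For example, a sensor that was sampled three intervals ago under an old distance of 4 and now receives a new distance of 2 is already overdue, so its countdown becomes zero.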

4. Evaluation

In this section, we describe the evaluation of our method. The objective of the evaluation was to show that it is beneficial to dynamically determine the sampling policy in terms of accuracy and energy consumption. In addition, we wanted to show that KL-Divergence is an applicable information loss measure for latent context detection.

We performed a series of offline simulations on the Sherlock dataset (Mirski16, ). We used data collected from twenty users from six sensors of their mobile devices, namely: GPS, cell tower, accelerometer, gyroscope, magnetic field, and status (including various status features of the phone, such as volume, screen orientation, etc.). All sensors were sampled once a minute. The data was collected for about a week and contains about 10,000 records for each user.

We use KL-Divergence as a measure for information loss between the actual and last known contexts. In order to check its applicability as a measure for information loss, we trained twenty personal models for information loss prediction, one for each user, and checked its correlation with the distances between samples across different contexts.

After an empirical evaluation, we set $maxDist=32$ and $K=20$ when extending the user’s data (the first step of our method), meaning that from a single record we simulated 52 synthetic records: 32 were generated using the systematic method, and 20 were generated randomly. We then calculated the KL-Divergence between their derived contexts. For all users, the regression model for information loss prediction converged to a solution under the positive-coefficient constraint while taking some of the context features into account (i.e., some context features received non-zero coefficients). Then, for each user, we predicted the information loss when setting the same distance for all sensors, from 1 to $maxDist$. Figure 2 presents the information loss as a function of the distance between samples for a single user; each line stands for a different context. It shows that the greater the distance between samples, the greater the predicted information loss. Moreover, it can be seen that for the same user, the slope of the line varies across contexts. This indicates that the last known context affects the predicted information loss and thus may affect the sampling policy.

4.1. Energy-Information Loss Trade-off

During the policy determination step, we simulated the process of continuous dynamic sensing that includes iterations of the following: a) Sample sensors according to the determined policy; b) Detect context; c) When required - determine new policy.

The selected policy in each iteration is the one that minimizes the sum of cost (energy) and information loss given the last known context (See Eq. 4). The process is repeated continuously, calculating the total information loss and the total cost for all user’s records.

In the continuous process of sensing and determining the sampling policy, the question arises of when to redetermine the policy. We simulated four timing methods, each defined by how long to wait after the last policy determination: (1) MAX: the maximal distance in the last policy; (2) MIN: the minimal distance in the last policy; (3) AVG: the average distance in the last policy; (4) NEVER: determine the policy once and never determine it again. The NEVER method represents the static method. Each method was tested with five different weights for information loss (see Eq. 4): 0.1, 1, 5, 10 and 20. Hence, in total, we performed twenty simulations for each user.

The methods are ranked as follows. Each simulation is represented as a two-dimensional point $(x,y)$, where $x$ and $y$ represent the total information loss and the total cost respectively. Then, for each $\alpha$, we find the subset of points that constitutes the Pareto Frontier (for minimum cost and minimum information loss), assign it the current rank, remove it from the set, and repeat the process until the set is empty. With each iteration, the rank is increased by one. After rank calculation, we computed the statistic $F_{F}=29.492$, which turned out to be greater than the critical value (2.696). Therefore, the null hypothesis that all methods perform equally was rejected. Table 1 presents the average rank for each method. The results show that the dynamic methods perform better than the static method (NEVER), and that the more frequent the decision, the better the rank. This implies that frequent policy determination improves results.
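The ranking by repeated Pareto Frontier removal can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, and the four points labelled `a` to `d` are made-up values chosen only to exercise the logic, not results from the paper.

```python
def dominates(p, q):
    """True if p is at least as good as q on both axes and strictly better on one.

    Points are (information_loss, cost); lower is better on both axes.
    """
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])


def pareto_ranks(points):
    """Assign rank 1 to the Pareto Frontier, remove it, and repeat.

    points maps a label -> (information_loss, cost).
    Returns a dict mapping each label -> rank (1 is best).
    """
    remaining = dict(points)
    ranks = {}
    rank = 1
    while remaining:
        # Points not dominated by any other remaining point form the frontier.
        front = [name for name, p in remaining.items()
                 if not any(dominates(q, p) for q in remaining.values())]
        for name in front:
            ranks[name] = rank
            del remaining[name]
        rank += 1
    return ranks


# Made-up (information_loss, cost) points for a single weight value.
example = {"a": (1.0, 5.0), "b": (2.0, 6.0), "c": (3.0, 4.0), "d": (4.0, 7.0)}
print(pareto_ranks(example))  # {'a': 1, 'c': 1, 'b': 2, 'd': 3}
```

Points `a` and `c` trade off loss against cost without either dominating the other, so they share rank 1; `b` is dominated by `a`, and `d` is dominated by every other point.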

After rejecting the null hypothesis, we performed the Nemenyi test between each pair of methods. The conclusion is that there is a significant difference between all methods except for NEVER and MAX. We assume this is because, under MAX, the policy does not change for a relatively long time window, during which many changes in context may occur. In addition, as we expected, when examining the optimized sampling policies for all sensors, we saw that the greater the $\alpha$ (information loss weight), the smaller the distances between samples. The only exception is the status sensor, which is constantly assigned a policy of one (sample every time interval). This makes sense, since it is the only sensor with zero cost. In contrast, the GPS sensor, which is the most expensive sensor, is assigned the $maxDist$ policy even with $\alpha$ greater than 0.1. Figure 3 presents the average policy for each sensor as a function of $\alpha$ for two users. Each line refers to a different sensor. It can be seen that for all sensors except the status sensor, the average distance between samples decreases when the weight of information loss increases. This is clear evidence that our method successfully performs an energy-information loss trade-off that can be controlled by applying different weights.

4.2. Comparison with State-of-the-Art

We chose to compare our method with Rachuri’s (rachuri11, ) approach, which is, to the best of our knowledge, the only method that uses machine learning to perform adaptive sampling. Rachuri (rachuri11, ) suggests adjusting the sensor’s duty cycle according to its sensing probability. The sensing probability is dynamically calculated and defined as the portion of successes among previous sensing actions. A success indicates that the sensing action resulted in capturing an interesting event (using a domain-dependent pretrained classifier). The sensors are sampled at a high rate when interesting events are observed and at a low rate when there are no events of interest. The technique works as follows: Let $p_{i}$ be the probability of sensing from a sensor $s_{i}$, where $i\in\{accelerometer,Bluetooth,microphone\}$, and let $a_{i}$ be the sensing action on sensor $s_{i}$. If the sensing action $a_{i}$ results in an interesting event, the probability is increased: $p_{i}=p_{i}+\alpha(1-p_{i})$, where $0<\alpha<1$. Otherwise, the probability is decreased: $p_{i}=p_{i}-\alpha p_{i}$. The lower and upper bounds of the probabilities were limited to 0.1 and 0.9, respectively. While his approach considers the values of each sensor separately, our approach provides a more complete view of a user’s context by considering the combination of multiple sensors.
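The probability update rule quoted above is simple enough to sketch directly. The function name `update_sensing_probability` is hypothetical, and enforcing the 0.1 and 0.9 bounds by clamping after each update is an assumption about how the limits are applied.

```python
def update_sensing_probability(p, interesting, alpha, lower=0.1, upper=0.9):
    """Apply one step of the sensing-probability update for a single sensor.

    p           current sensing probability
    interesting True if the last sensing action captured an interesting event
    alpha       learning rate, 0 < alpha < 1
    """
    if interesting:
        p = p + alpha * (1.0 - p)   # move towards 1
    else:
        p = p - alpha * p           # move towards 0
    # Assumed: the bounds are enforced by clamping the updated value.
    return min(max(p, lower), upper)


print(update_sensing_probability(0.5, True, alpha=0.5))   # 0.75
print(update_sensing_probability(0.5, False, alpha=0.5))  # 0.25
print(update_sensing_probability(0.9, True, alpha=0.5))   # 0.9 (clamped)
```

Each sensor keeps its own probability, so the update is applied independently per sensor after every sensing action.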

In order to compare our method to Rachuri’s method, we implemented it with a few necessary adjustments: First, since our data is not labeled we weren’t able to train event classifiers. Therefore, in order to determine whether an interesting event has occurred, we calculated the KL-Divergence between every pair of consecutive sensor records and used the 90% quantile as a threshold. Second, since we use offline simulations we could only change the time between samples and weren’t able to change the duty cycle. Therefore, according to the adapted sensing probability, we calculated the time between samples such that lower probability results in longer time between samples.

We ran offline simulations using our implementation for each of the users, calculated the total cost and information loss, and compared the results with our values when using the MIN timing approach over different information loss weights. To determine the significance of the difference between the performance of the methods, we used a paired t-test with $\alpha=0.01$ as the significance level. The results are presented in Figure 4, which compares Rachuri’s normalized mean information loss and cost with our normalized mean values as a function of the weight. The statistically significant results are denoted by an asterisk (*). The results demonstrate that in terms of information loss, our method is significantly better when the weight is 1 or higher, and in terms of cost, it is significantly better when the weight is 7 or lower. Therefore, our method is better for both cost and information loss when the weight is in the range of 1 to 7.

5. Conclusions and Future Work

We presented a novel method for continuous cost-aware sensing that manages the trade-off between energy consumption and information loss while trying to minimize both. The suggested framework dynamically determines a user’s sampling policy based on three factors: (1) User’s current latent context; (2) Predicted information loss; (3) Sensors’ sampling costs. The latent context is calculated with an autoencoder, the information loss is predicted with Lasso regression, and the best sampling policy is determined using convex optimization. All models are personal, and the objective function is a nonlinear function that gives weight to both sampling costs and predicted information loss. We evaluated the suggested method by performing a series of offline simulations on data recorded from six mobile device sensors of twenty users.

The results show that the dynamic adaptation approach is better than the static approach in terms of accuracy and energy consumption, and that KL-Divergence is an applicable measure for information loss for the task of latent context detection.

The results show that our method successfully performs an energy-information loss trade-off that can be controlled by setting different weights in the objective function. This enables context-aware applications to be accurate while consuming less energy.

Moreover, when compared with a state-of-the-art dynamic method, our method outperformed it in both sampling cost and information loss in some cases, and in one of these two measures in the other cases.

In the future, we plan to implement and evaluate our method within a context-aware application and compare the performance of personal vs. non-personal models. In addition, we plan to improve our method by adding feature selection on users’ sensor data, under the assumption that some features may perform better than others for different users. Furthermore, we wish to use hybrid models which optimize the sampling policy based on both personal and non-personal models. Such hybrid models address the cold start problem when there is insufficient user data or when the model’s error exceeds a predetermined threshold.

Bibliography

(1) Abowd, Gregory D., et al. "Towards a better understanding of context and context-awareness." In: International Symposium on Handheld and Ubiquitous Computing. Springer Berlin Heidelberg, 1999. p. 304-307.
(2) Al-Muhtadi, Jalal, et al. "Cerberus: a context-aware security scheme for smart spaces." Pervasive Computing and Communications, 2003 (PerCom 2003). Proceedings of the First IEEE International Conference on. IEEE, 2003.
(3) Alhamid, Mohammed F., et al. "RecAm: a collaborative context-aware framework for multimedia recommendations in an ambient intelligence environment." Multimedia Systems 22.5 (2016): 587-601.
(4) Baldauf, Matthias; Dustdar, Schahram; Rosenberg, Florian. "A survey on context-aware systems." International Journal of Ad Hoc and Ubiquitous Computing, 2007, 2.4: 263-277.
(5) Ben Abdesslem, Fehmi; Phillips, Andrew; Henderson, Tristan. "Less is more: energy-efficient mobile sensing with SenseLess." In: Proceedings of the 1st ACM workshop on Networking, systems, and applications for mobile handhelds. ACM, 2009. p. 61-62.
(6) Baltrunas, Linas, et al. "InCarMusic: Context-aware music recommendations in a car." E-Commerce and web technologies (2011): 89-100.
(7) Bardram, Jakob E. "Applications of context-aware computing in hospital work: examples and design principles." Proceedings of the 2004 ACM symposium on Applied computing. ACM, 2004.
(8) Bardram, Jakob E. "Hospitals of the future ubiquitous computing support for medical work in hospitals." Proceedings of UbiHealth. Vol. 3. 2003.