A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT
Caio Ponte, Carlos Caminha, Rafael Bomfim, Ronaldo Moreira, Vasco Furtado

TL;DR
This paper introduces the Temporal Clustering Algorithm (TCA), an incremental learning method for IoT devices that predicts usage patterns to optimize energy consumption and user comfort, achieving significant savings and high accuracy.
Contribution
The paper presents a novel low-memory, configurable clustering algorithm for anticipatory IoT computing that balances user experience and energy efficiency.
Findings
Energy savings up to 40% in water dispensers
Over 90% accuracy in usage time prediction
Low-cost implementation with less than 1Kbyte memory
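Table 1: Memory usage, daily energy consumption, and error rate of each algorithm on the real (Real), commercial synthetic (CS), and residential synthetic (RS) datasets (TCA in Comfort mode).

| Algorithm | Memory (bytes) | Consumption Real (Wh) | Consumption CS (Wh) | Consumption RS (Wh) | Error Real | Error CS | Error RS |
|---|---|---|---|---|---|---|---|
| TCA |  | 10.98 | 13.12 | 13.19 | 0.14 | 0.03 | 0.10 |
| K-Means |  | 10.73 | 13.04 | 13.02 | 0.27 | 0.08 | 0.14 |
| EM |  | 10.55 | 13.34 | 12.78 | 0.31 | 0.04 | 0.06 |
| Conventional | - | 14.64 | 14.75 | 13.25 | 0.00 | 0.00 | 0.00 |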
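Table 2: Mean energy consumption and mean error rate of the TCA in each operation mode on the real (Real), commercial synthetic (CS), and residential synthetic (RS) datasets.

| Mode | Consumption Real (Wh) | Consumption CS (Wh) | Consumption RS (Wh) | Error Real | Error CS | Error RS |
|---|---|---|---|---|---|---|
| Eco | 8.36 | 11.39 | 11.41 | 0.34 | 0.16 | 0.29 |
| Balance | 10.03 | 12.07 | 13.16 | 0.13 | 0.06 | 0.10 |
| Comfort | 10.98 | 13.12 | 13.19 | 0.14 | 0.03 | 0.10 |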
A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT
Caio Ponte
Programa de Pós Graduação em Informática Aplicada
Universidade de Fortaleza
Fortaleza, Brazil
Carlos Caminha
Centro de Ciências Tecnológicas
Universidade de Fortaleza
Fortaleza, Brazil
Rafael Bomfim
Programa de Pós Graduação em Informática Aplicada
Universidade de Fortaleza
Fortaleza, Brazil
Ronaldo Moreira
Programa de Pós Graduação em Informática Aplicada
Universidade de Fortaleza
Fortaleza, Brazil
Vasco Furtado
Programa de Pós Graduação em Informática Aplicada
Universidade de Fortaleza
Fortaleza, Brazil
Abstract
We present the Temporal Clustering Algorithm (TCA), an incremental learning algorithm applicable to anticipatory computing problems in the context of the Internet of Things. The algorithm was tested in a specific scenario: predicting the consumption of an electric water dispenser of the kind typically used in tropical countries, where the ambient temperature is around 30 degrees Celsius. In this context, users typically want to drink iced water and therefore use the cooling function of the dispenser. Real and synthetic water consumption data were used to evaluate how much energy can be saved by predicting the usage pattern of the equipment. The algorithm uses a small, constant amount of memory, which allows it to be implemented at low cost on commercially available microcontrollers with less than 1 Kbyte of memory. It can also be configured according to user preference, either prioritizing comfort, by keeping the water at the desired temperature for longer, or prioritizing energy savings. The main result is that the TCA achieved energy savings of up to 40% compared with the conventional mode of operation of the dispenser, with an average success rate above 90% in predicting its times of use.
Keywords: Anticipatory Computing, Internet of Things, Temporal Clustering, K-means, Expectation Maximization
1 Introduction
The emergence of the Internet of Things (IoT) has enabled what is called anticipatory computing. "Things", as active objects or even agents, increasingly possess autonomy, which gives them the capability of anticipating people's wishes and intentions and of controlling their own behavior, for example by optimizing their energy expenditure. In this IoT context, users of technology are increasingly demanding and require machine intelligence to provide customized answers to their demands [1].
Although there is plenty of literature on applications that take user preference into account [2, 3], few works examine the issue from the perspective of conflicting objectives. It is common to come across problems in which optimizing the behavior of the "Thing" (here called the object) leads to an unsatisfactory user experience. Tools that work from this multi-objective perspective are fundamental, because they allow for the best possible trade-off between user experience and the efficiency of the object's behavior.
Combined with this challenge of handling conflicting objectives, there is the challenge of developing a solution that requires little memory (thus making its installation and the microcontrollers used inexpensive) and low processing power. Specifically, clustering time series in the context of IoT is a challenging problem because time series data is usually much larger than the available memory and is consequently stored on disk [4]. This leads to an exponential decrease in the speed of the clustering process. Although microcomputers and servers keep evolving, with ever greater processing and storage capacity, the nature of IoT demands computational solutions capable of running on microcontrollers and/or small computers, which often have severe processing and memory restrictions [7]. In addition, by investing in low-cost technology for household equipment, for example, it is possible to add market value in return for a small increase in manufacturing cost, adding intelligent behavior that enhances the user experience [8].
This research aimed to evolve the behavior of a piece of domestic equipment, specifically an electric water cooler and dispenser. This type of equipment is often used in tropical countries, where the ambient temperature often exceeds 30°C (86°F). When in operation, this type of dispenser activates its electric compressor and cools the water inside a thermal tank, keeping it within a temperature range, usually between 9 and 12°C. The compressor maintains this temperature even when there is no water consumption, which wastes energy keeping the water cold during periods when it will probably not be consumed. In summary, although cooling is essential for the water to be consumed at a pleasant temperature, intense use of the dispenser's compressor increases power consumption. This compromise between the experience of drinking cold water and a higher energy bill is the focus of this article. At one extreme (the standard on the market), the compressor stays on continuously, so every time users go to the dispenser to drink they receive properly cooled water. At the other extreme, if the compressor turns on only a few times a day, energy is saved, but users will probably not always get chilled water.
To meet the above requirements, the main contribution of this work is an incremental learning algorithm that seeks to discover the usage pattern of the appliance, so that the electric water dispenser can be controlled while maintaining the quality of the user experience (here represented by the satisfaction of drinking chilled water). We propose the Temporal Clustering Algorithm (TCA), an algorithm that factors multiple time series into a set of non-overlapping segments, known as time clusters. The agglomerative behavior of the TCA is inspired by the City Clustering Algorithm (CCA) [9, 10], a spatial agglomeration algorithm widely used to define cities beyond their official boundaries [11, 12, 13, 14]. The main feature of the TCA is that it takes user preference into account through three modes: Comfort, Balance, and Eco. In Comfort mode, the algorithm gives greater weight to user satisfaction than to energy savings; in Eco mode the opposite occurs; and Balance mode seeks to balance these two criteria.
Tests with real and synthetic consumption data from an electric water dispenser showed that identifying these agglomerates is useful for predicting consumption schedules. This allows the compressor to be used intelligently, favoring, according to the user's preference, either energy savings or chilled water even at times when the appliance is used infrequently. The energy savings of the TCA were compared with the conventional mode of the water cooler and dispenser: the TCA achieves energy savings of up to 40% relative to this mode, with an average accuracy above 90% in predicting dispenser usage times. Its success rate and memory consumption were also compared with some of the most commonly used clustering algorithms in the literature, and, for the specific application scenario discussed in this article, the TCA outperformed these algorithms in many situations.
2 State of the Art
Recent works also explore the combination of hardware and software for monitoring and control in urban scenarios well suited to the IoT paradigm, such as the one addressed here [15, 16]. Orsi et al. [17] presented a system that integrates a hardware controller for energy efficiency, a communication protocol to improve data transmission, and a software module for planning and managing household devices, which operates according to user preferences and a maximum power consumption. In Orsi et al.'s article, integration with machine learning for pattern detection is left as future work, which is what the present work proposes to do.
Madiraju et al. [18] proposed Deep Temporal Clustering (DTC), a robust deep temporal clustering algorithm that naturally integrates dimensionality reduction and temporal clustering into a single, fully unsupervised learning structure. They claim that the clustering layer of their algorithm can be adjusted to any temporal similarity metric, and they compare several similarity metrics and state-of-the-art algorithms. The viability of the algorithm is demonstrated on time-series data from several domains, ranging from earthquakes to spacecraft sensor data. Despite the importance of this study, the memory usage typical of Deep Learning solutions during training makes its application infeasible in the specific context of this work.
Finally, it is important to mention the work of Aghabozorgi et al. [4]. The authors present a rather complete survey that proposes a categorization of the main components of the time-series clustering task. Despite the completeness of this and other studies [5, 6], their focus was often on the efficiency and complexity of the approaches in the context of big data and cloud computing. Our context is different: we seek to solve a time series prediction problem under severe processing and storage limitations.
3 Temporal Clustering Algorithm
The TCA receives as input a list of time series and composes an event density prototype, which represents the average behavior of these series. Computationally, the prototype is a vector in which each position corresponds to a time period and holds a value measuring the number of events occurring in that period. The event density at each position is the mean number of events occurring in that period across all the time series.
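A minimal Python sketch of the prototype construction described above, assuming each time series has already been binned into a list of per-period event counts of equal length. The function name `build_prototype` and the sample values are hypothetical and are not taken from the paper's Algorithm 1.

```python
from typing import List, Sequence


def build_prototype(series: Sequence[Sequence[int]]) -> List[float]:
    """Average equally binned event-count series into one density vector.

    series[d][i] is the number of events (e.g. water withdrawals) observed in
    time bin i of day d. The result holds, for each bin i, the mean count over
    all days, i.e. the event density used by the clustering step.
    """
    if not series:
        raise ValueError("at least one time series is required")
    n_bins = len(series[0])
    if any(len(s) != n_bins for s in series):
        raise ValueError("all time series must have the same number of bins")
    return [sum(s[i] for s in series) / len(series) for i in range(n_bins)]


# Two hypothetical training days with six bins each (real input would have 144).
day1 = [0, 3, 5, 0, 0, 2]
day2 = [1, 1, 4, 0, 0, 0]
print(build_prototype([day1, day2]))  # [0.5, 2.0, 4.5, 0.0, 0.0, 1.0]
```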
The agglomerative behavior of the TCA is inspired by the City Clustering Algorithm (CCA) [9, 10], a spatial agglomeration algorithm widely used to define cities beyond their official boundaries. The CCA aggregates spatial units using two parameters: a distance threshold and a population density threshold. The relationship between the TCA and the CCA lies precisely in the use of these thresholds; however, the TCA applies them in a single dimension, the temporal one.
The TCA uses two thresholds. The first, a time threshold, is the temporal distance below which elements of the time series are considered temporally contiguous; more precisely, all events at temporal distances smaller than this threshold are grouped together. The second, an event density threshold defined according to the user's preference, determines which low-density elements of the series may be agglomerated. The event density is stored at each index of the prototype vector: if the density at an index is above the density threshold, the index is considered populated and can therefore be grouped. Once the values of both thresholds are defined, the behavior of the TCA in identifying a cluster can be observed schematically in Figure 1.
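The following sketch shows one way to implement this one-dimensional, CCA-style grouping for fixed thresholds. It assumes that the time threshold is expressed in bins and that the temporal distance between two populated bins is the number of unpopulated bins separating them; both this distance convention and the name `find_clusters` are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence, Tuple


def find_clusters(density: Sequence[float], time_threshold: int,
                  density_threshold: float) -> List[Tuple[int, int]]:
    """Return temporal clusters as inclusive (first_bin, last_bin) pairs.

    A bin is populated when its density is above density_threshold. Two
    consecutive populated bins join the same cluster when the number of
    bins between them (assumed distance convention) is smaller than
    time_threshold.
    """
    clusters: List[Tuple[int, int]] = []
    start = last = None
    for i, value in enumerate(density):
        if value <= density_threshold:
            continue  # unpopulated bin: never starts or extends a cluster
        if last is not None and i - last - 1 < time_threshold:
            last = i  # close enough in time: extend the current cluster
        else:
            if last is not None:
                clusters.append((start, last))  # close the previous cluster
            start = last = i  # open a new cluster at this bin
    if last is not None:
        clusters.append((start, last))
    return clusters


density = [0.5, 2.0, 4.5, 0.0, 0.0, 1.0, 0.0, 3.0]
print(find_clusters(density, time_threshold=2, density_threshold=0.8))  # [(1, 2), (5, 7)]
```

A single pass over the prototype is enough, which is consistent with the linear time complexity stated for a fixed threshold.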
The TCA also uses a percolation model [19] to choose the value of the time threshold. The size of the largest cluster is measured while the time threshold is gradually increased in search of a phase transition. More precisely, the model identifies the moment when the largest cluster absorbs the second largest cluster, which is considered the critical point of the system. The threshold value should be chosen just before this phase transition, when the clusters are consolidated and the smallest further increase in the threshold makes them merge.
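A sketch of the percolation step, under the assumptions that the size of a cluster is its duration in bins and that the time threshold grows by one bin at a time. It records the largest jump in the size of the largest cluster and returns the threshold just before that jump. `find_clusters` is repeated in compact form so the block runs on its own; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_clusters(density: Sequence[float], time_threshold: int,
                  density_threshold: float) -> List[Tuple[int, int]]:
    """Compact copy of the fixed-threshold grouping (gap = bins in between)."""
    clusters, start, last = [], None, None
    for i, value in enumerate(density):
        if value <= density_threshold:
            continue
        if last is not None and i - last - 1 < time_threshold:
            last = i
        else:
            if last is not None:
                clusters.append((start, last))
            start = last = i
    if last is not None:
        clusters.append((start, last))
    return clusters


def largest_cluster_size(clusters: Sequence[Tuple[int, int]]) -> int:
    # Size is taken as the duration in bins (an assumption of this sketch).
    return max((end - start + 1 for start, end in clusters), default=0)


def choose_time_threshold(density: Sequence[float], density_threshold: float,
                          max_threshold: int) -> int:
    """Return the time threshold just before the largest jump (phase transition)."""
    best_jump, chosen = -1, 1
    previous = largest_cluster_size(find_clusters(density, 1, density_threshold))
    for t in range(2, max_threshold + 1):
        size = largest_cluster_size(find_clusters(density, t, density_threshold))
        jump = size - previous  # growth caused by raising the threshold by one bin
        if jump > best_jump:
            best_jump, chosen = jump, t - 1  # keep the value before the merge
        previous = size
    return chosen


print(choose_time_threshold([0, 3, 4, 0, 0, 0, 2, 5, 0, 1], 0.5, 5))  # 3
```

In the example, raising the threshold from 3 to 4 merges the two main clusters into one spanning bins 1 to 9, so 3 is returned.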
By evaluating the density of water withdrawals from the electric water cooler and dispenser, three modes of operation can be defined for the TCA: Comfort, Balance, and Eco. The Comfort mode prioritizes the quality of the experience, the Eco mode prioritizes energy savings, and the Balance mode aims at a balance between comfort and economy. Values of the density threshold can be associated with these modes based on the observation that the time series used in the evaluation follow an exponential distribution. From the analysis of real and synthetic water consumption patterns (more details in Section 5), we propose one density threshold value per mode, with the Eco mode using the highest value, the Comfort mode the lowest, and the Balance mode an intermediate one.
Algorithm 1 illustrates the steps of the TCA. Its inputs are a list of time series, the period in minutes (the time span represented by each position of the prototype vector), and the operation mode, chosen by users according to whether they prioritize lower power consumption or drinking cold water more frequently. The prototype is constructed at lines 2 and 3. Lines 4 to 9 select the density threshold value according to the mode given as a parameter. The percolation process is performed by the loop that starts at line 10 and ends at line 17, and the detection of clusters for fixed values of the time and density thresholds is performed between lines 11 and 15. Notably, the TCA has a dynamic behavior, especially when applied to forecasting the usage times of home appliances: users can adjust the density threshold to favor energy savings at the expense of forecast accuracy. We will show that, once the usage data of the appliance are verified to follow a given distribution, it is possible to define modes of operation for the equipment. Specifically, we show how the density threshold values should be associated with three modes of operation: Comfort, Balance, and Eco, where Eco prioritizes energy savings, Comfort prioritizes comfort (higher forecast accuracy), and Balance seeks a balance between the two. Section 5 discusses the impact of the density threshold values on the energy savings provided by the TCA.
Because the TCA must store the elements of the event density prototype and the clusters found, its space complexity for a fixed threshold value is O(n + c), where n is the number of elements in the time series and c is the number of clusters found. Its time complexity is O(n), since all elements of the event density prototype need to be traversed only once to find the clusters.
With the percolation model, the space complexity of the TCA stays the same, since the clusters found at each increase of the time threshold do not need to be stored. Only two variables are added: one storing the largest jump in the size of the largest cluster and one storing the moment of the phase transition. The time complexity increases to O(k·n), where n is the number of elements in the time series and k is the number of increments of the time threshold performed during the percolation.
4 Benchmark
Three datasets were used to test the accuracy of the TCA and three other reference algorithms, along with their respective energy consumption in Wh per day. The datasets record the times of day when water was withdrawn. The first dataset was obtained by monitoring the water consumption of a test electric water dispenser in a corporate environment, where 16 employees consumed water from the appliance over 5 business days, from 06 to 12 December (days 09 and 10 were excluded because they were not working days). In all, 301 water withdrawals were made, an average of 60.2 withdrawals per day. Because this was test equipment, the agreement between the factory that produced the electric water dispenser (with the TCA embedded) and the company where it was tested allowed only 5 days of testing; for this reason, only five days of real data were used in our tests.
The remaining datasets, commercial synthetic data (CS) and residential synthetic data (RS), were generated following an exponential distribution [20] to simulate the rate of use of the electric water dispenser in corporate and domestic environments, respectively. The synthetic data were produced by a function that draws values from an exponential distribution, each value representing the time difference between consecutive events. The probability density function of an exponential distribution depends only on the mean of the random variable, which in this case is the time between drinks. To choose the mean used by the generating function, we divided the day into shifts and applied a different mean to each shift according to the pattern of use expected during the day. For example, in a corporate environment the most intensive use is expected during business hours, with longer gaps around the lunch period, whereas in a residential setting use is expected to be more constant throughout the day but less frequent.
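The shift-based generator can be sketched with Python's `random.expovariate`, which draws exponentially distributed gaps given a rate (1 / mean). The four shifts and their mean gaps below are hypothetical placeholders chosen for illustration, not the values used by the authors; `synthetic_day` is likewise a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical corporate profile: (shift_start_min, shift_end_min, mean_gap_min).
# Illustrative values only, not the parameters used to build the CS dataset.
CORPORATE_SHIFTS: List[Tuple[float, float, float]] = [
    (0, 6 * 60, 240.0),         # night: almost no use
    (6 * 60, 12 * 60, 6.0),     # morning: intensive use
    (12 * 60, 18 * 60, 9.0),    # afternoon: includes the lunch slowdown
    (18 * 60, 24 * 60, 120.0),  # evening: office mostly empty
]


def synthetic_day(shifts: Sequence[Tuple[float, float, float]],
                  rng: random.Random) -> List[float]:
    """Withdrawal times (minutes since midnight) with exponential inter-arrival gaps."""
    times: List[float] = []
    for start, end, mean_gap in shifts:
        # expovariate takes the rate, i.e. 1 / mean gap between drinks
        t = start + rng.expovariate(1.0 / mean_gap)
        while t < end:
            times.append(t)
            t += rng.expovariate(1.0 / mean_gap)
    return times


rng = random.Random(7)  # fixed seed for a reproducible dataset
day = synthetic_day(CORPORATE_SHIFTS, rng)
print(len(day), [round(t) for t in day[:5]])
```

A residential profile would use the same function with flatter, longer mean gaps across the shifts.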
To apply the TCA, the datasets were processed to accumulate the number of water withdrawals in ten-minute intervals (a period of 10 minutes in Algorithm 1); that is, each day has 144 observations, each associated with the number of water withdrawals in that interval. The other algorithms used in the comparisons process the original data, namely the times of the drinks.
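A short sketch of this preprocessing step, assuming withdrawal times are given as minutes since midnight; `to_bins` is a hypothetical helper name. With a 10-minute period the day yields 24 × 60 / 10 = 144 bins.

```python
from typing import List, Sequence

MINUTES_PER_DAY = 24 * 60


def to_bins(withdrawal_minutes: Sequence[float], period: int = 10) -> List[int]:
    """Count water withdrawals per `period`-minute interval of one day."""
    if MINUTES_PER_DAY % period != 0:
        raise ValueError("period must divide the day evenly")
    counts = [0] * (MINUTES_PER_DAY // period)  # 144 bins when period == 10
    for minute in withdrawal_minutes:
        if not 0 <= minute < MINUTES_PER_DAY:
            raise ValueError(f"time outside the day: {minute}")
        counts[int(minute // period)] += 1
    return counts


# Drinks at 08:03, 08:07, 08:15 and 13:59:30.
counts = to_bins([8 * 60 + 3, 8 * 60 + 7, 8 * 60 + 15, 13 * 60 + 59.5])
print(len(counts), counts[48], counts[49], counts[83])  # 144 2 1 1
```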
For the clustering algorithms, the agglomerations found are interpreted as the periods when the appliance's water should be ice cold. In this context, the algorithm scores a hit whenever an event of a test time series occurs within one of the identified clusters, and a miss whenever an event occurs outside all of them. In other words, the error rate is the percentage of water requests made in periods outside every cluster found by the algorithm. Energy consumption, in turn, is measured by the time during which the compressor of the connected electric water dispenser must be kept on. In all tests, the compressor is switched on within the estimated time clusters and adds a fixed amount of energy for each hour it remains on. To help monitor the functioning of the TCA, we developed a simulator in …, version 3.6, which allows the energy consumption and the error rate to be visualized instantly. The simulator also helps visualize the clusters found by the TCA. Details of the simulator are given in the supplementary material.
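The two metrics can be sketched as follows, assuming clusters are given as inclusive bin ranges of the test day and that the compressor runs only inside them. The energy added per hour of compressor operation is left as a hypothetical parameter (`wh_per_hour`), and pre-cooling time before a cluster starts is ignored; `evaluate` is an illustrative name, not the simulator's API.

```python
from typing import Sequence, Tuple


def evaluate(clusters: Sequence[Tuple[int, int]], test_minutes: Sequence[float],
             period: int = 10, wh_per_hour: float = 1.0) -> Tuple[float, float]:
    """Return (error_rate, energy_wh) for one test day.

    clusters: non-overlapping inclusive (first_bin, last_bin) pairs during
    which the compressor is assumed to be on. wh_per_hour is a hypothetical
    placeholder for the energy the compressor adds per hour of operation.
    """
    def covered(minute: float) -> bool:
        return any(first * period <= minute < (last + 1) * period
                   for first, last in clusters)

    misses = sum(1 for m in test_minutes if not covered(m))  # drinks outside clusters
    error_rate = misses / len(test_minutes) if test_minutes else 0.0
    on_minutes = sum((last - first + 1) * period for first, last in clusters)
    return error_rate, on_minutes / 60 * wh_per_hour


# Compressor on 08:00-08:30 and 12:00-13:00; one of four drinks falls outside.
print(evaluate([(48, 50), (72, 77)], [485, 505, 600, 730], wh_per_hour=2.0))  # (0.25, 3.0)
```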
In our tests, in addition to the TCA, we used the standard operation of the electric water dispenser, here called Conventional, and the K-Means and Expectation-Maximization (EM) algorithms. The Conventional algorithm does not save energy: it keeps the water in the appliance chilled all the time. K-Means is a clustering algorithm that groups observations (in this work, each observation is a time of day at which a drink was taken) and receives as a parameter the number of clusters k to be found [21]. Since a day was considered to have four shifts, k = 4 was used in the tests of this algorithm. Finally, EM is an iterative algorithm for finding maximum likelihood estimates of parameters in statistical models that depend on unobserved latent variables [22]. It should be noted that the TCA is a novel algorithm designed for a very specific problem that takes the user's preference into account, so it is difficult to find algorithms that can be compared with it in all its functions. In this section, our main objective is to show that the proposed algorithm achieves accuracy similar to that of some classic clustering algorithms. In the comparison with the Conventional mode, our goal is to show how much energy can be saved by adding some intelligence to the compressor behavior of the electric water dispenser.
Due to the memory limitation imposed by the application context, in all scenarios the accuracy tests used two days of training data and one day of test data, with the days divided into ten-minute intervals; that is, a period of 10 minutes was used as input to the clustering module (tests varying the period and the number of time series given to the TCA are presented in the supplementary material). The other algorithms also used two training days and one test day, but they cluster the original water withdrawal times, so each sample represents a water withdrawal moment. In all tests, cross-validation was performed on both real and synthetic data, combining every pair of training days with another test day.
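Read literally, this protocol pairs every unordered pair of training days with each remaining day as the test day. A sketch using `itertools.combinations` over the five working days of the real dataset (06, 07, 08, 11 and 12 December); the helper name `train_test_splits` is hypothetical. Under this reading, five days yield 10 × 3 = 30 splits.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def train_test_splits(days: Sequence[str]) -> List[Tuple[Tuple[str, str], str]]:
    """Every pair of training days combined with each remaining test day."""
    splits = []
    for train in combinations(days, 2):
        for test in days:
            if test not in train:
                splits.append((train, test))
    return splits


real_days = ["06 Dec", "07 Dec", "08 Dec", "11 Dec", "12 Dec"]
splits = train_test_splits(real_days)
print(len(splits))  # 30
print(splits[0])    # (('06 Dec', '07 Dec'), '08 Dec')
```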
For the TCA, all the tests in this section use the density threshold of the Comfort mode. In this configuration the algorithm prioritizes comfort, which, in the scenario of this work, means minimizing the chance that the user consumes unrefrigerated water.
Table 1 shows the result of the comparison: the memory consumption, energy consumption, and error rate of each algorithm. Memory consumption was estimated from the input of each algorithm. For the TCA this input is fixed: two vectors of 144 bytes which, together with the control variables, add up to at most 300 bytes. EM and K-Means can, in the worst case, receive an unbounded number of entries per day; a more realistic worst case is one event per minute. The memory usage considered for these algorithms therefore corresponds to two vectors of 1440 positions plus control variables. In terms of energy consumption, even in its least economical mode the TCA consumed about as much as K-Means and EM, while achieving a lower error in almost all tests. Although EM and K-Means consumed slightly less energy in most tests, the TCA can trade some of its accuracy for lower consumption. Section 5 examines the consequences of changing the density threshold.
5 TCA and the User Preference
The TCA obtained promising success rates while using the least memory among all the algorithms tested. Notably, given the restrictions of the application scenario, it was the only algorithm able to run in less than 512 bytes of memory, which allows it to be used on some of the cheapest microcontrollers on the market.
Although these results are encouraging on their own, the main differential of the TCA lies in its dynamic behavior: according to the user's preference, the algorithm can prioritize energy savings, by not aggregating time slots where consumption is below the density threshold, or maximize the quality of the user experience, by also aggregating time slots where water consumption is very low. In other words, by adjusting the density threshold, users tell the algorithm what they consider a low density of temporal events (e.g., water withdrawals from a water dispenser), and thereby define moments of the day when, because the appliance is rarely used, it is acceptable to drink water at a higher temperature in order to save energy. Figure 2(a) illustrates how the error rate and energy consumption behave as the density threshold varies in an experiment with the real data: as the threshold increases, energy consumption decreases and the error rate increases.
It is also possible to observe the relationship between the density threshold and the quality of the user's experience, measured indirectly by the average temperature of the water consumed. Figure 2(b) shows that increasing the threshold comes at the cost of water being consumed at a higher average temperature, revealing the relation between the threshold and the quality of the experience.
Therefore, the proposal is to use the highest density threshold for the Eco mode, an intermediate one for the Balance mode, and the lowest for the Comfort mode. In our tests we observed that in Balance mode the error rates remain low, always below 8%, while consumption decreases by approximately 1 Wh relative to the Comfort mode (in the scenario illustrated in Figure 2(a), an error rate of 7.2% was obtained with a consumption of 8.9 Wh in Balance mode). In Eco mode, the error rates remained below 20%, with up to 2.5 Wh less consumption than in Balance mode.
Table 2 shows the mean error rate and mean energy consumption in tests with real and synthetic data. Changing the operating mode has a direct impact on energy savings, measured in Wh, and the lower the average consumption, the higher the average error rate.
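The relative reduction with respect to the Conventional mode can be computed directly from the consumption values reported in the tables: the Conventional row of Table 1 and the three TCA modes of Table 2. The dictionaries below simply transcribe those values; the variable names are illustrative.

```python
# Daily consumption in Wh, transcribed from Table 1 (Conventional) and Table 2 (TCA modes).
CONVENTIONAL = {"Real": 14.64, "CS": 14.75, "RS": 13.25}
TCA_MODES = {
    "Eco": {"Real": 8.36, "CS": 11.39, "RS": 11.41},
    "Balance": {"Real": 10.03, "CS": 12.07, "RS": 13.16},
    "Comfort": {"Real": 10.98, "CS": 13.12, "RS": 13.19},
}

# Relative saving = 1 - consumption_with_TCA / consumption_conventional, per dataset.
savings = {
    mode: {ds: 1 - wh / CONVENTIONAL[ds] for ds, wh in row.items()}
    for mode, row in TCA_MODES.items()
}

for mode, row in savings.items():
    print(mode, {ds: f"{s:.1%}" for ds, s in row.items()})
```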
To evaluate the effectiveness of our approach, we conducted additional tests (a pilot study) over one month with a dispenser prototype in a corporate environment, all with the prototype in Comfort mode. The prototype running the TCA consumed on average 32% less energy than equipment without the TCA, and in 90% of the times users accessed the water cooler during this period, the water was at the right temperature.
6 Conclusion
This paper presents a new algorithm for temporal clustering, the Temporal Clustering Algorithm (TCA). Its behavior is inspired by the City Clustering Algorithm [9, 10], an algorithm widely used to define city boundaries. The main differential of the TCA is its ability to combine significant prediction rates with a dynamic behavior that can be configured according to the user's preference. We also showed that the algorithm uses little memory, making it possible to embed it in the low-cost circuits commonly used in Internet of Things applications. Real and synthetic water consumption data from an electric water dispenser were used to evaluate the prediction performance of the TCA. In particular, we assessed its ability to predict the periods when the appliance is used and, consequently, how much energy it can save by turning off the dispenser's compressor when it is not in use. A comparison with some of the main algorithms in the literature showed that the TCA can save energy (even in its least economical configuration), uses less memory, and achieves greater accuracy.
References
- [1] M. D. Lytras, V. Raghavan, E. Damiani, Big data and data analytics research: from metaphors to value space for collective wisdom in human decision making and smart machines, International Journal on Semantic Web and Information Systems (IJSWIS) 13 (1) (2017) 1–10.
- [2] B. Guo, D. Zhang, Z. Wang, Z. Yu, X. Zhou, Opportunistic IoT: Exploring the harmonious interaction between human and the Internet of Things, Journal of Network and Computer Applications 36 (6) (2013) 1531–1539. doi:10.1016/j.jnca.2012.12.028.
- [3] L. Atzori, A. Iera, G. Morabito, The Internet of Things: A survey, Computer Networks 54 (15) (2010) 2787–2805. doi:10.1016/j.comnet.2010.05.010.
- [4] S. Aghabozorgi, A. S. Shirkhorshidi, T. Y. Wah, Time-series clustering – a decade review, Information Systems 53 (2015) 16–38. doi:10.1016/j.is.2015.04.007.
- [5] F. Zhou, F. De la Torre, J. K. Hodgins, Hierarchical aligned cluster analysis for temporal clustering of human motion, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (3) (2013) 582–596. doi:10.1109/TPAMI.2012.137.
- [6] J. M. Quero, M. D. Ruiz Lozano, J. A. Castañeda García, M. A. Rodriguez Molina, D. M. Frias Jamilena, A dynamic fuzzy temporal clustering for imprecise location streams, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 25 (03) (2017) 409–426. doi:10.1142/S0218488517500179.
- [7] Y. Zhang, N. Suda, L. Lai, V. Chandra, Hello edge: Keyword spotting on microcontrollers, arXiv preprint arXiv:1711.07128.
- [8] P. J. B. Tan, M.-H. Hsu, Designing a system for English evaluation and teaching devices: A PZB and TAM model analysis, Eurasia Journal of Mathematics, Science and Technology Education 14 (6) (2018) 2107–2119. doi:10.29333/ejmste/86467.
