Privacy-Preserving Electricity Theft Detection based on Blockchain
Zhiqiang Zhao, Yining Liu, Zhixin Zeng, Zhixiong Chen, Huiyu Zhou

TL;DR
This paper introduces a blockchain-based, privacy-preserving electricity theft detection scheme that leverages functional encryption and LSTM models to enhance security and detection accuracy without relying on a trusted third party.
Contribution
It presents a novel scheme combining blockchain, functional encryption, and machine learning for secure, privacy-preserving electricity theft detection without a trusted third party.
Findings
More accurate theft detection in a real environment
Resists various security attacks
Maintains acceptable communication and computational overhead
| Notation | Description |
| | Power supply data for a residential area |
| | Total of uploaded data for all SMs |
| | i-th smart meter |
| | Encrypted reading of at time |
| | The set of SMs in the detection region |
| | The first layer’s weights of the model |
| | Electricity theft detection period |
| | Timestamp of |
| | Signature of |
| | A decryption key for aggregating readings |
| | Decryption keys for electricity theft detection |
| | Number of residential areas |
| | Number of smart meters in the detection area |
| | Number of readings for electricity theft detection period |
| Layer(type) | No. of neurons | No. of parameters | AF |
| dense(Dense) | 10 | 20 | tanh |
| lstm(LSTM) | 300 | 373200 | tanh,sigmoid |
| lstm-1(LSTM) | 300 | 721200 | tanh,sigmoid |
| dense-1(Dense) | 2 | 602 | softmax |
| Model | DR (%) | FA (%) | HD (%) | Accuracy (%) | Model detection overhead | Communication overhead |
| Our model | 93.72 | 2.62 | 91.10 | 95.56 | 56.03 ms | 600 Bytes |
| ETDFE [12] | 92.56 | 5.84 | 86.72 | 93.36 | 1.94 seconds | 40 Bytes |
| PPETD MD1 [15] | 91.50 | 7.40 | 84.10 | 91.80 | 48 minutes | 1900 MB |
| PPETD MD2 [15] | 90.00 | 8.79 | 81.2 | 90.20 | 39 minutes | 1675 MB |
| PPETD MD3 [15] | 88.60 | 3.90 | 84.60 | 90.30 | 35 minutes | 1375 MB |
| Jokar et al. [7] | 94.00 | 11.0 | 83.0 | - | - | - |
| Notations | Description | Time (ms) |
| | Time cost of encryption | 0.096 |
| | Time cost of aggregating 200 readings | 2.21 |
| | Time cost of decrypting aggregated readings | 0.135 |
| | Time cost of public key generation | 45.36 |
| | Time cost of decrypting to obtain | 49.63 |
| | Time cost of generating timestamp | 0.852 |
| | Time cost of signature operation | 13.18 |
| | Time cost of the verify signature operation | 127.29 |
| | Time cost of model detection | 56.03 |
| Stages | | Scenario 1 | Scenario 2 |
| Pre-data transmission | Attack method | Hack into SMs | Hack into SMs; Get the keys |
| | Probability | | |
| Data in transit | Attack method | Hack channels | Hack channels; Get the keys |
| | Probability | | |
| Data received | Attack method | Hack into MN | Hack into SMs; Get the keys |
| | Probability | | |
Manuscript received July 13, 2022; revised September 14, 2022 and December 27, 2022; accepted February 5, 2023. This work was supported by the Natural Science Foundation of China (No. 62072133) and the Fujian Key Laboratory of Financial Information Processing (Putian University) (No. JXC202201). (Corresponding author: Yining Liu.)
Zhiqiang Zhao, Yining Liu and Zhixin Zeng are with the School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China (e-mail: [email protected], [email protected] and [email protected]).
Zhixiong Chen is with the Fujian Key Laboratory of Financial Information Processing and the Key Laboratory of Applied Mathematics of Fujian Province University, Putian University, Putian, Fujian 351100, China (e-mail: [email protected]).
Huiyu Zhou is with the School of Computing and Mathematical Sciences, University of Leicester, LE1 7RH Leicester, U.K. (e-mail: [email protected]).
Abstract
In most electricity theft detection schemes, consumers’ power consumption data is directly input into the detection center. Although this is effective for detecting theft, the privacy of all consumers is at risk unless the detection center is assumed to be trusted, which is impractical. Moreover, existing schemes may introduce security problems, such as collusion attacks due to the presence of a trusted third party, and malicious data tampering when the system operator (SO) is attacked. To address these problems, we propose a blockchain-based privacy-preserving electricity theft detection scheme without a third party. Specifically, the proposed scheme uses an improved functional encryption scheme to enable electricity theft detection and load monitoring while preserving consumers’ privacy, and stores consumers’ data in a distributed manner on a blockchain to resolve security problems such as data tampering. Meanwhile, we build a long short-term memory network (LSTM) model to achieve higher accuracy in electricity theft detection. The proposed scheme is evaluated in a real environment, and the results show that it is more accurate in electricity theft detection within acceptable communication and computational overhead. Our system analysis demonstrates that the proposed scheme can resist various security attacks and preserve consumers’ privacy.
Index Terms:
Privacy preservation, electricity theft detection, smart grid, blockchain, long short-term memory network (LSTM).
I Introduction
Smart grid (SG) is an advanced grid integrating smart technology, which uses smart meters (SMs) to collect, analyze and process fine-grained power consumption data from consumers to manage energy effectively [1]. While the smart grid brings convenience, it also brings serious challenges[2]. For one thing, the communication of the smart grid is exposed to potential malicious attacks, such as data tampering attack and false data injection. If these malicious attacks cannot be resisted, the smart grid will be unable to operate normally [3]. For another thing, electricity theft has become a widespread phenomenon in the smart grid. Annual economic losses due to electricity theft are estimated to be about 170 million dollars in the United Kingdom [4] and 6 billion dollars in the United States[5]. Meanwhile, electricity theft can also seriously affect energy management and endanger the normal operation of the smart grid [6].
Since the smart grid has access to consumers’ fine-grained power consumption data, traditional machine learning models [7] and deep learning models [8] based on big data have achieved good performance. However, giving consumers’ fine-grained power consumption data directly to the SO raises serious privacy issues [9]. Meanwhile, as data security and privacy attract growing concern, related laws and regulations have been introduced, such as the General Data Protection Regulation (GDPR) in Europe, and utilities’ disregard for privacy could lead to strong consumer objection and significant curtailment of service deployment [10]. Therefore, there is an urgent demand for a privacy-preserving electricity theft detection scheme.
Although existing schemes have begun to consider the privacy of consumers’ power consumption data during electricity theft detection, most of them face serious challenges. On the one hand, the presence of a trusted third party creates a serious risk of privacy leakage. In [11], the model requires a fully trusted third party to perform the detection using the consumers’ original data. However, it is difficult to guarantee in practice that the third party is fully trustworthy, so consumers’ privacy is still at risk of being compromised. In [12], the scheme requires a fully trusted key distribution center. However, once the SO colludes with the key distribution center, the SO can obtain the consumers’ raw power consumption data, which leaks consumers’ privacy. On the other hand, the security of the data and of the smart grid is not considered. In [11], if the trusted detection center is maliciously attacked, malicious data tampering may follow. In [12], the scheme does not verify the legitimacy of the transmitted data, so it cannot resist data tampering and forgery attacks. The existing schemes consider neither the operational security of the smart grid nor the data tampering enabled by centralized data storage, which undermines electricity theft detection. Therefore, how to ensure the security of smart grid operations and consumers’ privacy while still using consumers’ power consumption data is a major challenge of current research.
In this paper, we aim to achieve more secure electricity theft detection and load monitoring without the involvement of a third party. The main contributions of this work are threefold:
1. We propose a blockchain-based electricity theft detection scheme, which uses the distributed storage of blockchain to solve security problems such as data tampering of centralized storage, etc.
2. We improve the functional encryption scheme [12] to enable privacy-preserving electricity theft detection and load monitoring without a trusted key distribution center, which eliminates potential security and privacy problems caused by a third party.
3. We build an electricity theft detection model based on long short-term memory networks that are more suitable for processing time-series data, and the model parameter settings are analyzed to obtain higher performance.
The remainder of this paper is organized as follows. In Section II, we review the related work. Section III illustrates the related knowledge. In Section IV, we define the system model and design goals. Section V presents the proposed scheme. Experimental results and system characterization are presented in Sections VI and VII, respectively. Finally, the paper is summarized in Section VIII.
II Related Work
In this section, we briefly review recent research work on electricity theft detection schemes in the smart grid and the distributed blockchain-based smart grid framework.
Currently, due to the seriousness of the electricity theft problem and the importance of privacy-preserving, we focus on electricity theft detection schemes with privacy-preserving, which can be broadly classified into two categories with or without the participation of a third party. The comparison of the related work is given in Table I.
In schemes where a third party is involved, the third party performs tasks such as distributing keys and performing electricity theft detection. Wen et al. [13] proposed a privacy-preserving federated learning framework consisting of a data center, a control center, and multiple detection stations, which is costly to deploy. Moreover, the authors’ scheme does not consider other functional requirements such as load monitoring. Yao et al. [11] proposed to send the encrypted data of SMs to a fully trusted detection center, which decrypts the data and then performs detection using a convolutional neural network (CNN) model. Meanwhile, SMs send the encrypted data to an untrusted center that aggregates power consumption data for load monitoring. In [12], Ibrahem et al. proposed to use functional encryption and a feed-forward neural network (FNN) to perform electricity theft detection and privacy protection under the condition that the key distribution center is fully trusted. All of the above schemes assume that the third party is trustworthy, but in practice consumers’ privacy can still be compromised, for example when the third party colludes with other entities. Untrustworthy third parties have caused similar problems in other areas as well [16], so it is important to eliminate the risks associated with their presence.
In schemes where no third party is required, the scheme is performed by only two entities, the SM and the SO. Jokar et al. [7] proposed to use a support vector machine (SVM) to monitor consumption pattern anomalies and identify suspicious consumers when consumers’ power consumption data is sampled at a low rate. However, this scheme can hardly resist malicious attacks such as replay attacks and false data injection. In [14], the Euclidean distance between the normalized daily photovoltaic power output of any two installations in the region is calculated under homomorphic encryption. The Euclidean distances are then clustered to identify anomalous users. However, this scheme detects energy theft from the perspective of energy output, and when a smart meter is tampered with by an external attacker, theft can no longer be detected properly. Meanwhile, the authors’ scheme cannot obtain the sum of power consumption in the region for load monitoring. Nabil et al. [15] proposed a CNN machine learning model based on secure two-party computation protocols using arithmetic and binary circuits. This scheme requires high computation and communication overhead to complete the detection of one consumer: at least 35 minutes of detection time and at least 1375 MB of communication. None of the above schemes consider how to ensure the operational security of the smart grid when performing electricity theft detection, such as the data tampering problem when the SO is maliciously attacked. Therefore, a more secure detection model with acceptable computation and communication overhead is needed.
To ensure the security of the smart grid, in [17], Liang et al. proposed a new distributed blockchain-based protection framework to enhance the self-defense of the modern power system. In [18], the authors designed a blockchain-based platform to prevent user data from being tampered with and proposed a multifaceted mechanism to protect user privacy. In [19], Hamouda et al. proposed a blockchain-based comprehensive transactive energy market framework that enables a safer and fairer electricity market. Fan et al.[20] proposed a decentralized privacy-preserving data aggregation scheme for the smart grid based on blockchain, which uses the Paillier cryptographic algorithm to aggregate consumers’ power consumption data. In [21], the authors proposed a new blockchain-based strategy for inter-connected microgrids energy trading that enhances the security and transparency of the platform. In[22], the authors proposed an efficient and robust blockchain-based multidimensional data aggregation scheme in the smart grid to resist more internal and external attacks. Chen et al. [23] proposed a blockchain-based framework to prevent energy market failures caused by dishonest participants.
Many recent studies consider that a distributed blockchain-based smart grid framework can secure the grid. Such a framework is also a good fit for electricity theft detection. To advance the state of the art, we propose a blockchain-based privacy-preserving electricity theft detection scheme, which is further explained and evaluated in the following sections.
III Preliminaries
III-A Secure Aggregation
Bonawitz et al. [24] proposed a secure aggregation scheme where the server can only see the gradient after the aggregation is completed and cannot know the private true gradient value of each user. Unlike the original text, the proposed scheme uses the elliptic curve Diffie-Hellman key agreement. The steps of secure aggregation are as follows:
III-A1 Key agreement between arbitrary SMs
Each negotiates key masks with each other.
- KA.Setup: The setup algorithm takes as input the security parameters . Then it outputs cyclic additive group of prime order , a base point , a hash function , and an elliptic curve on as well as a large prime number .
- KA.Gen: Each user chooses a random as its own secret key and calculates as the public key .
- KA.Agree: After receiving the public key from user , user uses its own secret key to generate .
III-A2 Generating masks for aggregation
A mask is generated by key agreement between arbitrary users. Assume that all users form a user set in order and each user computes:
[TABLE]
where represents the users whose serial number is less than , by the same token, we can get .
Each user sends to the server, the server computes Eq. (2) to securely aggregate the secret keys.
[TABLE]
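The mask construction described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: a toy finite-field Diffie-Hellman group stands in for the elliptic-curve key agreement, SHA-256 plays the role of the hash function, the modulus `Q` is arbitrary, and all parties are simulated in one process. The point it shows is only that a user adds the masks shared with higher-indexed users and subtracts the masks shared with lower-indexed users, so the masks cancel in the sum.

```python
import hashlib
import secrets

# Toy finite-field Diffie-Hellman parameters (assumption: the paper uses an
# elliptic-curve group; a prime field keeps this sketch dependency-free).
P = 2**127 - 1  # a Mersenne prime, far too small for real use
G = 3
Q = 2**64       # illustrative modulus for masks and masked values


def keygen():
    """KA.Gen: draw a random secret key and derive the public key."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def agree(sk_i, pk_j):
    """KA.Agree: hash the shared Diffie-Hellman value into a mask in [0, Q)."""
    shared = pow(pk_j, sk_i, P)
    digest = hashlib.sha256(str(shared).encode()).digest()
    return int.from_bytes(digest, "big") % Q


def masked_value(i, value, sks, pks):
    """Add masks shared with higher-indexed users, subtract the others."""
    out = value
    for j in range(len(pks)):
        if j == i:
            continue
        mask = agree(sks[i], pks[j])
        out += mask if i < j else -mask
    return out % Q


def aggregate(values):
    """Simulate all users locally and return (masked values, their sum mod Q)."""
    keys = [keygen() for _ in values]
    sks = [k[0] for k in keys]
    pks = [k[1] for k in keys]
    masked = [masked_value(i, v, sks, pks) for i, v in enumerate(values)]
    return masked, sum(masked) % Q


if __name__ == "__main__":
    masked, total = aggregate([5, 17, 42, 8])
    print(masked)  # individual entries look random
    print(total)   # equals the plain sum modulo Q
```

In a real deployment each user would hold only its own secret key, and the aggregator would see only the masked values.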
III-B Boneh-Lynn-Shacham Short Signature
Boneh-Lynn-Shacham (BLS) short signature [25] is a signature algorithm that enables signature aggregation and speeds up block verification, which is divided into three phases: key generation, signature, and verification.
1. Key generation: Sample a random number as the private key and calculate the public key .
2. Signature: The message is mapped to a point in the cyclic group . Generate the signature .
3. Verification: If , where is a bilinear map, then the signature is verified. Otherwise, verification fails.
IV System Model and Design Goals
This section focuses on the construction of the system model and threat model as well as describes our design goals.
IV-A System Model
As shown in Fig. 1, the model of our system scheme includes smart meters in the residential area (RA), a system operator, and distribution transformer meters (DTMs). The function of each entity is described below.
1. Smart meter: SM is an electricity meter that sends the consumer’s power consumption data to the mining node (MN) periodically (e.g., every 30 minutes) after implementing a predefined privacy-preserving scheme.
2. Mining node: The MN is a smart meter selected by the votes of all SMs in each residential area. It is responsible for verifying the legitimacy of the data, aggregating the encrypted data reported by SMs, and creating blocks to record power consumption data. If the MN goes down, all SMs vote for a new MN. If a malicious SM wants to become an MN, it needs to control at least of the SMs in the entire network to be elected as MN, but this is unrealistic.
3. System operator: The SO can generate system parameters and read the consumers’ encrypted power consumption data through blockchain as well as get the real-time total power consumption of the area sent by MN, which are used for power consumption analysis and energy management. The SO uses a distribution transformer meter to record the total power supply data for the residential area during the electricity theft detection period in order to judge the existence of electricity theft and perform electricity theft detection.
IV-B Threat Model
For the system model proposed in the previous sub-section, we consider the threat from three aspects: consumers, the SO, and external attackers.
1. Consumers: Malicious consumers may falsify their power consumption data to reduce their bills. Also, they may collude with other consumers or SO to infer sensitive information about the victimized consumers. In addition, malicious consumers may deny their transmitted data when they are detected. With respect to MN, it may maliciously tamper with the data reported by SMs.
2. SO: The SO is assumed to be honest but curious, i.e., it performs operations according to the protocol, but it may attempt to obtain fine-grained power consumption data from consumers to analyze valuable information.
3. External attackers: External attackers may attempt to eavesdrop on consumer communications to obtain consumer data, and may also forge malicious data to harm the SO, as well as initiate attacks on the SO to tamper with stored data.
Therefore, the scheme aims to ensure that the smart grid can resist malicious attacks and preserve consumers’ privacy while still enabling energy management and electricity theft detection.
IV-C Design Goals
In order to protect the security and privacy of consumers’ data without relying on a third party, the proposed scheme should achieve the following design goals:
1. Privacy preservation: For any one consumer, their original power consumption data is not obtainable by DTM, SO, and other consumers. Meanwhile, no entity can infer any private information from the encrypted data.
2. Confidentiality: Consumers’ data is encrypted for transmission, storage, aggregation, and theft detection so that the original consumers’ data cannot be recovered even if entities collude with each other.
3. Data unforgeability and non-repudiation: The consumers’ encrypted data is signed and then transmitted to ensure that the data cannot be forged, while the transmission information is recorded in the blockchain to achieve data non-repudiation and data unforgeability.
4. Resist collusive attacks: The proposed scheme can resist the attack that smart grid entities collude with each other to obtain consumers’ power consumption data.
V The Proposed Scheme
Our scheme consists of five phases: (1) system initialization phase; (2) reporting phase; (3) aggregation phase; (4) judgement phase; (5) electricity theft detection phase. The notations are listed in Table II.
V-A Overview
The main process of our scheme is summarized as follows:
- In the initialization phase, SO divides the residential area and generates the system parameters as well as parameters of the first layer of the neural network. The SMs in each detection region select the MN by a Byzantine fault-tolerant consensus mechanism [26], and each SM generates encryption and decryption keys.
- In the reporting phase, each SM encrypts the power consumption data during the detection period , then signs and sends encrypted data to the MN.
- In the aggregation phase, MN verifies the legitimacy of the data, then constructs blocks and aggregates the power consumption data through the Merkle tree.
- In the judgement phase, SO judges whether there is electricity theft in a region based on the difference between the DTM statistics and the aggregated data of MN within the tolerance range.
- In the electricity theft detection phase, SO reads the encrypted data from the blockchain that is reported by each SM in the suspected electricity theft area during the theft detection period. The encrypted data are decrypted (still in ciphertext state after decryption) and then fed to the detection model to identify the electricity theft consumers.
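The block construction mentioned for the aggregation phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SHA-256 is assumed as the hash, an odd level of the Merkle tree duplicates its last node (a common convention that the paper does not specify), the header is serialized as JSON, and `make_block` and its field names are hypothetical. The header fields (timestamp, previous block hash, Merkle root) follow the block structure the scheme describes.

```python
import hashlib
import json
import time


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Hash leaves pairwise until a single root remains."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last node
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def make_block(prev_hash: str, reports, timestamp=None):
    """Build a block whose header commits to the encrypted reports."""
    header = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root(reports).hex(),
    }
    block_hash = sha256(json.dumps(header, sort_keys=True).encode()).hex()
    return {"header": header, "hash": block_hash, "body": [r.hex() for r in reports]}


if __name__ == "__main__":
    reports = [b"ciphertext-1", b"ciphertext-2", b"ciphertext-3"]
    block = make_block("00" * 32, reports)
    print(block["header"]["merkle_root"])
```

Because the Merkle root commits to every report, changing any stored ciphertext changes the root and therefore the block hash, which is what lets other SMs detect tampering when they re-check a block.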
V-B System Initialization
System initialization includes three parts. First, SO generates the parameters of the system and the first layer’s weights of the model, and delineates residential areas with SMs in each detection area. Second, all SMs in the region reach consensus to choose the MN. Third, each SM generates its own keys.
V-B1 System parameters generation
- Step 1: The SO generates where and are two cyclic additive groups of prime order , is a generator of .
- Step 2: The SO generates where is a cyclic additive group of prime order and generator based on elliptic curves.
- Step 3: The SO chooses a full-domain hash function and a hash function .
- Step 4: The SO publishes public parameters .
V-B2 The first layer’s weights of the model
The SO trains the electricity theft detection model on historical power consumption data from honest and malicious consumers and then saves the weights of the first layer of the network, which can be represented as:
[TABLE]
where is the number of power reporting in the electricity theft detection period and is the number of neurons in the first layer of the neural network, should be fewer than the number of inputs , because if , the SO will calculate the consumers’ fine-grained power consumption data, since unknowns in equations may be solved to obtain the data.
V-B3 Key Generation
SM generates the encryption keys and decryption keys. All SMs cooperate using the secure aggregation algorithm to generate a decryption public key for aggregating the power consumption readings of all SMs. Meanwhile, each SM generates electricity theft detection public keys .
- Secret key generation: selects a random number as the secret key for signing and key negotiation and selects as the secret key for encryption.
- Generation of : Arbitrary SMs negotiate key masks among themselves and the MN performs secure aggregation to generate decryption public keys .
Step 1: Each calculates and publishes the public key .
Step 2: Each receives the public keys of other SMs and then calculates . Fig. 2 shows an example of four SMs performing key masks agreement and generating DA.
Step 3: Each calculates and sends the results to MN for aggregation by Eq. (3).
[TABLE]
Step 4: MN aggregates its own and the results sent by other SMs, as shown in Eq. (4).
[TABLE]
- Generation of : SO publishes the weights of the first layer network of the electricity theft detection model to each , and each generates decryption public keys to enable theft detection without obtaining the original power consumption data.
Step 1: Each generates a timestamp of the current detection time by Eq. (5).
[TABLE]
Step 2: Each generates decryption public keys by Eq. (6):
[TABLE]
Step 3: Each generates decryption public keys by Eq. (7).
[TABLE]
V-C Reporting Phase
In the reporting phase, each encrypts its power consumption readings and then performs signature operations.
- Step 1: For each electricity theft detection period , each encrypts its power consumption readings by Eq. (8).
[TABLE]
- Step 2: Each computes the public key and then generates the BLS short signature by Eq. (9), where is the current timestamp used to prevent replay attacks.
[TABLE]
- Step 3: Each sends to MN. The data within each consists of basic storage information and primary transmission data, as shown in Fig. 3.
V-D Aggregating Phase
Efficient message propagation methods are important building blocks for various networks [27]. In the proposed scheme, the SM sends messages directly to the MN, which is responsible for broadcasting and aggregating the total area power consumption. After receiving the data from the SMs, the in the residential area first verifies the signature and timestamp. After the verification is passed, generates the Merkle tree and then creates the block through the Byzantine fault-tolerant consensus mechanism. The block head stores the timestamp, the hash of the previous block, and the Merkle tree root hash, and the block body stores the encrypted data and decryption keys. After that, aggregates the ciphertext and decrypts it to get the total power consumption of all SMs at the current time. Fig. 4 shows the blockchain structure of the proposed scheme. The detailed steps are as follows:
- Step 1: verifies signature and timestamp. If Eq. (10) and are valid, the verification passes; otherwise it fails. To make verification more efficient, can perform batch verification.
[TABLE]
- Step 2: performs the hash operation to generate the Merkle tree root hash value. Then generates a new block , and broadcasts the block to other SMs in the residential area .
- Step 3: After receiving the block, SMs verify the block’s hash value, timestamp, and data, then send the result of the verification to other SMs to achieve mutual supervision among SMs.
- Step 4: SMs send their own check results to . collects feedback from all SMs and checks them. If all SMs agree on the legitimacy and integrity of the block, adds the block to the blockchain in chronological order and sends the block to other SMs. If any SM disagrees with the check result, checks the feedback information and sends the block to this SM again for a second check.
- Step 5: aggregates the encrypted data of all SMs and decrypts it to get the total power consumption of the area at the current time by Eq. (11).
Since is not a very large value, there are many ways to calculate the aggregated value, such as Shanks’ baby-step giant-step algorithm [28].
[TABLE]
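The baby-step giant-step idea mentioned above can be sketched in a few lines. As an assumption made for self-containment, the sketch works in the multiplicative group of integers modulo a small prime rather than in the elliptic-curve group used by the scheme; the function name `bsgs` and the toy parameters are illustrative. It recovers a bounded exponent, which here plays the role of the aggregated reading, in roughly the square root of the bound many steps.

```python
from math import isqrt


def bsgs(g, h, p, bound):
    """Search for x in [0, m*m) with pow(g, x, p) == h, where m*m > bound.

    Returns None when no such x exists in the searched range.
    """
    m = isqrt(bound) + 1
    # Baby steps: remember the smallest j for every value g^j.
    baby = {}
    value = 1
    for j in range(m):
        baby.setdefault(value, j)
        value = value * g % p
    # Giant steps: multiply h by g^(-m) until a baby step is hit.
    giant_step = pow(g, -m, p)  # modular inverse power, Python 3.8+
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * giant_step % p
    return None


if __name__ == "__main__":
    p, g = 101, 2  # toy group; 2 generates the nonzero residues mod 101
    readings = [12, 30, 35]
    # Multiplying g^r_i corresponds to adding the exponents r_i.
    aggregate = 1
    for r in readings:
        aggregate = aggregate * pow(g, r, p) % p
    print(bsgs(g, aggregate, p, bound=100))  # 77
```

The approach is practical only because the aggregated total is bounded; the table of baby steps grows with the square root of that bound.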
V-E Judgement Phase
To achieve efficient detection, our solution will perform electricity theft detection after discriminating whether there is electricity theft in residential areas.
For each residential area, the transformer meter measures the total amount of electricity supplied to that residential area during the electricity theft detection period, . Meanwhile, the MN aggregates the readings uploaded by all SMs in the residential area, , the SO determines whether there is electricity theft by Eq. (12):
[TABLE]
where is the technical loss (TL) in transmission lines within the residential area and is the calculation error for TL. The SO can use historical data to analyze the technical loss, while many methods exist [7] to calculate the technical loss. If Eq. (12) is valid, SO considers that there is electricity theft in the current area. Afterward SO reads the power consumption data uploaded by each in the blockchain for electricity theft detection.
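The judgement step can be illustrated with a small sketch. Eq. (12) is not legible in this copy, so the comparison below is an assumed form based on the surrounding description: an area is flagged when the supply measured by the transformer meter exceeds the aggregated SM readings by more than the technical loss plus its calculation error. The function and parameter names are hypothetical.

```python
def theft_suspected(dtm_supply, sm_total, technical_loss, loss_error):
    """Flag a residential area when the unexplained loss exceeds the tolerance.

    Assumed form of the check: (supply - reported - technical loss) > error.
    """
    unexplained = dtm_supply - sm_total - technical_loss
    return unexplained > loss_error


if __name__ == "__main__":
    # All values in kWh for one detection period (illustrative numbers).
    print(theft_suspected(1000.0, 900.0, 30.0, 5.0))  # True
    print(theft_suspected(1000.0, 968.0, 30.0, 5.0))  # False
```

Running the per-consumer detection model only for flagged areas is what keeps the overall scheme efficient.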
V-F Electricity Theft Detection Phase
In this sub-section, a privacy-preserving electricity theft detection model is presented in the proposed scheme, and then we explain the experimental settings, including computing platforms, dataset, and data pre-processing.
V-F1 Privacy-preserving Electricity Theft Detection Model
As shown in Fig. 5, our model is composed of the fully connected layer and long short-term memory networks. The core operation of the fully connected layer is the multiplication of a matrix and a vector, which can be expressed as . More detailed representations are:
[TABLE]
where is the input vector, is the weight matrix, and is the bias vector, which is added afterwards. This operation can be seen as an inner product of the input vector and each column of the weight matrix . It can also be viewed as a group of -equations, in which the entries of the input vector are the unknowns and the entries of the weight matrix are the coefficients. Since is less than , the input vector cannot be uniquely solved.
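The underdetermination argument can be made concrete with a toy example. The sketch below uses three inputs and two neurons with made-up integer weights (all values are assumptions for illustration) and constructs a second input vector that produces exactly the same layer output, so the output alone does not identify the input.

```python
def dense(x, W, b):
    """Fully connected layer output: y_j = sum_i x_i * W[i][j] + b_j."""
    n, m = len(W), len(W[0])
    return [sum(x[i] * W[i][j] for i in range(n)) + b[j] for j in range(m)]


def cross(u, v):
    """Cross product of two 3-vectors; it is orthogonal to both."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


# Three inputs, two neurons: fewer equations than unknowns.
W = [[1, 2], [0, 1], [3, -1]]
b = [0, 1]
x = [4, 7, 2]

# A vector orthogonal to both weight columns leaves the output unchanged.
col0 = [row[0] for row in W]
col1 = [row[1] for row in W]
null_vec = cross(col0, col1)
x_alt = [xi + ni for xi, ni in zip(x, null_vec)]

if __name__ == "__main__":
    print(dense(x, W, b))      # [10, 14]
    print(dense(x_alt, W, b))  # [10, 14]
    print(x, x_alt)            # two different inputs
```

Real readings carry extra structure such as non-negativity, so this is an illustration of the counting argument rather than a privacy proof.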
Therefore, in order to perform electricity theft detection in the ciphertext state of the consumers’ power consumption data, the result of the inner product of consumers’ power consumption data and each column of the weight matrix is obtained by Eq. (14).
[TABLE]
The output of the fully connected layer is obtained by calculating the inner product of each column of the weight matrix with the consumer power consumption data, and then adding the bias vector as follows:
After the SO obtains the output of the fully connected layer, it still cannot solve for the consumers’ original power consumption data. The consumers’ power consumption data is input to the next layer of the network in the encrypted state, and finally the detection result is inferred after layer-by-layer computation.
The detection model uses categorical cross-entropy as the loss function. In the model training phase, we use the RMSprop optimizer to train the model for 30 epochs with a batch size of 512 and a learning rate of 0.001. To prevent overfitting, we use a kernel regularizer in the LSTM layer. In addition, the callback function ReduceLROnPlateau in the Keras framework [29] is used to dynamically reduce the learning rate, and the callback function EarlyStopping is used to obtain the optimal model. The parameters of our model structure are summarized in Table III, where AF stands for activation function.
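The parameter counts reported in Table III can be cross-checked with the standard formulas for dense and LSTM layers. The sketch below is a consistency check only; the helper names are hypothetical, and the input widths are inferred from the layer sizes in the table rather than stated by the authors.

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


def lstm_params(inputs, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (inputs + units) + units)


if __name__ == "__main__":
    print(lstm_params(10, 300))   # 373200, first LSTM fed by 10 dense units
    print(lstm_params(300, 300))  # 721200, second LSTM fed by 300 units
    print(dense_params(300, 2))   # 602, softmax output layer
    print(dense_params(1, 10))    # 20
```

The last line shows that the 20 parameters listed for the first dense layer match an input width of 1 under the same formula; this is an observation about the table, not a statement about how the model was implemented.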
V-F2 Computing Platforms
In our experiments, we build a TensorFlow virtual environment on a server running Ubuntu 18.04.6 LTS with an NVIDIA Tesla T4 GPU, and we use the Keras framework to train and evaluate the model.
V-F3 Dataset
We use the dataset from the Irish Smart Energy Trials [30], which contains the power consumption data of more than 1,000 consumers over 535 days from 2009 to 2010. Each SM reports fine-grained power consumption data every 30 minutes.
V-F4 Data Pre-processing
We select the smart meter data of 200 consumers from the dataset and create one record per consumer per day (48 readings), for a total of 107,200 records.
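As a rough illustration of this step, the sketch below splits one consumer's half-hourly series into daily records of 48 readings. The function name `to_daily_records`, the synthetic series, and the choice to drop an incomplete trailing day are assumptions made for the example, not details taken from the paper.

```python
READINGS_PER_DAY = 48  # one reading every 30 minutes


def to_daily_records(readings, per_day=READINGS_PER_DAY):
    """Split one consumer's reading series into consecutive full-day records.

    Readings that do not fill a complete day at the end are dropped
    (an assumption of this sketch).
    """
    full_days = len(readings) // per_day
    return [readings[d * per_day:(d + 1) * per_day] for d in range(full_days)]


# Synthetic series: 3 full days plus 5 leftover readings.
series = [0.1 * (t % READINGS_PER_DAY) for t in range(3 * READINGS_PER_DAY + 5)]
records = to_daily_records(series)

# 107,200 records over 200 consumers corresponds to 536 records per consumer.
records_per_consumer = 107_200 // 200
print(len(records), records_per_consumer)
```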
Since all the data in the dataset come from honest consumers, we use the electricity theft attacks proposed in [7] to generate malicious consumers’ data. Starting from the benign samples, we apply six attack functions to each sample to generate six types of malicious data. The time-dependent on/off coefficient $\gamma_t$ that appears in these attacks is defined as
\[
\gamma_t = \begin{cases} 0, & ts < t < te \\ 1, & \text{otherwise} \end{cases}
\qquad ts = random(0, 42), \quad te - ts = random(6, 48).
\]
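The on/off coefficient $\gamma_t$ defined above can be sampled as in the following sketch. Several points are assumptions of the example rather than statements from the paper: `random(a, b)` is read as a uniform integer draw including both endpoints, `t` indexes the 48 daily readings from 0, and the coefficient is applied by multiplying each reading, as is usual for this kind of attack. The names `on_off_mask` and `apply_mask` are hypothetical.

```python
import random


def on_off_mask(length=48, rng=random):
    """Sample gamma_t: 0 for ts < t < te, 1 otherwise.

    ts = random(0, 42) and te - ts = random(6, 48), read here as
    inclusive integer draws (an assumption of this sketch).
    """
    ts = rng.randint(0, 42)
    te = ts + rng.randint(6, 48)
    mask = [0 if ts < t < te else 1 for t in range(length)]
    return mask, ts, te


def apply_mask(record, mask):
    """Multiply each reading by its coefficient (assumed form of the attack)."""
    return [g * x for g, x in zip(mask, record)]


rng = random.Random(0)                    # fixed seed for reproducibility
mask, ts, te = on_off_mask(rng=rng)
honest_record = [1.0 + 0.01 * t for t in range(48)]  # synthetic daily record
attacked_record = apply_mask(honest_record, mask)
```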
The electricity theft attacks generate 643,200 malicious data records. Since there are only 107,200 honest data records, the sample classes are unbalanced. Therefore, we apply the adaptive synthetic sampling method (ADASYN) [31] to balance the sizes of the honest and malicious classes. We then randomly divide the balanced dataset into a training dataset (80%) and a testing dataset (20%).
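The record counts and the 80/20 split can be checked with a few lines of standard-library Python. This is only a sketch: it does not implement ADASYN, and the helper `train_test_split` below is a hypothetical stand-in written for this example, not the routine used by the authors.

```python
import random

honest_records = 107_200
attack_types = 6
malicious_records = honest_records * attack_types  # six attacks per honest record


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and cut it into (train, test) parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Small stand-in for the balanced dataset (record indices only).
train, test = train_test_split(range(1000))
print(malicious_records, len(train), len(test))
```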
VI Performance Evaluation
In this section, we first compare our method with other methods for time series to demonstrate its better performance. Then, we study the parameters of our model. Finally, we evaluate the performance of our electricity theft detection model on our test set and compare the computation and communication overhead of the model with other schemes.
VI-A Method Comparison
To demonstrate the better performance of our model, we compare it experimentally with other methods on a test dataset. Concretely, Deng et al. proposed a tree-ensemble method, referred to as time series forest (TSF), for time series classification [32]. Middlehurst et al. proposed an improved hierarchical vote collective of transformation-based ensembles (HIVE-COTE) for time series classification [33]. Dempster et al. proposed a simple linear classifier based on random convolution kernels (ROCKET) [34]. Meanwhile, in [15], the authors proposed to use a one-dimensional convolutional neural network (CNN) for electricity theft detection. Table IV gives the experimental results for each method using the same training and testing datasets; the LSTM model achieves the highest accuracy.
VI-B Parameter Study
Various hyper-parameters affect the performance of the model. For our model, the most important one is the time step, which is the number of power readings input to the model. In our model, the time step is the same as the theft detection period. Increasing the detection period increases the communication overhead of the model, so a reasonable theft detection period must be determined. Therefore, we analyze the impact of these hyper-parameters on the performance of our model.
VI-B1 Effect of time steps
Fig. 6 shows the validation accuracy over the training epochs for different time steps. Different time steps affect both the accuracy of the model and the training time. Longer time steps also mean a longer theft detection period, which raises the overall communication overhead. Although the difference in accuracy between 96 and 48 time steps is not significant, training is faster and communication is lower with 48 time steps.
VI-B2 Effect of learning rate
In the model training process, we use the RMSprop optimizer with a default learning rate of 0.001. To find the optimal model, we use the callback function ReduceLROnPlateau in the Keras framework, which reduces the learning rate when learning stagnates. As shown in Fig. 6, accuracy improves somewhat after the learning rate is reduced.
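The idea of reducing the learning rate on a plateau can be sketched in plain Python as follows. This is a simplified illustration of the general technique, not the Keras implementation (which also supports options such as a minimum improvement threshold and a cooldown). The function name and the values of `factor`, `patience`, and `min_lr` are hypothetical; the paper does not state the settings it used.

```python
def reduce_lr_on_plateau(val_losses, lr=0.001, factor=0.1, patience=2, min_lr=1e-6):
    """Return the learning rate used at each epoch.

    The rate is multiplied by `factor` whenever the validation loss has
    not improved for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)          # rate in effect during this epoch
        if loss < best:
            best = loss              # improvement: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:     # plateau detected: shrink the rate
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule


# Synthetic validation losses with two plateaus.
losses = [0.9, 0.7, 0.6, 0.6, 0.61, 0.5, 0.5, 0.5]
schedule = reduce_lr_on_plateau(losses)
print(schedule)
```

In this example the rate stays at its initial value for the first five epochs and is lowered once the loss stops improving for two epochs in a row.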
VI-B3 Effect of batch size
Fig. 7 shows the performance of our model for different batch sizes. A batch size of 512 yields the highest accuracy, 95.60%, but needs more epochs to optimize. The experimental results show that a smaller batch size speeds up optimization within the same number of epochs, which suggests that a batch size between 256 and 512 is a reasonable choice.
VI-B4 Effect of neurons
Fig. 8 shows that the highest accuracy is achieved when the number of neurons in the LSTM layer is between 300 and 360. Because a larger number of neurons slows down model inference, we set the number of neurons in our model to 300.
VI-C Performance of electricity theft detection model
VI-C1 Performance Metrics
To evaluate our electricity theft detection model, we consider four performance metrics: accuracy, the detection rate, the false acceptance rate, and the highest difference. Accuracy measures the percentage of correct classifications in the testing dataset. The detection rate measures the percentage of malicious consumers that are detected among all malicious consumers. The false acceptance rate measures the percentage of honest consumers who are mistakenly classified as malicious. The model performs better when the detection rate, accuracy, and highest difference are high and the false acceptance rate is low.
[TABLE]
where TP, FP, and TN stand for true positive, false positive, and true negative, respectively.
[TABLE]
[TABLE]
Here the quantities involved are the total number of samples in the testing dataset, the label of each consumer, and the inference result of the model for that consumer.
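The metrics described in this subsection can be computed from confusion-matrix counts as in the sketch below. The detection rate and false acceptance rate follow the verbal definitions given above. The highest difference is taken here as the detection rate minus the false acceptance rate, which is a common convention in this line of work but is an assumption of this example. The counts are invented for illustration and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, detection rate, false acceptance rate, and
    highest difference (all in percent) from confusion-matrix counts."""
    dr = 100.0 * tp / (tp + fn)    # detected malicious / all malicious
    fa = 100.0 * fp / (fp + tn)    # honest flagged as malicious / all honest
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    hd = dr - fa                   # assumed definition: DR minus FA
    return {"accuracy": acc, "DR": dr, "FA": fa, "HD": hd}


# Hypothetical counts: 100 malicious and 100 honest test samples.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```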
VI-C2 Performance Comparison
We obtain the confusion matrix of our model by using the scikit-learn Python library. As shown in Fig. 9, in the confusion matrix of our model, the proportion of consumers predicted to be electricity theft consumers among those who truly are electricity theft consumers is the detection rate, and the proportion of consumers predicted to be electricity theft consumers among those who are truly normal consumers is the false acceptance rate.
Table V shows the evaluation results of our model and of the existing models with privacy preservation. Among the schemes that consider privacy protection, the proposed scheme is better in terms of detection rate, accuracy, and highest difference. Our privacy detection model has higher accuracy and , 95.56 and 91.10, respectively. At the same time, the false acceptance rate of our model is 2.62, which is lower than that of the other schemes. These evaluation results show that the proposed scheme performs better. Moreover, compared to [15], the use of encryption does not degrade the performance of our model, because the inner product of the first-layer parameters with the consumers’ power consumption data yields the same output as feeding the data directly into the model.
VI-D Computation and Communication Overhead
To evaluate the proposed scheme in a more realistic environment, we used the Python “Charm” cryptographic library [35] on a Raspberry Pi Zero W device with a 1.0 GHz single-core CPU and 512 MB of RAM. A 160-bit elliptic curve (the MNT159 curve) was used.
VI-D1 Communication overhead
In our model, the main communication overhead comes from the SMs transferring data to the MN. We use a 160-bit elliptic curve. From Eq. (5) to Eq. (9), it can be seen that the ciphertext, the signature, and the public key are 40 bytes each, one further component is 400 bytes, and the timestamp is 80 bytes, so an SM needs 600 bytes to report one reading. PPETD [15] uses secure multiplication, security evaluation, and garbled circuits to protect the privacy of the CNN model evaluation, which results in a high communication overhead of about 1900 MB per SM. Yao et al.’s scheme [11] requires sending a ciphertext, a signature, and a timestamp to two institutions to complete the aggregation and detection. Assuming that it generates 2048 bits of ciphertext, 40 bytes of signature, and 40 bytes of timestamp, the total size required is 672 bytes. The schemes of Richardson et al. [14] and Ibrahem et al. [12] send only 40 bytes and 256 bytes of ciphertext, respectively. Fig. 10 compares the communication overhead with that of other schemes. It can be seen that the proposed scheme achieves more security within an acceptable range of communication overhead.
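The byte counts quoted above can be verified with simple arithmetic. The sketch below only restates the sizes given in the text; the dictionary keys are hypothetical labels, and `other_component` stands for the 400-byte item listed alongside the ciphertext, signature, public key, and timestamp.

```python
# Per-report message size of the proposed scheme, in bytes.
proposed = {
    "ciphertext": 40,
    "signature": 40,
    "public_key": 40,
    "other_component": 400,  # the 400-byte item mentioned in the text
    "timestamp": 80,
}
proposed_total = sum(proposed.values())

# Yao et al.'s scheme under the stated assumptions: 2048-bit ciphertext,
# 40-byte signature, 40-byte timestamp, sent to two institutions.
yao_per_recipient = 2048 // 8 + 40 + 40
yao_total = 2 * yao_per_recipient

print(proposed_total, yao_per_recipient, yao_total)
```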
VI-D2 Computation overhead
In the proposed scheme, the computations mainly fall into three phases: the reporting phase, the aggregation phase, and the electricity theft detection phase. In the reporting phase, the main computation overhead comes from the encryption, signature, decryption key generation, and timestamp generation operations of the SM; the total time cost of the reporting phase is 59.488 ms. In the aggregation phase, the MN aggregates readings, decrypts, and verifies signatures; the total time cost is 129.635 ms. In the detection phase, the computation cost of the decryption is 49.63 ms. The computation costs of the required functions are listed in Table VI. The experimental results show that the scheme is feasible in a real-world environment.
In terms of model inference speed, the total evaluation time of ETDFE for a 15-layer FFN model with 3,391,634 parameters is about 1.94 seconds, and PPETD MD1 takes 48 minutes to evaluate its model. Our model has only 1,095,022 parameters, and its evaluation time is only 56.03 ms. In addition, the proposed scheme is more efficient than other schemes because it performs electricity theft detection only after identifying the suspected theft area.
VI-E Blockchain simulations
The block time is the time it takes for the miners or validators in the network to verify the transactions within a block and generate a new block in the blockchain. Very short block times may lead to abnormal behavior, because nodes may not have enough time to send transactions and to synchronize their transaction pool or blockchain. Very long block times waste computing power and reduce the security of the system. Therefore, an appropriate block time is important. As shown in Fig. 11, the average block time is simulated in the blockchain simulation system [36] for 50 to 300 SMs in the detection region. As far as possible, the block time should be shorter than the period at which the SMs report power consumption readings, and the SO can select the number of SMs in the area based on the reporting period.
VII System Analysis
In this section, we aim to demonstrate that the proposed scheme achieves the following security and privacy guarantees and resists the attacks described in Section IV-B. In addition, to show that the proposed scheme is more secure than the existing schemes, we compare their system characteristics.
VII-A Security analysis
Scenario 1: The privacy of consumers’ power consumption cannot be inferred by any attacker.
Proof: The consumers’ fine-grained data is encrypted and sent to the MN. The confidentiality of the encrypted readings is achieved by elliptic-curve cryptography over finite fields. Specifically, to analyze the consumers’ private information, the attacker needs to crack the consumer’s continuous long-term encrypted data, but the attacker can only obtain the public parameters, which makes cracking computationally infeasible. In the electricity theft detection stage, the input encrypted power consumption data is decrypted to obtain only the output of the first layer of the neural network. Since the number of equations is less than the number of unknowns, the resulting system of equations cannot be solved. The SO therefore cannot obtain the consumer’s original power consumption data, while still completing the electricity theft detection. Therefore, the proposed scheme preserves the privacy of consumers.
Scenario 2: Consumers’ fine-grained power consumption data cannot be falsified and forged during transmission and storage, etc.
Proof: The messages sent by each SM in the scheme are signed with BLS signatures to ensure the integrity of the data and prevent falsification. After accepting the messages, the MN builds the Merkle tree and creates a block, and each SM can access the block to verify whether its data has been falsified. Moreover, all data transfers in the blockchain carry timestamps and cannot be changed once added to the blockchain. The proposed scheme can therefore resist data falsification and forgery.
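To illustrate why a Merkle tree lets each SM check that its report has not been altered, the following sketch computes a Merkle root over a list of reports and shows that changing any one report changes the root. The use of SHA-256 and the rule of duplicating the last node on odd-sized levels are assumptions of this example; the paper does not specify these details here, and the report strings are invented.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a Merkle root over a non-empty list of byte strings.

    Odd-sized levels duplicate their last node (an assumed convention).
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical reports from three smart meters.
reports = [b"SM1|ciphertext-1", b"SM2|ciphertext-2", b"SM3|ciphertext-3"]
root = merkle_root(reports)

# Altering a single report yields a different root.
tampered = [reports[0], b"SM2|ciphertext-X", reports[2]]
tampered_root = merkle_root(tampered)
print(root.hex() != tampered_root.hex())
```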
Scenario 3: The proposed scheme does not require a third party and also can resist collusion attacks by smart grid entities.
Proof: In the proposed scheme, the whole process does not require the participation of a third party, which makes the scheme more reliable and convenient. In the secure key aggregation process, each SM negotiates masks with all other SMs, and the mask agreement is based on the computational Diffie-Hellman hard problem. Suppose the SO wants to obtain the private key of an SM after colluding with the MN. It would still need to collude with the other SMs, which is not achievable in practice. Therefore, our scheme resists collusion attacks.
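The general idea of pairwise masks that hide individual values but cancel in the aggregate can be sketched as follows. This is a toy illustration of the technique, not the construction used in the paper: the modulus `P`, base `G`, mask width `Q`, the hash-based mask derivation, and all function names are assumptions, and the parameters are far too weak for real use.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy Diffie-Hellman modulus (a Mersenne prime)
G = 3           # toy base
Q = 2**64       # values and masks are combined modulo Q


def shared_mask(my_secret, their_public):
    """Derive a pairwise mask from the Diffie-Hellman shared secret."""
    shared = pow(their_public, my_secret, P)
    digest = hashlib.sha256(str(shared).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def mask_values(values):
    """Mask each party's value with masks agreed pairwise with all others.

    For each pair (i, j) with i < j, party i adds the mask and party j
    subtracts it, so all masks cancel in the sum.
    """
    n = len(values)
    secret_keys = [secrets.randbelow(P - 2) + 1 for _ in range(n)]
    public_keys = [pow(G, s, P) for s in secret_keys]
    masked = []
    for i in range(n):
        m = values[i] % Q
        for j in range(n):
            if j == i:
                continue
            mask = shared_mask(secret_keys[i], public_keys[j])
            m = (m + mask) % Q if i < j else (m - mask) % Q
        masked.append(m)
    return masked


values = [5, 17, 42, 8]            # hypothetical private values
masked = mask_values(values)
aggregate = sum(masked) % Q        # equals the sum of the original values
print(aggregate)
```

Because every mask is shared by exactly two parties, recovering one party's value from its masked form requires the masks it agreed with all the others, which matches the intuition behind the collusion argument above.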
VII-B Effective defense evaluation
In this sub-section, we calculate the probability of successful attacks by the attacker in two scenarios and illustrate the effectiveness of the scheme through mathematical proofs.
VII-B1 Scenario 1
Network attackers may destroy data before it is transmitted, during its transmission, and after it is received by the MN to render the system inoperable.
VII-B2 Scenario 2
Network attackers may tamper with the original data before it is transmitted, during data transmission, and after it is received by the MN (before it is broadcast) to allow false data to be verified.
The attack methods and success probabilities of data being destroyed and tampered with before, during, or after transmission are summarized in Table VII.
For scenario 1, consider the probability of an attacker hacking into a single smart meter and the probability of an attacker hacking into a single channel. To make the system unworkable, the attacker needs to compromise multiple smart meters before data transmission, or multiple channels during data transmission, or the MN after it accepts the data; the corresponding success probabilities are summarized in Table VII. Because the number of smart meters is large, the probability of hacking into enough smart meters is extremely low. Even if data is destroyed during the transmission phase, the attack can be detected through the data signature and eliminated. When the MN is attacked and the data is destroyed, all other SMs find the wrong data in the consensus phase and re-vote to select a new MN. Our scheme therefore has good defense capability under scenario 1.
For scenario 2, consider the probability of an attacker stealing the private key of a smart meter. When the attacker wants to tamper with the data in the smart grid, the attack can take place before data transmission, during data transmission, or after the MN receives the data; the success probability of each case is given in Table VII. Compared with scenario 1, an attack in scenario 2 requires more demanding conditions and has a lower probability of success. This probabilistic analysis indicates that our scheme can perform its basic tasks in a more secure environment.
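To give a feel for why the success probability shrinks as more devices must be compromised, the sketch below assumes that each device is compromised independently with the same probability. This independence model is an assumption of the example, not a formula taken from Table VII, and the probability values are arbitrary.

```python
def all_compromised_probability(p_single, count):
    """Probability that `count` devices are all compromised, assuming
    independent attempts that each succeed with probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return p_single ** count


# Arbitrary example: each device falls with probability 0.5.
for count in (1, 10, 50):
    print(count, all_compromised_probability(0.5, count))
```

Even with a generous per-device probability, the joint probability drops geometrically with the number of devices the attacker must control.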
VII-C System characteristic comparison
The proposed scheme is compared with several other representative privacy-preserving electricity theft detection schemes for the smart grid in terms of non-reliance on any trusted third party (TTP), data non-falsifiability (DNF), data non-repudiation (DNR), and data non-tamperability (DNT). As shown in Table VIII, none of the related works achieves all the desired characteristics of the smart grid, whereas the proposed scheme achieves all of them.
VIII Conclusion
In this paper, we propose a more secure blockchain-based privacy-preserving electricity theft detection scheme. The proposed scheme does not require a third party, which avoids the security and privacy issues brought about by a third party. Meanwhile, the distributed storage scheme of blockchain prevents security issues such as data tampering and forgery. In addition, a real dataset and environment are used for simulation evaluation. The experimental results show that the proposed scheme can detect malicious consumers more accurately with acceptable communication and computational overhead. System analysis shows that the proposed scheme is more secure compared to existing schemes. For our future work, we intend to improve the proposed scheme by reducing communication and computation overhead.
References
- [1] V. C. Gungor, D. Sahin, T. Kocak, S. Ergut, C. Buccella, C. Cecati, and G. P. Hancke, “A survey on smart grid potential applications and communication requirements,” IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 28–42, 2012.
- [2] Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of smart meter data analytics: Applications, methodologies, and challenges,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 3125–3148, 2018.
- [3] Z. Zeng, X. Wang, Y. Liu, and L. Chang, “MSDA: multi-subset data aggregation scheme without trusted third party,” Frontiers of Computer Science, vol. 16, no. 1, pp. 1–7, 2022.
- [4] X. Xia, Y. Xiao, and W. Liang, “SAI: A suspicion assessment-based inspection algorithm to detect malicious users in smart grid,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 361–374, 2019.
- [5] P. McDaniel and S. McLaughlin, “Security and privacy challenges in the smart grid,” IEEE Security & Privacy, vol. 7, no. 3, pp. 75–77, 2009.
- [6] P. Gope and B. Sikdar, “Privacy-aware authenticated key agreement scheme for secure smart grid communication,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 3953–3962, 2018.
- [7] P. Jokar, N. Arianpoo, and V. C. Leung, “Electricity theft detection in AMI using customers’ consumption patterns,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2015.
- [8] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, “Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids,” IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2017.
