
TL;DR
This paper refines the analysis of double spend attacks in Bitcoin, providing a precise probability formula, correcting previous assumptions, and offering a more detailed risk assessment based on confirmation counts and validation times.
Contribution
It corrects Nakamoto's original double spend analysis, introduces a closed-form probability formula, and offers an asymptotic approximation for better risk evaluation.
Findings
Probability of double spend success decreases exponentially with confirmations.
Larger confirmation counts are needed than previously estimated.
Conditional probability analysis improves risk assessment accuracy.
Abstract
We correct the double spend race analysis given in Nakamoto's foundational Bitcoin article and give a closed-form formula for the probability of success of a double spend attack using the Regularized Incomplete Beta Function. We give a proof of the exponential decay on the number of confirmations, often cited in the literature, and find an asymptotic formula. A larger number of confirmations is necessary compared to those given by Nakamoto. We also compute the probability conditional on the known validation time of the blocks. This provides a finer risk analysis than the classical one.
October 26, 2016
Double spend races
Cyril Grunspan
Léonard de Vinci Pôle Univ, Research Center, Labex Réfi
92 916 Paris-La Défense, France
and
Ricardo Pérez-Marco
CNRS, IMJ-PRG, Labex Réfi , Labex MME-DDII
Bât. Sophie Germain, Case 7012, 75205-Paris Cedex 13, France
Author’s Bitcoin Beer Address (ABBA) (send some bitcoins to support our research at the pub):
1KrqVxqQFyUY9WuWcR5EHGVvhCS841LPLn
(Date: February 9th 2017)
Key words and phrases:
Bitcoin, blockchain, double spend, mining, proof-of-work, Regularized Incomplete Beta Function.
2010 Mathematics Subject Classification:
68M01, 60G40, 91A60, 33B20.
To the memory of our beloved teacher André Warusfel who taught us how to have fun with the applications of mathematics.
1. Introduction.
The main breakthrough in [Nakamoto 2008] is the solution to the double spend problem. Before this discovery no one knew how to avoid the double spending of an electronic currency unit without the supervision of a central authority. This made Bitcoin the first form of peer-to-peer (P2P) electronic currency.
A double spend attack can only be attempted with a substantial fraction of the hashrate used in the proof-of-work of the Bitcoin network. The attackers will start a double spend race against the rest of the network to replace the last blocks of the blockchain by secretly mining an alternate blockchain. The last section of the Bitcoin’s white paper [Nakamoto 2008] computes the probability that the attackers catch up. However Nakamoto’s analysis is not accurate since he makes the simplifying assumption that honest miners validate blocks at the expected rate. We present a correct analysis and give a closed-form formula for the exact probability.
Theorem 1.
Let $0<q<1/2$, respectively $p=1-q$, be the relative hash power of the group of the attackers, respectively of honest miners. After $z$ blocks have been validated by the honest miners, the probability of success of the attackers is
$$P(z) = I_{4pq}(z, 1/2),$$
where $I_x(a,b)$ is the regularized incomplete beta function
$$I_x(a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt.$$
In general, for , these probabilities are larger than those obtained by Nakamoto. From the standpoint of bitcoin security, this shows that larger confirmation times are necessary compared to those given by Nakamoto, in particular when the attackers' share of the hashrate is large. The following table shows the number of confirmations to wait compared to those given by Nakamoto for an attacking hashrate of (or ) and a probability of success of the attackers less than .
[TABLE]
Table 1. Comparison of number of confirmations.
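Nakamoto claims in [Nakamoto 2008] that the probability converges exponentially to $0$ with $z$. This result is intuitively expected and widely cited, but there is no proof available in the literature. We give here a rigorous proof of this result. More precisely we give precise asymptotics both for $P(z)$ and for Nakamoto's probability $P_{SN}(z)$, showing the exponential decay.
Theorem 2.
When $z \to +\infty$ we have, with $s = 4pq < 1$,
$$P(z) \sim \frac{s^z}{\sqrt{\pi (1-s) z}}$$
and with $\lambda = q/p$, and $c(\lambda) = \lambda - 1 - \log \lambda$,
$$P_{SN}(z) \sim \frac{e^{-z c(\lambda)}}{2}.$$
We can check that $c(\lambda) > -\log s$, which means that $P_{SN}(z) < P(z)$ for $z$ large.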
A finer risk analysis.
We analyze a new parameter in the risk of a double spend. The probability of success of the attackers increases with the time it takes to validate the transactions since they have more time to secretly mine their alternate blockchain. On the other hand the task of the attackers is more difficult if the validations happen faster than the expected time. The time $\tau_1$ it took to validate the $z$ blocks is known, therefore what is really relevant is the conditional probability assuming $\tau_1$ is known. We introduce the dimensionless parameter $\kappa$ which measures the deviation from average time:
$$\kappa = \frac{\tau_1}{z\, t_0} = \frac{p\, \tau_1}{z\, \tau_0}$$
where $t_0$ is the average time of block validation by honest miners ($t_0 = \tau_0/p$, where $\tau_0 = 10$ min for the Bitcoin network).
Figure 1. Probability of success as a function of with
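We study the probability $P(z,\kappa)$ of success of the attackers. We can recover the previous probabilities, $P(z)$ and Nakamoto's $P_{SN}(z)$, from the $P(z,\kappa)$, $\kappa > 0$.
Theorem 3.
We have
$$P_{SN}(z) = P(z,1)$$
and
$$P(z) = \int_0^{+\infty} P(z,\kappa)\, d\rho_z(\kappa)$$
with the density function
$$d\rho_z(\kappa) = \frac{z^z}{(z-1)!}\, \kappa^{z-1} e^{-z\kappa}\, d\kappa.$$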
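We give a closed-form formula for $P(z,\kappa)$.
Theorem 4.
We have
$$P(z,\kappa) = 1 - Q\left(z, \kappa z \frac{q}{p}\right) + \left(\frac{q}{p}\right)^z e^{\kappa z \frac{p-q}{p}}\, Q(z, \kappa z).$$
Here $Q$ denotes the regularized incomplete gamma function
$$Q(s,x) = \frac{\Gamma(s,x)}{\Gamma(s)}$$
where
$$\Gamma(s,x) = \int_x^{+\infty} t^{s-1} e^{-t}\, dt.$$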
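We also find the asymptotics for $P(z,\kappa)$ for different values of $\kappa$.
Theorem 5.
The following hold for $z \to +\infty$,
- (1) For $0 < \kappa < 1$,
[TABLE]
- (2) For $\kappa = 1$,
[TABLE]
- (3) For $1 < \kappa < p/q$,
[TABLE]
- (4) For $\kappa = p/q$, $P(z, p/q) \to 1/2$ and
[TABLE]
- (5) For $\kappa > p/q$, $P(z,\kappa) \to 1$ and
[TABLE]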
Using a concavity argument we show that , but in general, for , we have . We do compute an explicit, non sharp, value for which this inequality holds:
Theorem 6.
Let . A sufficient condition for having is with being the smallest integer greater or equal to
[TABLE]
where .
We also provide a double entry tables of the for different values of for and . For a complete set of tables for of practical use we refer to the companion article [Grunspan & Pérez-Marco 2017].
2. Mathematics of mining.
We review some basic results in probability (see [Feller 1971] vol.2, p.8) and of Bitcoin mining (see [Pérez-Marco 2016] for an overview of Bitcoin protocol).
A hashing algorithm digests any file into a fixed-length string of bits. The slightest modification of the original file produces a completely different output. The bits of the output appear with a random frequency and it is computationally hard to find collisions (different inputs yielding the same output). Hashing algorithms are used for example to check the integrity and non-tampering of files.
The two main hashing algorithms used in the Bitcoin protocol are RIPEMD-160 and SHA-256, which produce outputs of 160 and 256 bits respectively. The mining algorithm consists in performing the double SHA-256 of the block header (doubled to prevent “padding attacks”).
The consensus protocol and security in the Bitcoin network rely on the process of bitcoin mining and validating transactions. It consists in iterating the computation of block header hashes, changing a nonce (more precisely, a double hash SHA256(SHA256(header)) is computed, changing a nonce and an extra-nonce), in order to find a hash below a predefined threshold, the difficulty [Nakamoto 2008]. For each new hash the work is started from scratch, therefore the random variable $T$ measuring the time it takes to mine a block is memoryless, which means that for any $t_1, t_2 > 0$
$$P[T > t_1 + t_2 \mid T > t_2] = P[T > t_1].$$
Therefore we have
$$P[T > t_1 + t_2] = P[T > t_1]\; P[T > t_2].$$
This equation and a continuity argument determine the exponential function and imply that $T$ is an exponentially distributed random variable:
$$f_T(t) = \alpha e^{-\alpha t}$$
for some parameter $\alpha > 0$, the mining speed, with $E[T] = 1/\alpha$.
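If $(T_1, \ldots, T_n)$ is a sequence of independent identically distributed exponential random variables (for example $T_k$ is the mining time of the $k$-th block), then the sum
$$S_n = T_1 + \cdots + T_n$$
is a random variable following a gamma density with parameters $(n, \alpha)$ (obtained by convolution of the exponential density):
$$f_{S_n}(t) = \frac{\alpha^n}{(n-1)!}\, t^{n-1} e^{-\alpha t}$$
and cumulative distribution
$$F_{S_n}(t) = \int_0^t f_{S_n}(u)\, du = 1 - e^{-\alpha t} \sum_{k=0}^{n-1} \frac{(\alpha t)^k}{k!}.$$
We define the random process $N(t)$ as the number of mined blocks at time $t$. Setting $S_0 = 0$, we have
$$N(t) = \max\{ n \geq 0 \; ; \; S_n < t \}.$$
Since $N(t) = n$ is equivalent to $S_n < t$ and $S_{n+1} \geq t$ we get
$$P[N(t) = n] = F_{S_n}(t) - F_{S_{n+1}}(t) = \frac{(\alpha t)^n}{n!}\, e^{-\alpha t}$$
which means that $N(t)$ has a Poisson distribution with expectation $\alpha t$.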
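The passage from the gamma law of the block times to the Poisson law of the block count can be checked numerically. The following is a minimal sketch in R, using only base functions; the mining speed `alpha` and the elapsed time `t` are illustrative assumptions, not values from the paper.

```r
# Sketch: the difference of two gamma CDFs should reproduce the Poisson mass,
# P[N(t) = n] = F_{S_n}(t) - F_{S_{n+1}}(t).
alpha <- 1 / 10   # assumed mining speed: one block per 10 minutes on average
t     <- 25       # assumed elapsed time, in minutes
n     <- 1:8      # block counts to check (n >= 1, so S_n is a genuine gamma variable)

# F_{S_n}(t): CDF of a gamma variable with shape n and rate alpha
cdf_Sn <- function(n, t) pgamma(t, shape = n, rate = alpha)

from_gamma   <- cdf_Sn(n, t) - cdf_Sn(n + 1, t)
from_poisson <- dpois(n, lambda = alpha * t)

# Largest discrepancy between the two computations
max_gap <- max(abs(from_gamma - from_poisson))
print(cbind(n, from_gamma, from_poisson))
```

Both columns should agree up to floating-point rounding, whatever positive values are chosen for `alpha` and `t`.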
3. Mining race.
We consider the situation described in section 11 of [Nakamoto 2008] where a group of attacker miners attempts a double spend attack. The attacker group has a fraction $0 < q < 1/2$ of the total hash rate, and the rest, the honest miners, has a fraction $p = 1 - q$. Thus the probability that the attackers find the next block is $q$ while the probability for the honest miners is $p$. Nakamoto computes the probability for the attackers to catch up when $z$ blocks have been mined by the honest group. In general, to replace the chain mined by the honest miners and succeed in a double spend, the attackers need to mine $z+1$ blocks, i.e. to mine a longer chain. In the analysis it is assumed that we are not near an update of the difficulty, which remains constant (the difficulty is adjusted every 2016 blocks).
The first discussion in section 11 of [Nakamoto 2008] is about computing the probability of the attackers catching up when they lag by $n$ blocks behind the honest miners. The analysis is correct and is similar to the Gambler's Ruin problem. We review this.
Lemma 3.1.
Let $q_n$ be the probability of the event $E_n$, “catching up from $n$ blocks behind”. We have
$$q_n = \left(\frac{q}{p}\right)^n.$$
Proof.
Note that after one more block has been mined, we have for $n \geq 1$,
$$q_n = q\, q_{n-1} + p\, q_{n+1}$$
and the only solution to this recurrence with $q_0 = 1$ and $q_n \to 0$ is $q_n = (q/p)^n$ (see [Feller 1971]).
∎
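As a quick sanity check of the lemma, one can verify that the geometric sequence $(q/p)^n$ satisfies the one-step recurrence of the proof (the attackers find the next block with probability $q$ and the lag drops to $n-1$, otherwise it grows to $n+1$). This is a minimal R sketch; the value `q = 0.3` is an arbitrary illustrative choice.

```r
# Sketch: check that q_n = (q/p)^n solves q_n = q * q_{n-1} + p * q_{n+1}.
q <- 0.3          # assumed attacker hashrate share (illustrative)
p <- 1 - q

catch_up <- function(n) (q / p)^n   # candidate solution of Lemma 3.1

n   <- 1:20
lhs <- catch_up(n)
rhs <- q * catch_up(n - 1) + p * catch_up(n + 1)

recurrence_gap <- max(abs(lhs - rhs))   # should vanish up to rounding
boundary_value <- catch_up(0)           # boundary condition q_0 = 1
```

The check does not prove uniqueness, which relies on the boundary conditions as in [Feller 1971]; it only confirms that the stated formula is a solution.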
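We consider the random variables $T$ and $S_n$, resp. $T'$ and $S'_n$, associated to the group of honest, resp. attacker, miners. We also consider the random Poisson process $N(t)$, resp. $N'(t)$. The random variables $T$ and $T'$ are clearly independent and have exponential distributions with parameters $\alpha$ and $\alpha'$. We have
$$P[T' < T] = \int_0^{+\infty} \alpha' e^{-\alpha' t}\, e^{-\alpha t}\, dt = \frac{\alpha'}{\alpha + \alpha'}$$
so
$$p = \frac{\alpha}{\alpha + \alpha'}, \qquad q = \frac{\alpha'}{\alpha + \alpha'}.$$
Moreover, $\inf(T, T')$ is an exponentially distributed random variable with parameter $\alpha + \alpha'$, which represents the mining speed of the entire network, honest and attacker miners together. The Bitcoin protocol is calibrated such that $\alpha + \alpha' = 1/\tau_0$ with $\tau_0 = 10$ min. So we have
$$\alpha = \frac{p}{\tau_0}, \qquad \alpha' = \frac{q}{\tau_0}.$$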
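These results can also be obtained in the following way. The hash function used in bitcoin block validation is the double hash $x \mapsto \mathrm{SHA256}(\mathrm{SHA256}(x))$. The hashrate is the number of hashes per second performed by the miners. At a stable hashrate regime, the average time it takes to validate a block by the network is $\tau_0 = 10$ min. If the difficulty is set to be $d$, we validate a block when $\mathrm{SHA256}(\mathrm{SHA256}(BH)) < d$, where $BH$ is the block header. The pseudo-random output of SHA256 shows that we need to compute an average number $m = 2^{256}/d$ of hashes to find a solution. Let $h$, resp. $h'$, be the hashrates of the honest miners, resp. the attackers. The total hashrate of the network is $h + h'$, and we have
$$p = \frac{h}{h + h'}, \qquad q = \frac{h'}{h + h'}.$$
Let $t_0$, resp. $t'_0$, be the average time it takes to validate a block by the honest miners, resp. the attackers. We have
$$(h + h')\, \tau_0 = m, \qquad h\, t_0 = m, \qquad h'\, t'_0 = m$$
and from this we get that $\tau_0$ is half the harmonic mean of $t_0$ and $t'_0$,
$$\tau_0 = \frac{t_0\, t'_0}{t_0 + t'_0}$$
and also
$$p = \frac{t'_0}{t_0 + t'_0}, \qquad q = \frac{t_0}{t_0 + t'_0}.$$
Going back to the Poisson distribution parameters, we have
$$\alpha = \frac{1}{t_0}, \qquad \alpha' = \frac{1}{t'_0}$$
and we recover the relations
$$p = \frac{\alpha}{\alpha + \alpha'}, \qquad q = \frac{\alpha'}{\alpha + \alpha'}.$$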
4. Nakamoto’s analysis.
Once the honest miners mine the -th block, the attackers have mined blocks with a probability computed in the next section (Proposition 5.1). If , then the attackers chain is adopted and the attack succeeds. Otherwise the probability they catch up is as computed above, therefore the probability of success of the attack is
[TABLE]
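Then Nakamoto makes the simplifying assumption that the blocks have been mined according to the average expected time per block. This is asymptotically true when $z \to +\infty$ but false otherwise. More precisely, writing $N'(S_z)$ for the number of blocks mined by the attackers at the random time $S_z$ when the honest miners find their $z$-th block, he approximates $N'(S_z)$ by $N'(t_z)$ where
$$t_z = E[S_z] = z\, E[T] = \frac{z\, \tau_0}{p}.$$
As we have seen above, the random variable $N'(t_z)$ follows a Poisson distribution with parameter
$$\lambda = \alpha'\, t_z = \frac{z\, q}{p}.$$
The final calculation in [Nakamoto 2008] is then
$$P_{SN}(z) = 1 - \sum_{k=0}^{z-1} \frac{\lambda^k e^{-\lambda}}{k!} \left( 1 - \left(\frac{q}{p}\right)^{z-k} \right).$$
However, this analysis is not correct since $N'(S_z) \neq N'(t_z)$.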
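Nakamoto's approximation, namely one minus the sum over $k = 0, \ldots, z-1$ of the Poisson mass with parameter $zq/p$ times $1 - (q/p)^{z-k}$, is short enough to write down directly. A minimal R sketch follows; the function name `nakamoto_prob` is ours and the code assumes $z \geq 1$.

```r
# Sketch of Nakamoto's approximation P_SN(z), for z >= 1.
# The attackers' block count is modelled as Poisson with mean z*q/p.
nakamoto_prob <- function(z, q) {
  p      <- 1 - q
  lambda <- z * q / p          # expected attacker blocks in the average time for z honest blocks
  k      <- 0:(z - 1)
  # Poisson mass times the probability of NOT catching up from z - k blocks behind
  1 - sum(dpois(k, lambda) * (1 - (q / p)^(z - k)))
}

# Example: evaluate the approximation for q = 0.1 and a few confirmation counts
approx_values <- sapply(1:6, nakamoto_prob, q = 0.1)
print(approx_values)
```

The vectorised sum replaces the explicit loop of the C code in [Nakamoto 2008]; the arithmetic is the same.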
5. The correct analysis.
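Let $X_z = N'(S_z)$ be the number of blocks mined by the attackers when the honest miners have just mined the $z$-th block. We compute the distribution for $X_z$.
Proposition 5.1.
The random variable $X_z$ has a negative binomial distribution with parameters $(z, p)$, i.e. for $k \geq 0$,
$$P[X_z = k] = p^z q^k \binom{z+k-1}{k}.$$
Proof.
Let $k \geq 0$. We have that $N'$ and $S_z$ are independent, therefore
$$P[X_z = k] = \int_0^{+\infty} P[N'(t) = k]\, f_{S_z}(t)\, dt = \int_0^{+\infty} \frac{(\alpha' t)^k}{k!} e^{-\alpha' t}\, \frac{\alpha^z}{(z-1)!} t^{z-1} e^{-\alpha t}\, dt = \frac{p^z q^k}{(z-1)!\, k!} \int_0^{+\infty} u^{k+z-1} e^{-u}\, du = p^z q^k\, \frac{(k+z-1)!}{(z-1)!\, k!},$$
where we used the change of variable $u = (\alpha + \alpha')\, t$ together with $p = \alpha/(\alpha+\alpha')$ and $q = \alpha'/(\alpha+\alpha')$.
∎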
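The explicit mass function $p^z q^k \binom{z+k-1}{k}$ coincides with the negative binomial law built into R, where `dnbinom(k, size, prob)` counts failures before the `size`-th success. The sketch below compares the two; the values of `z` and `q` are illustrative assumptions.

```r
# Sketch: compare the explicit formula of Proposition 5.1 with R's dnbinom.
z <- 6            # assumed number of honest blocks (illustrative)
q <- 0.1          # assumed attacker hashrate share (illustrative)
p <- 1 - q
k <- 0:30         # attacker block counts to compare

explicit <- p^z * q^k * choose(z + k - 1, k)
builtin  <- dnbinom(k, size = z, prob = p)   # failures (attacker blocks) before z successes

nb_gap  <- max(abs(explicit - builtin))
nb_mean <- sum(0:2000 * dnbinom(0:2000, size = z, prob = p))  # numerically close to z*q/p
```

The mean `z * q / p` is exactly the Poisson parameter used in Nakamoto's approximation, so the two models agree in expectation and differ in their spread.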
Thus, contrary to Nakamoto’s claim, we have proved that the distribution of $X_z$ is not a Poisson law with parameter $zq/p$. Rosenfeld [Rosenfeld 2014] noticed the inaccuracy of Nakamoto’s approximation and proposed empirically the negative binomial distribution as a better approximation, not realizing that this was the exact distribution (from [Rosenfeld 2014]: “We will not use this assumption (Nakamoto’s one), but rather model more accurately as a negative binomial variable”, and from [Rosenfeld 2012]: “Instead of a Poisson distribution, I used a more accurate negative binomial distribution.”). Only asymptotically do we have convergence to the Poisson distribution:
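Proposition 5.2.
In the limit $z \to +\infty$, $q \to 0$, and $zq/p \to \lambda$ we have:
$$P[X_z = k] \to \frac{\lambda^k}{k!}\, e^{-\lambda}.$$
Proof.
We have
$$P[X_z = k] = p^z q^k\, \frac{z (z+1) \cdots (z+k-1)}{k!} = \frac{p^{z+k}}{k!} \left( \frac{z q}{p} \right)^k \prod_{j=0}^{k-1} \left( 1 + \frac{j}{z} \right)$$
and the result follows using $p^{z+k} = (1-q)^{z+k} \to e^{-\lambda}$. ∎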
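The convergence to the Poisson law can be observed numerically by letting $z$ grow while keeping $zq/p$ fixed. The following R sketch does this for an assumed limit parameter `lambda = 2`; the helper name `poisson_limit_gap` is ours.

```r
# Sketch: the negative binomial law of parameters (z, p) approaches Poisson(lambda)
# when z grows and z*q/p is held equal to lambda.
lambda <- 2       # assumed limit parameter (illustrative)
k      <- 0:15    # support points on which the two laws are compared

poisson_limit_gap <- function(z) {
  q <- lambda / (z + lambda)   # chosen so that z * q / (1 - q) == lambda
  p <- 1 - q
  max(abs(dnbinom(k, size = z, prob = p) - dpois(k, lambda)))
}

gaps <- sapply(c(10, 100, 1000, 10000), poisson_limit_gap)
print(gaps)   # the discrepancy shrinks as z grows
```

For realistic confirmation counts (small $z$) and a sizeable $q$, the gap is not negligible, which is the source of the discrepancy with Nakamoto's figures.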
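We can now compute the probability of success of the attackers catching up a longer chain. This computation was previously done in [Rosenfeld 2014].
Proposition 5.3.
(Probability of success of the attackers) The probability of success by the attackers after $z$ blocks have been mined by the honest miners is
$$P(z) = 1 - \sum_{k=0}^{z-1} \left( p^z q^k - q^z p^k \right) \binom{k+z-1}{k}.$$
Proof.
As explained before, with $X_z$ the number of blocks mined by the attackers when the honest miners have just mined the $z$-th block, we have
$$P(z) = P[X_z \geq z] + \sum_{k=0}^{z-1} P[X_z = k] \left( \frac{q}{p} \right)^{z-k} = 1 - \sum_{k=0}^{z-1} p^z q^k \binom{k+z-1}{k} + \sum_{k=0}^{z-1} q^z p^k \binom{k+z-1}{k}.$$
∎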
Figure 2. Nakamoto’s and real probability
Numerical application.
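Converting to R code, given $z$ and $q$, this simple function computes our probability $P(z)$:
prob <- function(z, q) {
  p <- 1 - q
  total <- 1
  # subtract, for k = 0..z-1, the term (p^z q^k - q^z p^k) * C(k+z-1, k)
  for (k in 0:(z - 1)) {
    total <- total - (p^z * q^k - q^z * p^k) * choose(k + z - 1, k)
  }
  return(total)
}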
We can compare with the probability computed in [Nakamoto 2008].
[math]
Table 2. Probabilities for .
[math]
Table 3. Probabilities for .
[TABLE]
Table 4. Solving for less than 0.1%.
Therefore the correct results for bitcoin security are worse than those given in [Nakamoto 2008]. The explanation is that Nakamoto’s result is correct only if the mining time by the honest miners is exactly the expected time. Times longer than average help the attackers.
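To see how the two analyses translate into a waiting policy, one can search for the smallest number of confirmations that brings the probability of success below a threshold. The R sketch below does this for both the exact formula and Nakamoto's approximation; the helper `min_confirmations`, the cap `z_max` and the choice `q = 0.1` with a 0.1% threshold are illustrative assumptions.

```r
# Sketch: smallest z with success probability below a threshold, for both models.
prob <- function(z, q) {            # exact probability (sum formula), z >= 1
  p <- 1 - q
  k <- 0:(z - 1)
  1 - sum((p^z * q^k - q^z * p^k) * choose(k + z - 1, k))
}

nakamoto_prob <- function(z, q) {   # Nakamoto's Poisson approximation, z >= 1
  p <- 1 - q
  k <- 0:(z - 1)
  1 - sum(dpois(k, z * q / p) * (1 - (q / p)^(z - k)))
}

# Linear search; returns NA if the threshold is not reached before z_max
min_confirmations <- function(prob_fun, q, threshold = 0.001, z_max = 1000) {
  for (z in 1:z_max) {
    if (prob_fun(z, q) < threshold) return(z)
  }
  NA_integer_
}

z_exact    <- min_confirmations(prob, q = 0.1)
z_nakamoto <- min_confirmations(nakamoto_prob, q = 0.1)
print(c(exact = z_exact, nakamoto = z_nakamoto))
```

A linear search is enough here because both probabilities decrease with $z$ and the answer is small for moderate $q$; for $q$ close to $1/2$ the cap `z_max` may need to be raised.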
6. Closed-form formula.
We give a closed-form formula for $P(z)$ using the regularized incomplete beta function $I_x(a,b)$ (see [Abramovitch & Stegun 1970] (6.6.2)).
Theorem 6.1.
We have, with $s = 4pq < 1$,
$$P(z) = I_s(z, 1/2).$$
We recall that the incomplete beta function is defined (see [Abramovitch & Stegun 1970] (6.6.1)), for $a, b > 0$ and $0 \leq x \leq 1$, by
$$B_x(a,b) = \int_0^x t^{a-1} (1-t)^{b-1}\, dt$$
and the classical beta function is defined (see [Abramovitch & Stegun 1970] (6.2.1)) by $B(a,b) = B_1(a,b)$.
The Regularized Incomplete Beta Function is defined (see [Abramovitch & Stegun 1970] (6.6.2) and (26.5.1)) by
$$I_x(a,b) = \frac{B_x(a,b)}{B(a,b)} = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1}(1-t)^{b-1}\, dt.$$
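Proof.
The cumulative distribution of a random variable $X$ with negative binomial distribution of parameters $(z, p)$, with $0 < p < 1$ and $q = 1 - p$ as usual (see [Abramovitch & Stegun 1970] (26.5.26)), is given by
$$F_X(k) = P[X \leq k] = 1 - I_q(k+1, z).$$
This results from the formula (see [Abramovitch & Stegun 1970] (6.6.1))
$$I_x(a, b) - I_x(a+1, b) = \frac{\Gamma(a+b)}{\Gamma(a+1)\,\Gamma(b)}\, x^a (1-x)^b$$
that we prove by integrating by parts the definition of $B_x(a,b)$.
Thus we get
$$P(z) = 1 - I_p(z, z) + I_q(z, z).$$
Making the change of variables $t \mapsto 1 - t$ in the integral definition, we also have a symmetry relation (see [Abramovitch & Stegun 1970] (6.6.3))
$$I_p(a, b) + I_q(b, a) = 1.$$
Therefore we have $I_p(z,z) + I_q(z,z) = 1$, and $P(z) = 2\, I_q(z,z)$. The result follows using (see [Abramovitch & Stegun 1970] (26.5.14)), $I_q(z,z) = \frac{1}{2} I_s(z, 1/2)$, where $s = 4pq$.
∎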
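In R the regularized incomplete beta function is available as `pbeta(x, a, b)`, so the closed form $P(z) = I_{4pq}(z, 1/2)$ is a one-liner. The sketch below compares it with the finite sum formula; the function names and the value `q = 0.2` are illustrative assumptions.

```r
# Sketch: closed form P(z) = I_{4pq}(z, 1/2) versus the finite sum formula.
prob_sum <- function(z, q) {        # sum formula, z >= 1
  p <- 1 - q
  k <- 0:(z - 1)
  1 - sum((p^z * q^k - q^z * p^k) * choose(k + z - 1, k))
}

prob_closed <- function(z, q) {     # pbeta(x, a, b) is the regularized incomplete beta I_x(a, b)
  p <- 1 - q
  pbeta(4 * p * q, shape1 = z, shape2 = 0.5)
}

zs <- 1:30
closed_gap <- max(abs(sapply(zs, prob_sum, q = 0.2) - sapply(zs, prob_closed, q = 0.2)))
print(closed_gap)
```

Besides being shorter, the closed form avoids the cancellation in the sum when $P(z)$ is very small, and it accepts non-integer $z$ if an interpolation is ever needed.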
7. Asymptotic and exponential decay.
Nakamoto makes the observation ([Nakamoto 2008] p.8), without proof, that the probability decreases exponentially to $0$ when $z \to +\infty$. We prove this fact for the true probability $P(z)$ using the closed-form formula from Theorem 6.1,
Proposition 7.1.
When $z \to +\infty$ we have, with $s = 4pq < 1$,
$$P(z) \sim \frac{s^z}{\sqrt{\pi (1-s) z}}.$$
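By integration by parts we get the following elementary version of Watson’s Lemma:
Lemma 7.2.
Let $f \in C^1(\mathbb{R}_+)$ with $f(0) \neq 0$ and absolutely convergent integral
$$\int_0^{+\infty} f(u)\, e^{-zu}\, du < +\infty$$
then, when $z \to +\infty$, we have
$$\int_0^{+\infty} f(u)\, e^{-zu}\, du \sim \frac{f(0)}{z}.$$
Then we get the following asymptotics (see also [López & Sesma 1999]):
Lemma 7.3.
For $0 < s < 1$ and $b > 0$, we have when $z \to +\infty$,
$$B_s(z, b) \sim \frac{s^z}{z}\, (1-s)^{b-1}.$$
Proof.
Making the change of variable $u = \log(s/t)$ in the definition
$$B_s(z, b) = \int_0^s t^{z-1} (1-t)^{b-1}\, dt$$
we get
$$B_s(z, b) = s^z \int_0^{+\infty} \left( 1 - s e^{-u} \right)^{b-1} e^{-zu}\, du$$
and the result follows applying Lemma 7.2 with $f(u) = (1 - s e^{-u})^{b-1}$. ∎
Now we end the proof of Proposition 7.1. By Stirling asymptotics,
$$B(z, 1/2) = \frac{\Gamma(z)\, \Gamma(1/2)}{\Gamma(z + 1/2)} \sim \sqrt{\frac{\pi}{z}}$$
so
$$P(z) = I_s(z, 1/2) = \frac{B_s(z, 1/2)}{B(z, 1/2)} \sim \frac{s^z}{\sqrt{\pi (1-s) z}}.$$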
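The quality of the asymptotic estimate $s^z / \sqrt{\pi (1-s) z}$, with $s = 4pq$, can be gauged by dividing the exact value by it. The following R sketch does so for an assumed `q = 0.1` and a few values of $z$; the exact value is computed with `pbeta`, i.e. with the closed form $I_s(z, 1/2)$.

```r
# Sketch: ratio of the exact probability to its asymptotic equivalent.
q <- 0.1                     # assumed attacker hashrate share (illustrative)
p <- 1 - q
s <- 4 * p * q

exact      <- function(z) pbeta(s, shape1 = z, shape2 = 0.5)   # I_s(z, 1/2)
asymptotic <- function(z) s^z / sqrt(pi * (1 - s) * z)

zs    <- c(5, 20, 80, 160)
ratio <- sapply(zs, exact) / sapply(zs, asymptotic)
print(cbind(z = zs, ratio))  # the ratio tends to 1 as z grows
```

For small $z$ the equivalent is only a rough guide, so tables for practical use should rely on the exact formula.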
8. A finer risk analysis.
In practice, in order to avoid a double spend attack, the recipient of the bitcoin transaction waits for $z$ confirmations. But he also has the information on the time $\tau_1$ it took to confirm the transaction $z$ times. Obviously the probability of success of the attackers increases with $\tau_1$. The relevant parameter is the relative deviation from the expected time
$$\kappa = \frac{\tau_1}{z\, t_0} = \frac{p\, \tau_1}{z\, \tau_0}.$$
Our purpose is to compute the probability $P(z,\kappa)$ of success of the attackers. Note that $P(z,1)$ is the probability computed by Nakamoto [Nakamoto 2008],
$$P_{SN}(z) = P(z, 1).$$
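Computation of $P(z,\kappa)$.
The attackers mined $k$ blocks during the time $\tau_1$ with a probability that follows a Poisson distribution with parameter
$$\lambda(z,\kappa) = \alpha'\, \tau_1 = \frac{z q}{p}\, \kappa$$
that means
$$P[N'(\tau_1) = k] = \frac{\left( \frac{zq}{p}\kappa \right)^k}{k!}\, e^{-\frac{zq}{p}\kappa}.$$
For $\kappa = 1$ we recover Nakamoto’s approximation.
The cumulative Poisson distribution can be computed with the incomplete regularized gamma function ([Abramovitch & Stegun 1970] (26.4))
$$Q(s, x) = \frac{\Gamma(s, x)}{\Gamma(s)}$$
where
$$\Gamma(s, x) = \int_x^{+\infty} t^{s-1} e^{-t}\, dt$$
is the incomplete gamma function and $\Gamma(s) = \Gamma(s, 0)$ is the regular gamma function. We have
$$Q(z, \lambda) = \sum_{k=0}^{z-1} \frac{\lambda^k}{k!}\, e^{-\lambda}.$$
We compute as before, with $\lambda = \lambda(z,\kappa)$,
$$P(z,\kappa) = 1 - \sum_{k=0}^{z-1} \frac{\lambda^k}{k!} e^{-\lambda} \left( 1 - \left(\frac{q}{p}\right)^{z-k} \right) = 1 - Q\left(z, \kappa z \frac{q}{p}\right) + \left(\frac{q}{p}\right)^z e^{\kappa z \frac{p-q}{p}}\, Q(z, \kappa z).$$
Figure 3. Probability of success as a function of $\kappa$
Thus we get an explicit closed-form formula for $P(z,\kappa)$,
Theorem 7.
We have
$$P(z,\kappa) = 1 - Q\left(z, \kappa z \frac{q}{p}\right) + \left(\frac{q}{p}\right)^z e^{\kappa z \frac{p-q}{p}}\, Q(z, \kappa z)$$
and
$$P_{SN}(z) = P(z,1) = 1 - Q\left(z, z \frac{q}{p}\right) + \left(\frac{q}{p}\right)^z e^{z \frac{p-q}{p}}\, Q(z, z).$$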
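The closed form for the conditional probability can be evaluated in R with `pgamma(..., lower.tail = FALSE)`, which is the regularized upper incomplete gamma function $Q$. In the sketch below the formula used is $1 - Q(z, \kappa z q/p) + (q/p)^z e^{\kappa z (p-q)/p} Q(z, \kappa z)$; the function names are ours, and the product in the last term is assembled in log scale, an implementation choice made here to avoid overflow for large $\kappa$.

```r
# Sketch: conditional probability P(z, kappa), closed form versus Poisson sum.
Q <- function(s, x) pgamma(x, shape = s, lower.tail = FALSE)  # regularized upper incomplete gamma

prob_kappa <- function(z, kappa, q) {
  p      <- 1 - q
  lambda <- q / p
  # log of (q/p)^z * exp(kappa*z*(p-q)/p) * Q(z, kappa*z); note (p-q)/p = 1 - lambda
  log_last <- z * log(lambda) + kappa * z * (1 - lambda) +
    pgamma(kappa * z, shape = z, lower.tail = FALSE, log.p = TRUE)
  1 - Q(z, kappa * z * lambda) + exp(log_last)
}

prob_kappa_sum <- function(z, kappa, q) {   # direct Poisson sum, z >= 1
  p <- 1 - q
  k <- 0:(z - 1)
  1 - sum(dpois(k, kappa * z * q / p) * (1 - (q / p)^(z - k)))
}

grid   <- expand.grid(z = 1:10, kappa = c(0.25, 1, 2, 9, 12))
closed <- mapply(prob_kappa, grid$z, grid$kappa, MoreArgs = list(q = 0.1))
direct <- mapply(prob_kappa_sum, grid$z, grid$kappa, MoreArgs = list(q = 0.1))
closed_vs_sum_gap <- max(abs(closed - direct))
```

Setting `kappa = 1` gives Nakamoto's value, and values of `kappa` above 1 (slow confirmations) increase the risk, as expected.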
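9. Asymptotics of $P_{SN}(z)$ and $P(z,\kappa)$.
We find the asymptotics of $Q(z, \lambda z)$ when $z \to +\infty$ for different values of $\lambda$.
Lemma 9.1.
We have
- (1) For $0 < \lambda < 1$, $Q(z, \lambda z) \to 1$ and $1 - Q(z, \lambda z) \sim \frac{1}{1-\lambda} \frac{1}{\sqrt{2\pi z}}\, e^{-z(\lambda - 1 - \log \lambda)}$.
- (2) For $\lambda = 1$, $Q(z, z) \to \frac{1}{2}$ and $\frac{1}{2} - Q(z, z) \sim \frac{1}{3\sqrt{2\pi z}}$.
- (3) For $\lambda > 1$, $Q(z, \lambda z) \sim \frac{1}{\lambda - 1} \frac{1}{\sqrt{2\pi z}}\, e^{-z(\lambda - 1 - \log \lambda)}$.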
Proof.
(1) By [Digital Library of Mathematical Functions] (8.11.6) and Stirling formula,for we have
[TABLE]
(2) Also by [Digital Library of Mathematical Functions] (8.11.12) and Stirling formula,
[TABLE]
and
[TABLE]
(3) By [Digital Library of Mathematical Functions] (8.11.7) and Stirling formula,for we have
[TABLE]
∎
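For $\lambda > 0$ we define $c(\lambda) = \lambda - 1 - \log \lambda$, which is positive for $\lambda \neq 1$ since the graph of $\lambda \mapsto \lambda - 1$ is the tangent at $\lambda = 1$ to the concave graph of the logarithm function. We denote $\lambda = q/p$.
We have that the Nakamoto probability $P_{SN}(z)$ also decreases exponentially with $z$ as claimed by Nakamoto in [Nakamoto 2008] without proof.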
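Proposition 9.2.
We have for $z \to +\infty$,
$$P_{SN}(z) \sim \frac{e^{-z c(\lambda)}}{2}.$$
Proof.
The result follows from the closed-form formula from Theorem 7,
$$P_{SN}(z) = 1 - Q(z, \lambda z) + \lambda^z e^{z(1-\lambda)}\, Q(z, z)$$
and then from points (1) and (2) of Lemma 9.1,
$$1 - Q(z, \lambda z) \sim \frac{1}{1-\lambda} \frac{1}{\sqrt{2\pi z}}\, e^{-z c(\lambda)}$$
and
$$\lambda^z e^{z(1-\lambda)}\, Q(z, z) = e^{-z c(\lambda)}\, Q(z,z) \sim \frac{e^{-z c(\lambda)}}{2},$$
so the second term dominates the first one.
∎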
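More generally, we have five different regimes for the asymptotics of $P(z,\kappa)$ for $0 < \kappa < 1$, $\kappa = 1$, $1 < \kappa < p/q$, $\kappa = p/q$ and $\kappa > p/q$.
Proposition 9.3.
We have for $z \to +\infty$,
- (1) For $0 < \kappa < 1$,
[TABLE]
- (2) For $\kappa = 1$,
[TABLE]
- (3) For $1 < \kappa < p/q$,
[TABLE]
- (4) For $\kappa = p/q$, $P(z, p/q) \to 1/2$ and
[TABLE]
- (5) For $\kappa > p/q$, $P(z,\kappa) \to 1$ and
[TABLE]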
Proof.
(1) If then also , and
[TABLE]
and
[TABLE]
and then
[TABLE]
Since we have,
[TABLE]
(2) This was proved in Proposition 9.2.
(3) When then by Lemma 9.1,
[TABLE]
and
[TABLE]
So we have
[TABLE]
(4) The previous asymptotic at the start of the proof of (3) is also valid for and gives
[TABLE]
and by Lemma 9.1,
[TABLE]
(5) For we use again the same asymptotic of (3) to get
[TABLE]
and again
[TABLE]
so
[TABLE]
∎
10. Comparing asymptotics of and .
We have an asymptotic comparison,
Proposition 10.1.
We have for ,
[TABLE]
Proof.
Note that
[TABLE]
So with we have
[TABLE]
and for large
[TABLE]
∎
As we will see later we can be more explicit about the inequality between and .
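11. Recovering $P(z)$ from $P(z,\kappa)$.
We have seen above that $P_{SN}(z)$ can be recovered from $P(z,\kappa)$ by taking the value at $\kappa = 1$. It turns out that we can also recover $P(z)$ as a weighted average on $\kappa$ of $P(z,\kappa)$.
Theorem 8.
We have
$$P(z) = \int_0^{+\infty} P(z,\kappa)\, d\rho_z(\kappa)$$
with the density function
$$d\rho_z(\kappa) = \frac{z^z}{(z-1)!}\, \kappa^{z-1} e^{-z\kappa}\, d\kappa.$$
We check that
$$\int_0^{+\infty} d\rho_z(\kappa) = 1.$$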
We can write
[TABLE]
where
[TABLE]
Then the Theorem follows from a direct computation,
Lemma 11.1.
For , we have
[TABLE]
We give a second more conceptual proof.
Proof.
Consider the random variable
[TABLE]
We have seen above that so . So the density is the distribution of . It is enough to prove that
[TABLE]
We have
[TABLE]
And by conditioning by we get
[TABLE]
since , , and
[TABLE]
∎
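The weighted-average identity can be checked by numerical quadrature: the density of $\kappa$ is a gamma density with shape $z$ and rate $z$, and integrating the conditional probability against it should return the unconditional probability $I_{4pq}(z, 1/2)$. The R sketch below uses `integrate`; the values `z = 3`, `q = 0.3`, the tolerance and the function names are illustrative assumptions.

```r
# Sketch: recover P(z) by averaging P(z, kappa) over the gamma law of kappa.
prob <- function(z, q) pbeta(4 * q * (1 - q), shape1 = z, shape2 = 0.5)  # closed form of P(z)

prob_kappa <- function(z, kappa, q) {   # conditional probability as a Poisson sum, z >= 1
  p <- 1 - q
  k <- 0:(z - 1)
  1 - sum(dpois(k, kappa * z * q / p) * (1 - (q / p)^(z - k)))
}

# Density of kappa: z^z / (z-1)! * kappa^(z-1) * exp(-z * kappa)
rho <- function(kappa, z) dgamma(kappa, shape = z, rate = z)

averaged <- function(z, q) {
  # integrate() passes a vector of nodes, so prob_kappa is applied node by node
  integrand <- function(kappa) {
    sapply(kappa, function(u) prob_kappa(z, u, q)) * rho(kappa, z)
  }
  integrate(integrand, lower = 0, upper = Inf, rel.tol = 1e-6)$value
}

z <- 3
q <- 0.3
average_gap <- abs(averaged(z, q) - prob(z, q))
print(c(averaged = averaged(z, q), closed_form = prob(z, q)))
```

The Poisson-sum form of the conditional probability is used inside the integrand because it stays finite at the very large nodes that the quadrature on an infinite range may visit.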
We also note that .
12. Range of $\kappa$.
The probability to observe a deviation greater than is with . We have that follows a -distribution, , so
[TABLE]
Then, by Lemma 9.1, for . Note that this probability does not depend on . For , we have and for , . So, in practice, the probability to have is very unlikely. Below, we have represented the graph of for different values of () and .
Figure 4. Probability as a function of
We see that is convex in the range of values of considered. We study the convexity in more detail in the next section.
13. Comparing and .
Now we study the convexity of . Recall that . From Theorem 7 we have
[TABLE]
Since
[TABLE]
we get, after some cancellations,
[TABLE]
We observe that , so is an increasing function of as expected. For the second derivative we have
[TABLE]
Therefore we study the sign of
[TABLE]
For we have
[TABLE]
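therefore $\kappa \mapsto P(1,\kappa)$ is a concave function and by Jensen’s inequality
$$P(1) = \int_0^{+\infty} P(1,\kappa)\, d\rho_1(\kappa) \leq P\left(1, \int_0^{+\infty} \kappa\, d\rho_1(\kappa)\right) = P(1,1) = P_{SN}(1).$$
Corollary 13.1.
We have (for all $0 < q < 1/2$)
$$P(1) \leq P_{SN}(1).$$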
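For one confirmation both probabilities are explicit, which makes the corollary easy to check numerically: the sum formula with $z = 1$ gives $2q$ for the exact probability, and Nakamoto's Poisson sum gives $1 - e^{-q/p}(1 - q/p)$. The R sketch below compares them on a grid of hashrate shares; the grid itself is an illustrative assumption.

```r
# Sketch: for z = 1, compare the exact probability with Nakamoto's value.
q <- seq(0.01, 0.49, by = 0.01)   # assumed grid of attacker hashrate shares
p <- 1 - q

exact_one    <- 2 * q                            # sum formula with z = 1: 1 - (p - q)
nakamoto_one <- 1 - exp(-q / p) * (1 - q / p)    # Poisson sum with z = 1

# Non-negative everywhere on the grid if the z = 1 inequality holds
difference <- nakamoto_one - exact_one
print(range(difference))
```

The difference is small, and this ordering is specific to $z = 1$; for larger $z$ the comparison generally goes the other way.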
In general, for , we have the reverse inequality. To determine the sign of we study its zeros.
The equation to solve is
[TABLE]
This is a polynomial equation in , the coefficients are increasing on , and the left hand side is decreasing on from to [math], therefore there is a unique solution , and
[TABLE]
We compute
[TABLE]
In this case the function is convex only in the interval . For large, most of the support of the measure is contained in this interval and we have by Jensen’s inequality
[TABLE]
where
[TABLE]
We can get some estimates on for . The first observation is that for large we have . The asymptotic limits for for and (Lemma 9.1) and Stirling asymptotic formula give that
[TABLE]
and .
For , we can use the asymptotic [Digital Library of Mathematical Functions] (8.11.7), ,
[TABLE]
and
[TABLE]
thus, since
[TABLE]
we have
[TABLE]
Now, if
[TABLE]
we have , so we get:
Proposition 13.2.
[TABLE]
Using the second order asymptotic ([Digital Library of Mathematical Functions] (8.11.7)), for , ,
[TABLE]
so
[TABLE]
Writing
[TABLE]
and using
[TABLE]
we get
Proposition 13.3.
For
[TABLE]
Also we have
[TABLE]
for
[TABLE]
so, for of the order of we have .
14. Bounds for
Remember that we have set . We have the following inequality that is a particular case of more general Gautschi’s inequalities [Gautschi 1959]:
Lemma 14.1.
Let . We have
[TABLE]
Proof.
By Cauchy-Schwarz inequality, we have:
[TABLE]
On the other side, the last inequality with replaced by gives:
[TABLE]
∎
Lemma 14.2.
For , we have
[TABLE]
Proof.
The function is non-decreasing. So, by definition of and the upper bound of the inequality of Lemma 14.1, we have
[TABLE]
In the same way, using the lower bound of the inequality of Lemma 14.1, we have
[TABLE]
∎
Note that this gives again the exponential decrease of Nakamoto’s probability.
15. An upper bound for
Proposition 15.1.
We have,
[TABLE]
This upper bound is quite sharp in view of the asymptotics in Proposition 9.3 (2).
Lemma 15.2.
Let and .
- (1)
If , then 2. (2)
**
Proof.
For (1) We use [Digital Library of Mathematical Functions] (8.7.1)
[TABLE]
which is valid for . Let . Using , we get:
[TABLE]
On the other hand, by [Digital Library of Mathematical Functions] (5.6.1), we have
[TABLE]
and for any ,
[TABLE]
For (2) this comes directly from [Digital Library of Mathematical Functions] (8.10.13). ∎
Recalling that , we get Proposition 15.1.
16. Comparing again and .
The aim of this section is to compute an explicit rank $z_0$ (not sharp) for which $P_{SN}(z) < P(z)$ for $z \geq z_0$.
Lemma 16.1.
Let . For all , .
Proof.
Let . We have , and . So, and . Therefore, for . ∎
Lemma 16.2.
For and we have .
Proof.
The inequality is trivial when . So, we can assume that . For , we have . For , by Lemma 16.1, we have for . For , the largest root of the polynomial is which is smaller than since for . So, the inequality results from Lemma 16.1 again. ∎
Lemma 16.3.
For , if
[TABLE]
then we have
[TABLE]
Proof.
We have
[TABLE]
By Lemma 16.2, the last inequality is satisfied as soon as
[TABLE]
Moreover, we have
[TABLE]
∎
Theorem 16.4.
Let . A sufficient condition for having is with being the smallest integer greater or equal to
[TABLE]
where .
Proof.
First, note that
[TABLE]
So, and is well defined. Let . By Lemma 14.2 and Corollary 15.1 it is enough to prove that
[TABLE]
We have , thus . So, the inequality is satisfied as soon as and the result follows from Lemma 16.3. ∎
[TABLE]
Table 5. Sharp values
17. Tables for .
For complete Satoshi Tables see [Grunspan & Pérez-Marco 2017].
[TABLE]
Table 6. () for different values of and in .
[TABLE]
Table 7. () for different values of and in .
Acknowledgements: We are grateful to N. Emerson for his comments and remarks.
- [Abramovitch & Stegun 1970] Abramowitz, M.; Stegun, I.A.; Handbook of Mathematical Functions, Dover, NY, 1970.
- [Digital Library of Mathematical Functions] Digital Library of Mathematical Functions, http://dlmf.nist.gov
- [Feller 1971] Feller, W.; An introduction to probability theory and its applications, Wiley, 1971.
- [Gautschi 1959] Gautschi, W.; Some elementary inequalities relating to the gamma and incomplete gamma function, J. Math. and Phys., 38, p.77-81, 1959.
- [Grunspan & Pérez-Marco 2017] Grunspan, C.; Pérez-Marco, R.; Satoshi Risk Tables, arXiv:1702.04421, 2017.
- [López & Sesma 1999] López, J.L.; Sesma, J.; Asymptotic expansion of the incomplete beta function for large values of the first parameter, Integral Transforms and Special Functions, 8, 3-4, p.233-236, 1999.
- [Nakamoto 2008] Nakamoto, S.; Bitcoin: A Peer-to-Peer Electronic Cash System, www.bitcoin.org/bitcoin.pdf, 2008.
- [Pérez-Marco 2016] Pérez-Marco, R.; Bitcoin and decentralized trust protocols, Newsletter European Math. Soc., 100, p.32-38, 2016.
