Auditable Blockchain Randomization Tool
Olivia Saa, Julio Michael Stern

TL;DR
This paper introduces a cryptographically secure, auditable, and collusion-resistant blockchain-based randomization protocol suitable for statistical trials and legal procedures, ensuring transparency and integrity.
Contribution
It presents a novel, mathematically formalized randomization protocol that combines security, efficiency, and auditability using blockchain technology.
Findings
Protocol is statistically sound and cryptographically secure
Ensures traceability and auditability of randomization
Resistant to collusion and manipulation
Abstract
Randomization is an integral part of well-designed statistical trials, and is also a required procedure in legal systems; see Marcondes et al. (2019). This paper presents an easy-to-implement randomization protocol that assures, in a formal mathematical setting, a statistically sound, computationally efficient, cryptographically secure, traceable and auditable randomization procedure that is also resistant to collusion and manipulation by participating agents.
Olivia Saa and Julio Michael Stern, IME-USP – Institute of Mathematics and Statistics of the University of São Paulo, Rua do Matão 1010, 05508-090, São Paulo, Brazil.
Meos tam suspicione quam crimine iudico carere oportere.
My people should be free from either crime or suspicion.
Julius Caesar (62BC), in Suetonius (119CE, Sec.I.74.2).
Randomization: Bad and Good Practices
Randomization is a technique used in the design of statistical experiments: in a clinical trial, for example, patients are randomly assigned to distinct groups receiving different treatments, with the goal of studying and contrasting their effects. Randomization is nowadays considered a gold standard in statistical practice; its motivation is to prevent systematic biases (like an unfair or tendentious assignment process) that could distort, unintentionally or purposely, the conclusions of the study. For further comments on randomization, see Pearl (2000, 2004) and Stern (2008); for Bayesian perspectives, see Basu (1988) and Gelman et al. (2003). In the legal context, randomization (also known as sortition or allotment) is routinely used for the selection of jurors or judges assigned to a given judicial case; see Marcondes et al. (2019).
Rerandomization is the practice of rejecting and discarding (for whatever reason) a given randomized outcome, which is then replaced by a new randomization. Repeated rerandomization can be used to completely circumvent the haphazard, unpredictable or aimless nature of randomization, allowing a premeditated selection of a final outcome of choice. There are advanced statistical techniques capable of blending the best characteristics of random and intentional sampling; see, for example, Fossaluza (2015), Lauretto et al. (2012, 2017), and Morgan and Rubin (2012, 2015). Nevertheless, rerandomization is often naively used, or abused, with the excuse of (subjectively) "avoiding outcomes that do not look random enough"; see, for example, Bruhn and McKenzie (2009) and Ruxton and Colegrave (2006, Sec.3.4.6). In the legal context, spurious manipulations of the randomization process are often linked to fraud, corruption and similar maladies; see Marcondes et al. (2019) and references therein.
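To make the rerandomization concern concrete, here is a minimal Python sketch. The candidate pool and the `rerandomize_until` helper are hypothetical: an operator who may silently discard draws can force any outcome he likes, whereas a single honest draw selects that outcome only with its nominal probability.

```python
import random


def rerandomize_until(candidates, preferred, rng, max_rounds=10_000):
    """Hypothetical abuse of rerandomization: discard draws until `preferred` appears.

    Returns the accepted outcome and the number of draws that were discarded.
    """
    for discarded in range(max_rounds):
        outcome = rng.choice(candidates)
        if outcome == preferred:
            return outcome, discarded
    raise RuntimeError("preferred outcome never appeared")


judges = ["A", "B", "C", "D", "E"]  # hypothetical pool of candidates
rng = random.Random(2019)

# A single honest draw selects "C" with probability 1/5 ...
honest_rate = sum(rng.choice(judges) == "C" for _ in range(10_000)) / 10_000

# ... but an operator who may silently rerandomize always ends up with "C".
rigged = [rerandomize_until(judges, "C", rng)[0] for _ in range(1_000)]

print(f"honest draws picked C in {honest_rate:.1%} of trials")
print("rigged outcomes:", set(rigged))  # {'C'}
```

Nothing in the accepted outcome reveals how many draws were discarded, which is why complete auditability and traceability of every draw matters.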
In order to comply with the best practices for randomization processes, Marcondes et al. (2019) recommend the use of computer software with a long list of characteristics: for example, being efficient and fully auditable, well-defined and understandable, sound and flexible, secure and transparent. Such requirements are expressed by the following (revised) desiderata for randomization procedures:
*Given the juridical and social importance of the themata under scrutiny, we believe that it is important to develop randomization procedures in full compliance with the following desiderata: (a) Statistical soundness and computational efficiency, see Hammersley and Handscomb (1964, ch.3), Haramoto et al. (2008), Knuth (1997, ch.3), and Ripley (1987, ch.2); (b) Procedural, cryptographical and computational security, see Boyar (1989), L'Ecuyer (2012), Aumasson (2017) and Katz and Lindell (2014); (c) Complete auditability and traceability, see Haber and Stornetta (1991), Nakamoto (2008), and Wattenhofer (2017); (d) Any attempt by participating parties or coalitions to spuriously influence the procedure should either be unsuccessful or be detected, see Goldschlag and Stubblebine (1998); (e) Open-source programming; (f) Multiple hardware platform and operating system implementation; (g) User friendliness and transparency, see Parikh and Pauly (2012) and Stern (2018); (h) Flexibility and adaptability for the needs and requirements of multiple application areas (like, for example, clinical trials, selection of jurors or judges in legal proceedings, and draft lotteries), see Marcondes et al. (2019).*
Such requirements conflate several complementary characteristics that may seem, at first glance, incompatible. For example, strong security is often (but wrongly) associated with excessive secrecy, a doctrine known as "security by obscurity"; efficient computer routines are often thought of as hard to audit; and mathematically well-defined algorithms may be perceived as hard to understand. The bibliographical references given in the desiderata stated above already hint at technologies that can be used to achieve a fully compliant randomization procedure, most prominently the blockchain. This is the key technology supporting modern public ledgers, cryptocurrencies, and a host of related applications.
A technical challenge for the application under scrutiny is the generation of pseudo-random number sequences that reconcile complementary properties related to computational efficiency, statistical soundness, and cryptographic security. In this respect, the excellent statistical and computational characteristics of modern linear recurrence pseudo-random number generators, like those considered in Haramoto et al. (2008), can be reconciled with the needs concerning unpredictability and cryptographic security by appropriate starts and restarts of the linear recurrence generator. A sequence start for a linear recurrence generator is defined by a seed, specified by a vector of (typically 1 to 64) integers, while a restart is defined by a jump-ahead (or skip-ahead), specified by a single integer kept small relative to the generator's full period; see L'Ecuyer (2012).
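The difference between a start (seeding) and a restart (jump-ahead) can be illustrated with a toy linear recurrence. The sketch below swaps in a 64-bit linear congruential generator with Knuth's MMIX constants instead of the F2-linear generators considered by Haramoto et al. (2008), simply because its jump-ahead fits in a few lines: the $k$-step map of $x \mapsto ax + c \pmod{2^{64}}$ is again affine and can be built by repeated squaring in $O(\log k)$ operations. The generator is for illustration only; an LCG like this is neither statistically strong enough nor unpredictable enough for the applications discussed here.

```python
MASK64 = (1 << 64) - 1
# Knuth's MMIX LCG constants; a toy stand-in for the generators in the text.
A = 6364136223846793005
C = 1442695040888963407


def step(x: int) -> int:
    """One step of the linear recurrence x_{n+1} = (A * x_n + C) mod 2^64."""
    return (A * x + C) & MASK64


def jump_ahead(x: int, k: int) -> int:
    """Return the state reached after k steps, in O(log k) operations.

    The k-fold composition of x -> a*x + c is again affine, x -> a_k*x + c_k,
    so (a_k, c_k) is obtained by binary exponentiation of the affine map.
    """
    acc_a, acc_c = 1, 0   # identity map (zero steps)
    cur_a, cur_c = A, C   # map for 2^i steps, starting with i = 0
    while k > 0:
        if k & 1:
            # compose: apply acc first, then cur
            acc_a, acc_c = (cur_a * acc_a) & MASK64, (cur_a * acc_c + cur_c) & MASK64
        # square: apply cur twice, doubling the number of steps it represents
        cur_a, cur_c = (cur_a * cur_a) & MASK64, (cur_a * cur_c + cur_c) & MASK64
        k >>= 1
    return (acc_a * x + acc_c) & MASK64


seed = 12345              # sequence start (a single 64-bit word in this toy)
x = seed
for _ in range(1000):
    x = step(x)
assert jump_ahead(seed, 1000) == x   # a restart lands where 1000 steps do

far = jump_ahead(seed, 10**18)       # a huge skip is still instant
print(hex(far))
```

The jump of $10^{18}$ steps shows why a restart specified by a single integer is cheap even when the skip is enormous.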
Unpredictable and cryptographically secure seeds and jump-aheads can be provided by high entropy bit streams extracted from blockchain transactions, an idea that has already been explored in the works of Bonneau et al. (2015) and Popov (2017).
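One way to turn such a high-entropy bit stream into a seed vector and a jump-ahead is to hash it and split the digest into fixed-width integers. The sketch below assumes SHA-256 as the extractor, eight 32-bit seed words, and a 64-bit jump taken from a second, domain-separated hash; these choices, and the hypothetical `derive_seed_and_jump` name, are illustrative and not the paper's. Seeding Python's built-in `random.Random` (a Mersenne Twister) with the packed words is only a demonstration: that module offers no jump-ahead, so the jump value would have to be handed to a generator that supports one.

```python
from __future__ import annotations

import hashlib
import random


def derive_seed_and_jump(entropy: bytes, n_words: int = 8) -> tuple[list[int], int]:
    """Split high-entropy input into a seed vector and a jump-ahead count.

    Assumption: SHA-256 is the extractor; the prefixes b"seed" and b"jump"
    are illustrative domain-separation tags.
    """
    if not 1 <= n_words <= 8:
        raise ValueError("a SHA-256 digest holds at most eight 32-bit words")
    seed_digest = hashlib.sha256(b"seed" + entropy).digest()
    jump_digest = hashlib.sha256(b"jump" + entropy).digest()
    seed_words = [
        int.from_bytes(seed_digest[4 * i : 4 * i + 4], "big") for i in range(n_words)
    ]
    jump = int.from_bytes(jump_digest[:8], "big")  # 64-bit jump-ahead count
    return seed_words, jump


entropy = bytes(112)  # placeholder for the bits extracted from the blockchain
words, jump = derive_seed_and_jump(entropy)

# Pack the words into one integer to seed Python's Mersenne Twister (demo only).
seed_int = int.from_bytes(b"".join(w.to_bytes(4, "big") for w in words), "big")
rng = random.Random(seed_int)
print(words, jump, rng.random())
```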
The next section develops a possible implementation of a fully compliant core randomization protocol based on blockchain technology, and also makes a simple prototype available for study and further research (www link to be added at publication time). Moreover, in order to keep it simple and easy to use, we develop the prototype on top of a readily available cryptocurrency platform. We use Bitcoin for this example, but alternatives like Ethereum, or other cryptocurrencies whose miners work under the same incentive model, can be used with minor adaptations.
Core Randomization Protocol in Blockchain
We intend to establish a protocol able to deliver pseudo-random numbers on demand from an auditable and immutable ledger. The procedure starts as follows: the user (the party that wants to receive a random number) sends a Bitcoin transaction with a record of its purpose embedded in it. (One way to embed a message in a transaction is an OP_RETURN output, which can carry a small data payload: up to 80 bytes under Bitcoin Core's standard relay policy at the time of writing.) The recipient of this transaction may be a proxy representing a competent authority, a pertinent regulatory agency, an agreed custodian, etc. When this transaction is first attached to the blockchain, we concatenate the transaction ID (a 32-byte value, usually written in hexadecimal) and the header of the block that contains it (an 80-byte value). (If more than one transaction is issued for the same purpose, only the one attached first is used.) The resulting 112-byte string will be the input for a known Verifiable Delay Function (VDF; see Boneh et al., 2018), which should be calibrated according to the purpose of the random number. For instance, a less critical purpose should use a VDF that delays the result by just a few seconds, or may even skip the VDF step entirely. A critical purpose, with significant interests involved, should use a more demanding VDF, with a delay of minutes or even hours. The final output of the VDF will be the source of our seeds and jump-aheads.
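The data flow of the protocol can be sketched as follows: the 32-byte transaction ID and the 80-byte block header are concatenated into a 112-byte input, which is passed through a delay function whose output becomes the seed source. The sketch substitutes iterated SHA-256 for the VDF. It is inherently sequential, so the iteration count tunes the delay, but unlike a genuine VDF (Boneh et al., 2018), whose output can be verified much faster than it is computed, checking it requires full recomputation. The function names, the `ITERATIONS` calibration table and the dummy byte strings are hypothetical, and the byte-order convention for the transaction ID is left as an assumption.

```python
import hashlib


def protocol_input(txid: bytes, block_header: bytes) -> bytes:
    """Concatenate the 32-byte transaction ID and the 80-byte block header."""
    if len(txid) != 32 or len(block_header) != 80:
        raise ValueError("expected a 32-byte txid and an 80-byte block header")
    return txid + block_header  # 112 bytes in total


def stand_in_delay(data: bytes, iterations: int) -> bytes:
    """Placeholder for a VDF: hash once, then re-hash `iterations` more times.

    The work is sequential, so the delay grows linearly with `iterations`,
    but checking the result requires recomputing it: this is NOT a VDF.
    """
    out = hashlib.sha256(data).digest()
    for _ in range(iterations):
        out = hashlib.sha256(out).digest()
    return out


# Hypothetical calibration by purpose (iteration counts, not seconds).
ITERATIONS = {"routine": 0, "important": 100_000, "critical": 50_000_000}

txid = bytes.fromhex("11" * 32)    # dummy transaction ID (raw bytes)
header = bytes.fromhex("22" * 80)  # dummy block header
x = protocol_input(txid, header)
source = stand_in_delay(x, ITERATIONS["important"])
print(len(x), source.hex())        # 112, then the 32-byte seed source in hex
```

A count of 0 reduces the stand-in to a single hash, the closest analogue here to skipping the VDF step for low-stakes purposes.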
With this protocol, each user who demands a pseudo-random number obtains a different one. Note that the user has no incentive to manipulate his transaction ID, because he has no control over the block header. We assume that the user and the miner are not the same person, so a miner will only be interested in trying to control his block header if he is paid to do so. Since the last stage of our protocol involves the calculation of a VDF, it will take the miner a certain amount of time to find out whether the block he has found is of interest to the user. Thus, he might even lose his block, if some other miner broadcasts a block of his own before he finishes calculating the VDF.
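A rough sense of how likely the miner is to lose his block while evaluating the VDF can be obtained under the common modelling assumption that blocks arrive as a Poisson process. Writing $\alpha$ for the withholding miner's share of the hash power, $\lambda$ for the network-wide block rate and $T$ for the VDF delay (names assumed for this sketch), the rest of the network finds its next block after an exponential time with rate $(1-\alpha)\lambda$, so it beats the VDF with probability $1 - e^{-(1-\alpha)\lambda T}$. The sketch compares this closed form with a small simulation; the 30% share and the 600-second block interval are illustrative values.

```python
import math
import random


def p_lose_block(alpha: float, rate: float, delay: float) -> float:
    """Probability that the rest of the network mines a block within `delay` seconds.

    alpha: hash-power fraction of the miner who withholds his block
    rate:  network-wide block rate, in blocks per second
    delay: time spent evaluating the VDF, in seconds
    Assumes Poisson block arrivals, so the competitors' waiting time is
    exponential with rate (1 - alpha) * rate.
    """
    return 1.0 - math.exp(-(1.0 - alpha) * rate * delay)


def simulate_p_lose(alpha: float, rate: float, delay: float,
                    trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    competitor_rate = (1.0 - alpha) * rate
    lost = sum(rng.expovariate(competitor_rate) < delay for _ in range(trials))
    return lost / trials


alpha, rate = 0.3, 1 / 600   # hypothetical: 30% of hash power, one block per ~10 min
for minutes in (1, 10, 60):
    delay = 60 * minutes
    print(f"{minutes:>3} min VDF: exact {p_lose_block(alpha, rate, delay):.3f}, "
          f"simulated {simulate_p_lose(alpha, rate, delay):.3f}")
```

With these illustrative numbers, a ten-minute delay already gives the miner roughly an even chance ($1 - e^{-0.7} \approx 0.50$) of losing his block.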
In the following subsection, the miner’s payoff and the necessary delay for the Verifiable Delay Functions will be explicitly calculated.
Preventing Collusion for Spurious Manipulation
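Suppose a malicious user tries to bribe a miner that controls a fraction $\alpha$ of the network's computational power. A prize $P$, on top of the Bitcoin block reward $R$, will be paid to the miner if he successfully mines what we call a "desirable block": a block that will deliver a random number in a set $D$, chosen by the malicious user. Let also $\lambda$ be the average rate of incoming blocks and $p$ the probability of a randomly generated number being an element of $D$, i.e., the measure of the set of desirable results for the malicious user. Finally, let $T$ be the expected amount of time needed for the VDF calculations. The moment a miner finds a block that can be accepted by the network, he faces the decision of broadcasting it before checking the VDF, or calculating the VDF before broadcasting. If he decides to check the VDF before broadcasting, he might start another attempt to find a block right away.
First, we calculate the expected absolute payoff for the first and second options, called $E_1$ and $E_2$, respectively. $E_1$ will be larger than $R$, since the miner might issue a desirable block by chance:
$$E_1 = R + pP.$$
On the other hand, if the miner chooses to calculate the VDF, he will receive the block reward $R$ and the prize $P$, but only with a probability $\pi$ given by
$$E_2 = (R+P)\,\pi, \qquad \pi = \sum_{k=1}^{\infty} \Pr\left(\text{the } k\text{-th block found is the first desirable one, and it is found and checked before the rest of the network mines a block}\right).$$
The probabilities inside the summation, in the last equation, can be calculated as the product of the probability of finding a desirable block after $k$ attempts (a geometric distribution with probability of success $p$, giving $(1-p)^{k-1}p$) and the probability of finding and checking $k$ blocks before the rest of the network mines one. Counting from the moment the first block is found, and assuming VDF evaluations run in parallel with mining, the miner must win $k-1$ further races against the rest of the network (each won with probability $\alpha$, since block discoveries are exponential with rates $\alpha\lambda$ and $(1-\alpha)\lambda$) and then finish the last VDF before the rest of the network mines a block, which happens with probability $e^{-(1-\alpha)\lambda T}$. This gives $\alpha^{k-1}e^{-(1-\alpha)\lambda T}$, resulting:
$$\pi = \sum_{k=1}^{\infty} (1-p)^{k-1}p\,\alpha^{k-1}e^{-(1-\alpha)\lambda T} = \frac{p\,e^{-(1-\alpha)\lambda T}}{1-\alpha+\alpha p}.$$
Finally, in order to make accepting the bribe not lucrative, we must have $E_2 \le E_1$, i.e.:
$$(R+P)\,\frac{p\,e^{-(1-\alpha)\lambda T}}{1-\alpha+\alpha p} \le R + pP.$$
Since for every $P \ge 0$ and $0 \le p \le 1$ we have $(R+P)\,p \le R + pP$, if we choose $T \ge -\ln(1-\alpha+\alpha p)/\big((1-\alpha)\lambda\big)$, we guarantee that the attack will not be lucrative for any bribe $P$. Also, since it can be assumed that $\alpha < 1/2$, a value $T = 2\ln 2/\lambda$ (about 14 minutes at Bitcoin's target of one block every 10 minutes) will be high enough to prevent an attack for any bribe $P$ and any acceptable value of $\alpha$.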
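The payoff comparison in this subsection can be checked numerically. The sketch below is written under explicitly assumed modelling choices: the miner holds a fraction $\alpha$ of the hash power, blocks arrive as a Poisson process with network rate $\lambda$, the bribe $P$ is paid on top of the block reward $R$, a block is desirable with probability $p$, and VDF checks run in parallel with mining, so only the last check, of length $T$, delays the broadcast. Under these assumptions the withholding strategy succeeds with probability $\pi = p\,e^{-(1-\alpha)\lambda T}/(1-\alpha+\alpha p)$; the sketch compares $E_1 = R + pP$ with $E_2 = (R+P)\pi$, evaluates the delay threshold $T^* = -\ln(1-\alpha+\alpha p)/((1-\alpha)\lambda)$, and estimates $\pi$ by simulation. All function names and parameter values are hypothetical.

```python
import math
import random


def pi_withhold(alpha, rate, p, delay):
    """Success probability of the withholding strategy (pipelined-checks model)."""
    return p * math.exp(-(1 - alpha) * rate * delay) / (1 - alpha + alpha * p)


def payoffs(R, P, alpha, rate, p, delay):
    """Expected payoffs E1 (broadcast at once) and E2 (check the VDF first)."""
    e1 = R + p * P
    e2 = (R + P) * pi_withhold(alpha, rate, p, delay)
    return e1, e2


def min_delay(alpha, rate, p):
    """Smallest T for which E2 <= E1 holds for every bribe P >= 0."""
    return -math.log(1 - alpha + alpha * p) / ((1 - alpha) * rate)


def simulate_pi(alpha, rate, p, delay, trials=100_000, seed=7):
    """Monte Carlo estimate of pi_withhold under the same assumptions."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        network_time = rng.expovariate((1 - alpha) * rate)  # rest of network's next block
        t = 0.0  # time at which the current candidate block was found
        while t + delay < network_time:  # its VDF finishes before the network's block
            if rng.random() < p:  # candidate turns out to be desirable
                wins += 1
                break
            t += rng.expovariate(alpha * rate)  # keep mining for the next candidate
    return wins / trials


# Hypothetical parameters: reward R as the unit, bribe P = 100 R.
R, P, alpha, rate, p = 1.0, 100.0, 0.3, 1 / 600, 0.1
T_star = min_delay(alpha, rate, p)
for T in (0.0, T_star):
    e1, e2 = payoffs(R, P, alpha, rate, p, T)
    print(f"T = {T:6.1f} s: E1 = {e1:.2f}, E2 = {e2:.2f}, bribe pays: {e2 > e1}")

print("simulated pi:", simulate_pi(alpha, rate, p, 600.0),
      "formula:", round(pi_withhold(alpha, rate, p, 600.0), 4))
print("bound 2 ln 2 / lambda:", 2 * math.log(2) / rate, "s")
```

With these numbers the bribe is lucrative without any delay but not at $T^*$ (about 270 s), which is well below the $2\ln 2/\lambda \approx 832$ s bound that covers every $\alpha < 1/2$.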
Conclusions and Final Remarks
We formalized a simple and effective protocol to generate pseudo-random numbers on demand, in a fully auditable way. We have demonstrated that none of the parties involved has enough financial incentive to try to affect the random number outcome: the party that issues the transaction lacks this power, since it has no control over the block header; and the miners do not have enough financial incentive to collude with an attacker, provided a suitable Verifiable Delay Function is applied.
The essentially decentralized, yet completely traceable and auditable, nature of the protocol presented in this article makes the resulting randomization process eminently reliable without recourse to blind trust in any central authority. The authors believe that the adoption of such a protocol by the Brazilian Supreme Court (STF), as recommended in Marcondes et al. (2019), would significantly increase public confidence in the judicial system and be a contributing factor for political and social stability.
Acknowledgments
The authors are grateful for the support received from IME-USP – the Institute of Mathematics and Statistics of the University of São Paulo, and for the advice of Prof. Serguei Popov. The authors also received support from FAPESP – the State of São Paulo Research Foundation (grants CEPID-CeMEAI 2013/07375-0 and CEPID-Shell-RCGI 2014/50279-4); CNPq – the Brazilian National Council of Technological and Scientific Development (grants PQ 301206/2011-2 and GD 140490/2016-7); ABJ – the Brazilian Jurimetrics Association; STF – Supremo Tribunal Federal (the Brazilian Supreme Court), which motivated the study and provided the data analysed in Marcondes et al. (2019); and the IOTA Foundation. They also received helpful comments and advice from Adilson Simonis, Álvaro Machado Dias, Julio Trecenti, Rafael Bassi Stern and Marcelo Guedes Nunes.
References
- Aumasson, J.-P. (2017). Serious Cryptography: A Practical Introduction to Modern Encryption. No Starch Press.
- Basu, D.; Ghosh, J.K. (ed.) (1988). Statistical Information and Likelihood: A Collection of Essays by Dr. Debabrata Basu. Lecture Notes in Statistics, 45. Springer.
- Boneh, D.; Bonneau, J.; Bünz, B.; Fisch, B. (2018). Verifiable Delay Functions. Cryptology ePrint Archive, Report 2018/601. https://eprint.iacr.org/2018/601
- Bonneau, J.; Clark, J.; Goldfeder, S. (2015). On Bitcoin as a Public Randomness Source. IACR Cryptology ePrint Archive, Report 2015/1015.
- Boyar, J. (1989). Inferring Sequences Produced by Pseudo-Random Number Generators. Journal of the ACM, 36(1), 129-141.
- Bruhn, M.; McKenzie, D. (2009). In Pursuit of Balance: Randomization in Practice in Development Field Experiments. American Economic Journal: Applied Economics, 1(4), 200-232.
- Fossaluza, V.; Lauretto, M.S.; Pereira, C.A.B.; Stern, J.M. (2015). Combining Optimization and Randomization Approaches for the Design of Clinical Trials. Springer Proceedings in Mathematics and Statistics, Vol. 118, Ch. 14, pp. 173-184.
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. (2003). Bayesian Data Analysis, 2nd ed. New York: Chapman and Hall/CRC.
