Types for Location and Data Security in Cloud Environments
Ivan Gazeau, Tom Chothia, Dominic Duggan

TL;DR
This paper introduces a typed programming language for cloud environments that enforces data access policies across devices, ensuring confidentiality even with untrusted or compromised devices, without requiring a centralized PKI.
Contribution
It presents a novel type system that guarantees data confidentiality and policy enforcement in distributed cloud systems without middleware or pre-registered identities.
Findings
Type system enforces data access restrictions across devices.
Confidentiality holds against any intruder under labeled bisimilarity.
Programming errors do not lead to data leakage in well-typed devices.
Abstract
Cloud service providers are often trusted to be genuine, the damage caused by being discovered to be attacking their own customers outweighs any benefits such attacks could reap. On the other hand, it is expected that some cloud service users may be actively malicious. In such an open system, each location may run code which has been developed independently of other locations (and which may be secret). In this paper, we present a typed language which ensures that the access restrictions put on data on a particular device will be observed by all other devices running typed code. Untyped, compromised devices can still interact with typed devices without being able to violate the policies, except in the case when a policy directly places trust in untyped locations. Importantly, our type system does not need a middleware layer or all users to register with a preexisting PKI, and it allows…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Types for Location and Data Security
in Cloud Environments
Ivan Gazeau111LORIA, INRIA Nancy - Grand-Est,France
Tom Chothia222School of Computer Science, Univ. of Birmingham, UK
Dominic Duggan333Department of Computer Science, Stevens Institute of Technology, USA.
(Stevens Technical Report CS-2017-1)
Abstract
Cloud service providers are often trusted to be genuine, the damage caused by being discovered to be attacking their own customers outweighs any benefits such attacks could reap. On the other hand, it is expected that some cloud service users may be actively malicious. In such an open system, each location may run code which has been developed independently of other locations (and which may be secret). In this paper, we present a typed language which ensures that the access restrictions put on data on a particular device will be observed by all other devices running typed code. Untyped, compromised devices can still interact with typed devices without being able to violate the policies, except in the case when a policy directly places trust in untyped locations. Importantly, our type system does not need a middleware layer or all users to register with a preexisting PKI, and it allows for devices to dynamically create new identities. The confidentiality property guaranteed by the language is defined for any kind of intruder: we consider labeled bisimilarity i.e. an attacker cannot distinguish two scenarios that differ by the change of a protected value. This shows our main result that, for a device that runs well typed code and only places trust in other well typed devices, programming errors cannot cause a data leakage.
1 Introduction
Organisations commonly trust their cloud providers not to be actively malicious but may still need to verify that the cloud service does not make mistakes and does store their data at particular locations. For example, data protection laws prevent the storage of certain forms of data outside the European Union. In order to ensure compliance, a company policy may require that data from a user’s device not be synchronized with a cloud storage provider, unless that provider can certify that the data will not be stored in data centers outside the EU. Such checks against inappropriate storage of data can be costly and time consuming, sometimes leading to organisations not allowing their employees to use cloud services, even though a particular cloud service may be known to be compliant with data handling policies.
This paper presents a language-based approach to dealing with these kinds of scenarios, of ensuring that data can be shared with cloud services while ensuring compliance with policies on data storage. The approach is based on a type system that explicitly models trust between cloud services and mobile devices, based on a notion of principals represented at runtime by cryptographic keys. In this language, principals are dynamically associated to (non-disjoint) sets of devices, and the rights of devices to access data is based on sets of principals (delineating the principals that are allowed to access protected data). For instance, if two devices and can (individually) act for a principal , while devices and can act for principal , then the right implicitly allows the devices , and to access data guarded by this right. We argue that this two-layer representation of access rights (data guarded by principal rights, and devices acting for principals) is convenient to allow principals to share a device, and to allow one principal to use several devices (laptop, mobile phone etc.)
Based on this type system, we present a language that includes most of the primitives necessary for secure imperative programming (multi-threading, references, secure channel establishment, and cryptographic ciphers). A key feature of this language is the ability for new principals to dynamically join the network (in the sense of making network connections to cloud services and other devices) without having to register to any public key infrastructure (PKI) or to use a particular middleware layer444 Although middleware for e-commerce (such as CORBA) was popular in the 1990s, that approach was discredited by experience, while SOAP-based approaches have been largely superseded by REST-based Web services, that deliberately eschew the notion of a middleware at least for Internet communication.. .
Our threat model assumes some “honest” collection of principals (e.g. the employees of an enterprise) and some collection of devices acting for those principals (e.g. devices provided to those employees). A device may act for several principals, in the sense that it may issue access requests on behalf of any of the principals that it acts for), while a principal may be associated with several devices. We describe the devices acting for these principals as honest devices, in the sense that they are certified according to the type system presented in this paper to be in conformance with data sharing policies. We refer to the corresponding principals for these devices as “honest” rather than “trusted” because trust management is an orthogonal issue for the scenarios that we consider. Our threat model also allows for “dishonest” or untyped devices, acting for outside principals, who should not be able to access the data. These are the attacker devices. Our security guarantee is that, as long as no honest device provides access to a dishonest principal, the dishonest devices will not be able to obtain any information from any honest devices, unless an honest device has explicitly given the attacker access.
This capability is critical for cloud services. While it is reasonable to assume that there exists a PKI to certify the identities of cloud providers, and that cloud providers are trusted by their users, client devices and their corresponding principals are unlikely to have certificates. While an infrastructure for mutual authentication, based on client and server X509 certificates, might be provided as part of an enterprise data sharing network, there are difficult issues with extending this trust model to third party cloud service providers. Furthermore, such a PKI does not provide the level of confidence in conformance with data-sharing protocols, that our approach provides for honest (well-typed) devices. Instead of requiring such a global PKI, the approach described in this article represents principals by cryptographic public keys, and these are stored on devices like any other values. Our type system therefore uses a nominal form of dependent types, to reflect these runtime representatives of principal identity to the type system, where access rights on data are tracked through security labels attached to types.
Another contribution of the paper is a security guarantee suitable for our threat model. As stated above, in an open network, we must allow for “dishonest” devices that are unchecked (untyped), and potentially malicious. These devices are able to use any third-party cloud services, including services used by honest devices. A secure data-sharing system should remain robust to such an intruder. Our security guarantee is that if data is protected with rights that only includes honest principals (i.e., that do not include any principal which is associated to an attacker device), then an attacker cannot learn any information about that data. In this work, we are focused on confidentiality of the data, and do not consider integrity (that data has not been tampered with by attackers). There is a notion of integrity in the sense of trust management underlying our approach: Honest principals identify themselves by their public keys, and these keys are to state access restrictions on data, to specify when a device is allowed to “act for” a principal, and to validate that communication channels are with honest parties trusted to be type-checked and therefore conformant with data-sharing policies. This is reflected in several aspects of the communication API, including the establishment of communication channels, generating new principals and transmitting those principals between devices.
A practical difficulty with expressing security guarantees in this setting is that, as principals can be created dynamically, it is not possible to statically check if a variable has a higher or lower security level than some other variable. Consider the following example:
Example 1
Assume three devices: an honest cloud service with a certified principal , and two (mobile) devices, an honest device and a malicious device . The server code consists of receiving two principal values and from the network before it creates two memory locations and where has rights while has right . A secure location is defined one which cannot be accessed by the attacker. If we assume that devices and send their principal values (public keys) to the server, then depending on which key is received first, either or will be considered secure (only accessible by the honest principal and the cloud provider), while the other is explicitly accessible by the attacker. As usual with information flow control type systems, we propagate these access restrictions through the handling of data by the cloud provider and the devices, ensuring that data protected by the right , where is the representative for the honest principal.
Therefore, our security property is a posteriori: once the system has created a memory cell corresponding to some variable, if the rights associated to this variable at creation time did not include a principal controlled by the attacker, then there will never be any leak about the contents of this memory cell to the attacker. We argue that such a property is suitable for cloud services to increase users’ confidence that their data will not be leaked by the service due to a programming error. Indeed, our system allows us to certify that once some principal creates data, only explicitly authorised principals can obtain information about that data, by statically checking the code that processes that data. Verifying the identity of the principals allowed to access the data, and deciding where to place trust among the principals in a distributed system, is an important consideration. However it properly remains the responsibility of the application written in our language, and a concern which is independent of this type system. Our approach serves to guarantee proper handling of data among honest principals, once an appropriate trust management system has established who is honest.
Our typed language is intended to be a low level language, without high-level notions such as objects and closures. It includes consider references, multi-threading and a realistic application programming interface (API) for distributed communication. Communications can either be through a secure channel mechanism, implementable using a secure transport layer such as TLS for example, or through public connections, in which case any device can connect. The language includes primitives for asymmetric key encryption, since we represent by public keys. “Possessing” a secret key, in the sense that the private key of a public-private key is stored in its memory, allows a device to “act for” the principal that key represents. Our approach is similar in philosophy to the Simple Public Key Infrastructure (SPKI), where principals and public keys are considered as synonymous, rather than linking principals to a separate notion of public key representatives. However, we do not include notions such as delegation that are the central consideration of SPKI, since we explicitly avoid the consideration of trust management, leaving that to applications written using the API that we provide. This also differentiates our approach from frameworks such as JIF and Fabric, that include delegation of authority to principals based on an assumed trust management infrastructure.
Nevertheless, there is a notion at least tangentially relegated to delegation of trust in our framework: In order to allow a device to act for more than one principal, our semantics allows a principal to be created on one device and communicated to another device, where it is registered on the receiver device as one of the principals upon whose behalf that device can access data. For example, a client of a cloud service provider may generate a proxy principal representing that client on the cloud service, and then upload that principal to the cloud service in order to access data that the client is storing on the cloud service. This ability to share principals across devices is controlled by restrictions established when proxy principals are generated: Such a proxy (client) principal can only be registered on a device that acts for (cloud service) principals that are identified at the point of generation of the proxy.
The security analysis of the type system uses standard techniques from the applied-pi calculus [1]. This allows us to prove our correctness property as a non-interference property based on process equivalence, i.e., two systems differing by one value are indistinguishable by any party that is not allowed to access this value. The standard pi-calculus includes message-passing with structured values, but does not include an explicit notion of memory (although it can obviously be modeled using processes as references). Since our language combines message-passing communication and localized stateful memory, we use the stateful applied pi-calculus [2] as the starting point for our security analysis. This calculus does not explicitly model location (i.e., the distinction between two processes on the same device and two processes on two distinct devices). Since this distinction is critical for our security analysis, we add this notion in our calculus. Nevertheless the proof techniques that we employ are heavily based on those developed for the stateful applied pi-calculus.
The security analysis that we perform expresses that data are secure if keys received from other devices are not associated to an attacker. To formalise this conditional statement, we need more techniques than in a standard protocol where data are either secret or public, but their status does not depend on the execution. In our verification in subsection 5.3, we introduce an extended syntax that marks which keys, variables and channels are secure in the current trace. We then prove that when a new memory location is created with a secure key according to this marking, then the attacker cannot distinguish between two scenarios: one where the system reduces normally, and another one where the memory location is sometimes altered to another value. This is the basis for our noninterference property for the security guarantee provided by this approach.
In the next section we discuss related work. In Section 3 we present our language, type system and semantics. In Section 4 we present an extended example and in Section 5 present our result and outline the proof, then we conclude in Section 6.
Appendix A provides addtional type rules for the language considered in this paper. Appendix B considers proofs of correctness omitted from Section 5. Appendix C considers addtional operational semantics rules for the “open” semantics that enables us to reason about interactions with untyped attackers.
2 Related Work
Implicit flow
Implicit information flow properties involve the ability for an attacker to distinguish between two executions. Previous work that has provided type systems to control implicit information flow [19, 3] considered high and low data, and this could be extended to a bigger lattice but not to the creation of new principals, as the security of a variable is defined statically. Zheng and Myers presented an information flow type system that includes dynamic checks on rights [21] which can be used, for instance, when opening a file. The Jif Project [16] adds security types to a subset of Java, leading to a powerful and expressive language. Unlike our work, this other work does not address how to enforce principal identities and type information to be correctly communicated to other locations.
Security properties on distributed system
Work on type security for distributed systems can be distinguished according to the kind of security they aim to provide. Muller and Chong present a type system that includes a concept of place [15] and their type system ensures that covert channels between “places” cannot leak information. Vaughan et al. look at types that can be used to provide evidence based audit [11, 18]. Fournet et al. look at adding annotations with a security logic to enforce policies [10]. Liu and Myers [13] look at a type system which ensures referential integrity in a distributed setting. This work uses a fix lattice of policy levels, which does not change at runtime. The Fabric language [12] provides decentralised, type-enforced security guarantees using a powerful middleware layer for PKI, and Morgenstern et al. [14] extend Agda with security types. In contrast, our work allows programs to generate new principals at run-time and provides a security property that tracks implicit information flow, without requiring the support of a purpose built middleware layer or global PKI. Due to the fact that the attackers in our model can access services in the same way as honest principals, this security property is an adaptation of the bisimulation property which is a strong property introduced in the (s)pi-calculus by [1]. Bisimilation can be checked for processes by tools like Proverif [5] but these kind of tools do not scale up to large systems.
Managements of new principals
Bengtson et al. [4] present a security type system which allows creation of dynamic principals in presence of an untyped attacker. However, this type system provides only assertion-based security properties of cryptographic protocols. These are weaker than non-interference properties as they are expressed on one process instead of comparing two processes. [7] considers a framework in which principals can be created at run time (without a global PKI) they prove type soundness rather than a non-interference result. Finally, the DSTAR program [20] achieves these two goals but is focused on network information and relies on local systems to actually analyse implicit flow, which leads to a more coarse system.
Safety despite compromised principals
Past work [6] has looked at un-typed attackers in a security type system, however this work only considers a static number of principals, fixed at run time. Fournet, Gordan and Maffeis [9] develop a security type system for a version of the applied pi-calculus extended with locations and annotation. Their type system can enforce complex policies on these annotations, and they show that these policies hold as long as they do not depend on principals that have been compromised. Unlike our work they assume that the principals are all known to each other and there is a direct mapping from each location to a single principal that controls it. Our work allows principals to be dynamically created, shared between locations and for locations to control multiple principal identities. We argue that this model is a better fit to cloud systems in which users can dynamically create many identities and use them with many services.
3 Language: Semantics and Type System
3.1 Syntax
The syntax of our language is given in Figures 1, 2, 3 and 4. We let range over variable names , range over principal names , range over public key names and range over channel names . A system is a set of devices that run in parallel and that communicates through channels of .
The list records which channel names correspond to establish channels (globally bound). Channel also appear in connect and accept commands in the devices, and these are added to once the channel is opened. When there are no established channels, we omit the prefix. Note that to guarantee freshness of keys and nonce used in encryption, we might also provide global binders and in addition to . However this guarantee is straightforward to provide, using “freshness” predicates, so for readability reasons we omit explicit binders for generated keys and nonces.
A device consists of a memory and a command . Memories associate variable names with values, key names with keys and principal names with principals.
Given a nonce from some assumed set of key nonces, we define as the public private key pair generated from , where and are two constructors. A principal is a tuple which contains a key pair , together with a (possibly empty) set of public key values . When a device has in its memory, it is allowed to act for . Devices that can act as a principal , whose public key is one of the public keys in , are allowed to add to their memory.
Each variable in represents a reference to a value i.e. variables are mutable. At declaration time, a reference is associated with some rights , which cannot be revoked or changed. Making all variables mutable is convenient for our security analysis: it allows us to define non interference as the property that a parallel process that alters the value of a high-level variable cannot be detected by an attacker. If variables were not mutable we would have to consider a much more complex security property and proof. In addition, to avoid to consider scope of variables, we assume that a command never declares twice the same name for variables, channels, keys and principals.
Types (Figure 2) for these variables consists of a pure-type which indicates the base type for the value of the variable (e.g. integer, public key, cipher, etc.) and a label (or right) which indicates the principals who are allowed to access to the variable. A label can be either i.e. the variable is public or a set which contains public key names: and where .
Key names are declared by a command . This command copies the value of the reference into which represents a public key (not a reference to a public key). The type system (deref_T) ensures that in this command is an unrestricted public key: .
Channel types are declared when they are established, we have two kinds of channel: public and secure channels. Their types have syntax where is the type of values that are past over the channel and expresses which principals are allowed to know the existence of and when communication on this channel takes place.
3.2 Semantics
The semantics of the system is defined as a small-step semantics for commands and as a big-step semantics for expressions. Devices run concurrently with synchronized communication between them. Inside each device, all parallel threads run concurrently and communicate through the shared memory of the device (since memory is mutable). The main reduction rules are presented in Figure 5 for commands and in Figure 6 for main expressions. When a command that declares a new variable is reduced, the name is replaced by a fresh name that represents a pointer to the location in the memory where the value has been stored. The evaluation of expressions has the form : the evaluation of with memory returns the value . We note that this is different from some other calculi, in which variables are not references, and are replaced with a value when declared. Our correctness statement below depends on the use of references and, since we have a memory mechanism, we prefer to store the key names and principal names in memory and instead of applying a substitution, the names are evaluated when a command reduces (cf the (rights_RS) rules).
Principals are generated using the command, where are the keys to use to protect the principal (and therefore cannot ). The rule for this command (newPrin_S) generates a fresh key pair and stores the principal at a new location in the memory, where . To bootstrap the creation of these principals, they can be declared with ; such principal identities can only be used on a single device, they cannot be sent over channels. Additionally some devices may start off with the public keys of some trusted parties, i.e., the same assumption as TLS. This too lends itself well to cloud systems, in which web browsers come with a number of trusted certificates.
Communication between devices uses Java like channels: a channel is first established then it can be used to input and output values. The channel establishment is done by substituting the channel names in both devices by a unique fresh channel name added to (we assume that initial channel names and active channel names range over distinct sub-domains of to avoid collision). Note that channels do not name the sending and receiving device as these may be spoofed by an attacker, however, to get a more tuneable system, it would be a simple extension to add a port number which would restrict possible connections. For secure channels (open_priv_S), in a similar way to TLS with client certificates, both devices must provide the public key of who they want to connect to. They must also provide the principal (which includes a private key) to identify themselves to the other party. To set up a secure channel, the client and the server also have to ensure that they are considering the same rights for the channel. For that they have to exchange the value of their channel right and make sure that it corresponds to the distant right value . Indeed, even if type-checking is static inside a device, type-checking has to be dynamic between distinct devices since programs are type-checked on each device and not globally.
Example 2** (Principal set up)**
*Assume that a cloud service has a public key , which is known to Alice and Bob, and that Alice wants to share some private data with Bob using this cloud service. Alice can do this using the code shown in Figure 7. Alice starts by generating a key pair . As neither Alice nor Bob have certificates, Bob just sends his key publicly to Alice over a public channel. Alice receives it, and creates a new variable to be shared with Bob and the cloud service. She then opens a secure channel with the cloud that is typed to allow data of type , the right on this channel indicates that, while the data on the channel must be kept confidential, the knowledge that some value has been sent is not. This fragment of code does not authenticate Bob, this could be done using another protocol, or offline, but we will show that if the device that sent her this key, and the cloud server, both run well typed code, then she is guaranteed that the secret will only be shared by the device sending the key, the cloud server and herself. She knows that no leak can come from, for instance, bad code design on the cloud device. *
The expression, governed by the (enc_E) rule, encrypts the evaluation of for each of the public keys named in , i.e., anyone that has a single private key corresponding to any can decrypt it, the set of all is also included in the encryption. We use randomised encryption to avoid leakage that would occur otherwise when the same value is encrypted twice, and we model this by including a fresh nonce in the encryption. The command reduces successfully (dec_true_S) when evaluates to a ciphertext that can be opened by the secret key of and that the , which is packed into the encryption, is a subset of the evaluation of .
The expression reduces by encrypting the principal for each of the set of public keys representing the principals that can access it. It is the only way to produce a value which contains a secret key and therefore to send private keys through a channel. The command behaves as except that it deals with encrypted principals instead of encrypted values.
All other semantics rules are standard except that instead of returning run-time error (division by [math], illegal offset index etc.) expression returns a special value NaV. This feature is critical to guarantee the security of our system. Indeed, we allow a device to evaluate expressions with secure variables and then to do an output on a public channel. This scenario is safe only if we can ensure that no expression will block any thread. Note that the attacker can also send values with some type through a channel of another type, consequently run time type errors can also occur.
Finally, the command executes a command with no communication or interleaving of other processes. This is useful to avoid race conditions.
3.3 Types
The type judgment for expressions takes the form and the type judgment for commands takes the form where is a mapping from variable names to types , from channel names to channel type , is a set of principal names, is a set of public key names, and where is a right of form called the program-counter. The program counter allows to analyse programs for indirect secure information flow [8]. Figure 8 defines the main type rules for commands and Figure 9 defines the rules for expressions and rights.
In many typing judgments, we use a condition that states that is more confidential than . The predicate holds either when or when is a syntactical subset of (no aliasing). For instance, we have and even if and map to the same key in memory. We also use to define the syntactic intersection of the sets and (which is if ).
Types rules for new principals and variables
The typing rule for principal declaration (newPrin_T) only allows the program counter to be bottom. This restriction avoids the situation in which a variable with a right including this principal might be less confidential than the principal itself.
The new variable declaration (new_T) checks that the rights to access the new variable are more restrictive than (a subset of) the rights of the expression being assigned and of the program counter . We also ensure that one of the principal in belongs to . The type rule for assignment (assign_T) ensures that high security values cannot be assigned to lower security variables.
While the semantics for new principals (newPrin_S) stores the rights set dynamically (as is only used when the principal is sent to another device), the semantics for managing variables (new_S) does not consider them: their confidentiality is entirely provided by the type system.
Example 3
We consider the following piece of code in which two new principals are declared, and both Alice and Bob may know the value of , but only Alice may know the value of :
[TABLE]
Here the assignment of to should be allowed because is protected by rights more confidential than . However, in the last line, the value of (which Bob can read) leaks some information about the value of (which Bob should not be able to read). Therefore, this is an unsafe command and it cannot be typed.
Types rules for encryption
The type rule for encrypting values (enc_T) verifies that the encrypted value is less confidential than the set of keys used for encryption, it then types the ciphertext as a public value, i.e., encryption removes the type restrictions on a value while ensuring that the encryption provides at least as much protection. We note that if the encrypting key depends on non-public data, then the program counter would not be public, which would ensured that the ciphertext was not stored in a public variable. Hence the use of restricted keys will not leak information.
The corresponding decryption rule (dec_T) verifies that the principal used to decrypt the cipher is valid () and is consistent with the rights of the decrypted value. As the knowledge of which keys has been used to encrypt is protected with the rights of the cipher, the success of the decryption also depends on . Therefore, the program counter has to be at least as high as when typing the continuation. Finally, as with an assignment, the rule enforces that the created variable does not have a type that is more confidential than the program counter.
Types rules for public channels
Typing rules for public channels ensures that these are only of type public and, when they are used, the program counter is .
Example 4
The following system illustrates the use of public channels and encryption:
[TABLE]
Alice and Bob start off knowing each other’s public keys, and Bob provides a service to add one to any number sent by Alice. The variable is restricted so that only Alice and Bob can know the value. Encrypting this value removes these type protections, so that it can be sent across the public channel . On Bob’s device decrypting with Bob’s private key replaces these type protections.
Types rules for secure channels
For secure channels (client and server side), the rule (connect_2_T) enforces that the principal who is creating the channel and the principal being connected to, both have the right to access the data passed over the channel, hence . In order to ensure that the possible side effects caused by using the channel are not more restrictive than the data passed over the channel we need that . The condition stops side channel leakage to the device receiving the connection at the time the channel is opened. Finally, the program counter is set to once connected: this ensures both devices have the same program counter. Without this we would have implicit leakage, for instance, one device with a public program counter could wait for a message from a device with a non public program counter, then outputs something on a insecure channel. As the sending of the message may depend on a value protected by the program counter, this would result in a leakage.
We make the strong assumption that the existence of a connection attempt can only be detected by someone with the correct private key to match the public key used to set up the connection. If we assumed a stronger attacker, who could observe all connection attempts, we will need the condition that at least for the client.
The output rule (output_T) has two main restrictions: one which verifies that the device still has the program counter agreed with the corresponding device, and i.e., the type on the channel is no less restrictive than the type of data sent. This is because when this data is received it will be treated as data of type .
For channel creation the restriction on the channel must be at least as restrictive as the program counter. For input and output we must additionally check that the program counter has not become more restrictive, hence requiring that the channel restriction and the program counter are equal, i.e., testing the value of a high level piece of data and then sending data over a lower level channel is forbidden.
Example 5
The different roles of and is illustrated in these two programs.
[TABLE]
[TABLE]
Both programs aim at sending to Bob, which is a secret shared by Alice and Bob. In the first case, the sending of depends on its value: therefore the communication should can only be on a channel with rights . In the other example, even if the value of is updated due to a parallel thread that has a non public program counter, the sending of is unconditional.
Note that the language does not have “{if condition then }; structure as this construct would not be safe: if waits infinitely for a connection then is not executed. However, a delay command could be added to help the second program to output after has been updated.
Type rules for release and register
The release command is similar to the encryption command except that the rights with which the principal is encrypted are provided by the principal value. Therefore, there is no static check to perform in (release_T). The registration rule (register_T), for the same reason has less checks than (dec_T). However, it does enforce that , without which we could get non public rights; revealing a such a none public right would then be an information leak. Removing this restriction, and allowing non public rights, would be possible in a more complex type system but we decide not to do so to keep the type system more understandable.
4 Example: A Secure Cloud Server
As an extended example we consider a cloud server that provides a data storage service. The motivation of our work is to make it possible to type an open cloud service, without the need for a global PKI neither the need to verify that its users run typed programs, so ensuring that it provides security guarantees to all of its users. The server process in Figure 10 defines an open service which users can connect to and register to store data. This data can be shared with another principal, hence the server takes a pair of public keys, representing the principals, when registering.
To keep the example simple this server accepts up to 3 accounts and denies further registrations. The data for each accounts is stored in the data variable defined in the process; the restriction set used to type this variable specifies that only the server and the two clients named at registration can have knowledge of this data. Additionally, the server keeps track of how often each account is used (in the array) and runs a process to monitor the usage (the process). If any account is found to have made more than 50% of the total number of requests (plus ), it is temporarily blocked (by setting the corresponding index in the array to 1). The usage data and blocked status are public data. This is an example of an open cloud service which writes to public variables after processing private data. Our type system ensures that there is no leakage between the two.
An example configuration is given in Figure 12, with the definitions of the processes provided in Figure 11: this configuration consists of four devices , and two identical devices. We assume that, in the physical world, and are the laptop and respectively the mobile of Alice while the two other devices are owned by Bob and Charlie. In the system definition, Alice’s and Bob’s devices start off knowing the servers public key, but the server has no knowledge of Alice’s and Bob’s principals. The mobile device first creates a new principal identity and shares the public key to . Note that could also send its private key to at this point which is not the expected behavior. To avoid honest users to establish unwanted connection, a port number mechanism should be added to the connections rules. Once receives the principal’s public key from , creates a new principal identity to use with the cloud service which is known by the mobile’s principal identity. This allows to release and to send the new principal to which registers it. Therefore both and can use the service with the same account. Finally Bob’s device and exchange their public keys, and sends to the public key received from then registers for a shared account between and on the server. Finally, or can upload a value to the server. Meanwhile is able to recover the last uploaded value ([math] if it downloads before an upload occurs).
The security type on the variable means that its value can only have an effect on other variables with the same or a more restrictive type. Importantly, our correctness result limits knowledge of these values to just the Alice, Bob and Server devices, no matter what well-typed code are run in these devices. On the other hand, checking the authenticity of the Bob key (with a mechanism such as PGP, or out-of-band checks) is Alice’s responsibility.
These are exactly the guarantees that a user, or a organisation, would want before using the cloud service. While many people trust their cloud services, and organisations enter into complex legal agreements, leaks can still occur due to programming errors. Type checking the code, as we outline in this paper can show that it is safe from such programming errors, and help provide the users with the guarantees they need to use the system.
5 Security analysis
We now prove that the type system preserves confidentiality of data: when a variable is declared with rights then the only devices that can observe anything when the variable’s value changes are the devices that are allowed to know one of the keys in .
The proof uses techniques from the applied pi-calculus, rephrased for our formalism. Our basic result uses a notion of bisimulation formulated for reasoning about information flow in nondeterministic programs [17]. Intuitively, two programs are bisimilar (for “low” observers) if each one can “imitate” the low actions of the other, and at each step the memories are equivalent at low locations. Note that memory can change arbitrarily at each step, reflecting the ability of concurrent attacker threads to modify memory at any time.
The applied pi-calculus extends the well-known pi-calculus, a process calculus where communication channels may be sent between processes, with binding of variables to non-channel values. In our approach, “memory” is this set of bindings of variables to values in the applied pi-calculus. Also, our bisimilarity is a labelled bisimilarity since we consider communications on channels as observable events. Our correctness result shows that a high (insider) attacker cannot leak information to low observers by communication on high channels or by modifying high locations in memory.
We explain our proof over the following five subsections. In the following subsection we annotate devices with an identifier, so that we can keep track of particular devices as a process reduces, and we define when a particular device is entitled to read data protected with a particular set of rights. In Subsection 5.2 we define our untyped attacker and outline an labelled, open semantics which defines how an “honest” (typed) process can interact with an untyped attacker process. We also prove that this open semantics is correct with respect to the semantics presented above.
To give us the machinery we need to prove our main results, in Subsection 5.3 we annotate our processes with the rights that apply to all variables. We show that a well annotated, well typed process reduces to a well annotated, well typed process, this results shows that a well typed system does not leak data, but it does not account for untyped attackers. To do this we introduce a labelled bisimulation in Subsection 5.4. This bisimulation relation defines the attackers view of a process, and their ability to distinguish systems. Finally, in Subsection 5.5, we prove that, for our open semantics, a well annotated, well typed process is bisimular to another process that is the same as the first, except that the value of a high level variable is changed. This means that no information can leak about the value of that variable for any possible attacker behaviour, so proving our main correctness results.
5.1 From rights to allowed devices
As a preliminary step, we need to formally define which devices are and are not granted permissions by a particular set of rights. To do this we need a way to refer to particular devices while they make reductions, so as a notational convention, we place identifiers on devices, that are preserved through reductions. By convention an identifier will be an index on each device, so for example expresses that and represent the same physical device in different states.
In Definition 1, Definition 2 and Definition 3 below, we formally define an association between the public keys in a rights set and devices, but first we motivate these definitions with an example:
Example 6
Consider the the system where and are defined in Figure 11 and Figure 12 (i.e. there are two clones of each devices of the system from Section 4, except for the server). Consider the variable of the command on Device (the server). There are three reasons for a device to be allowed to access shared data, depending on which reduction occurs in the system.
First, the devices that created the keys and are allowed access to this data. This pair of keys is passed to the server at the start of its loop. Depending on which device ( or ) made the connection to the server channel , allows either Device or Device (as ). Assume that it is Device . Similarly represents Device or depending on which device connected to channel during the command of Device . Assume it is Device .
Next, since the public key has been created by the command in Device , the device which corresponds to the public key is also allowed to access . We assume that it is Device .
Thirdly, the public key has not been generated by any device. However it was in the initial memory of Device , therefore this device is also allowed to access the shared data by the right granted by this key.
Our security property grants that no other device than Device and can get information about the value of . On the other hand, if an untyped attacker provides its own key to Device , through channel , then no security guarantee can be provided about . Indeed, such a case means that the rights explicitly allows the attacker’s device to access the data as any other regular device.
Before we formalise what are the allowed devices, we make reasonable assumption about the initial process. For instance, when the process starts, we assume that devices have not already established any channel between them. We also consider that they have an empty memory except for some public keys and principals (and we do not allow duplicate principals). Finally, we consider that all devices are well-typed except one (Device [math]) which is the untyped attacker.
We first define a well-formed and well-typed condition on processes:
Definition 1
A valid initial process is a process where:
*There is no active channel already established between the processes: . * 2. 2.
The bound values in memory are either principals or public keys: For all , implies there exists , or . 3. 3.
Each principal exists only on one device: For all , , implies . 4. 4.
*The memory of every device is well-typed with contexts corresponding to its memory and : for all , w.l.o.g. assume that , we have for some command . *
Before defining the set of allowed devices, we define an auxiliary function that maintains which devices are allowed to access shared data, and which public keys need to be associated to devices. This auxiliary metafunction maps backward from a set of rights that confers access, to all possible devices that may have provided the keys that gave them those rights. This is the set of allowed devices where is a process trace, defined below. Any device that is not in this set is not allowed by ; we will consider such devices as attacker devices in our threat model.
Definition 2
Given a reduction where devices identifiers are in , given a subset of identifiers and a set of public keys , we define the backward function in the following way.
- •
If is the reduction (newPrin_S) on a device , that creates a new principal and that then
[TABLE]
- •
Otherwise .
Definition 3
Let a valid initial process. Let a sequence of reductions and let a set of public keys, we consider
[TABLE]
Let . We define the set of allowed devices identifiers as .
*Consider the set . We say that is safe if this set is empty. *
In other words, is safe when all keys involved by have been either created by devices of or owned by them at the beginning. This implies, since valid initial processes don’t have duplicated keys, that the untyped attacker whose index cannot be in have not generated any of these keys.
5.2 Definition of the attacker and of the open process semantics
An attacker is a device where is a standard memory and is a command which is not typed and which contains additional expressions to do realistic operations that an attacker can perform like extracting , and from , decrypting a ciphertext with only a secret key, or releasing a principal with arbitrary rights ( with ).However, the attacker is not able to create principals with a public key that does not correspond to the private key because we assume that the validity of any pairs is checked when received by an honest device. We denote such an extended expression using .
To reason about any attacker, we introduce open processes in a similar way as in the applied pi-calculus [1]. An open process has the syntax where are well-typed devices (the indexes are the tags of the devices: we do not change them through reductions), where are the channels which have been established between devices (not with the attacker) and where is a memory representing the values that the attacker already received. We refer to as the honest devices. We also refer to as the attacker knowledge. This plays the same role as frames in the applied pi-calculus, and also plays a similar role as computer memory in the bisimulation that we use for reasoning about noninterference for nondeterministic programs. We denote the evaluation of with the memory (to be defined the variables of should exists in ).
Our type system ensures that an attacker is never able to learn any of the secret keys belonging to “honest” devices, as represented by the notion of reference rights defined below. The following predicate overapproximates what an attacker can learn about a value, based on the “knowledge” represented by its memory and on the assumption that it knows all keys which are not in (which aims at being the reference rights).
Definition 4
Given a set of keys and a value , we define the predicate as there exists an extended expression such that , where this extended expression contains all standard functions and attacker functions, as well as an oracle function which provides the secret key of any public key which is not in .
Open processes have two forms of reductions: internal reductions, which are the same as the reductions for closed processes, and labeled reductions which are reductions involving the attacker. Labels on these latter reductions represent the knowledge that an attacker can use to distinguish between two traces, effectively the “low” information that is leaked from the system.
There are two forms of labelled reductions, both of which take the form . In the first form, input reductions , the attacker provides data, and in the second form, output reductions , the attacker receives data from an honest device. There are also two further forms of input reductions: those for establishing channels and those which send data. The reduction that establishes a secure channel takes the form:
[TABLE]
where is any attacker channel name, should be the private key corresponding to and should be the public key of . The reduction to establish a public channel is similar but simpler. There is no need for checks on or .
Note that, unlike standard connection establishment where a fresh channel name is added to , here the name of the established channel is provided by the attacker and is not added in , which is out of the scope of the attacker. However the attacker has to provide channel names in a separate subdomain which prevents it from using an existing honest channel name. The names which are the attacker’s channel names are written . To summarize, a channel name of form represents a channel which is established between a device and the attacker, a channel is a channel between two devices (not accessible from the attacker) and a channel name and not in the attacker’s channel domain is just a program variable representing a future channel. Finally, we consider an implicit injection from to attacker’s channels.
For input reductions that sends data on an established attacker channel, is an expression of the extended syntax admitting lower level operations that are available to attackers but not to honest devices, as explained above. There is just one rule for output reduction, saying that an attacker can learn from a value output on an attacker channel:
[TABLE]
Example 7
We consider the system consisting of from Figure 12 running in parallel with an untyped attacker . The corresponding open process is initially . Assume that Device () reduces with the attacker instead of , we have:
[TABLE]
where and Device has memory (a renaming of the local variable on the device), after the first internal reduction where the principal is created.
*After the first reduction where creates its principal, the attacker establish the connection with Device : as is expecting a connection of type , the attacker have to provide and one of its channel name . Next, Device outputs the value of on which is then stored on the attacker’s memory. *
The following defines a subprocess of the “honest” devices of a system , where some (other) devices of that system may be attacker devices. This subprocess of honest devices will be those defined by . In other words, all devices which are not allowed by some key in are assumed to be controlled by the attacker.
Definition 5
*Given a process and , let the names of channels between devices of we define the subprocess of devices where is where each channel name have been replaced by an attacker-channel name . *
In the following, we will denote by a sequence of reductions . We also denote by a reduction which is either or for some label and by either or .
The following proposition states that if there is an execution of a system that includes communication with attacker devices, where all communication is local in this closed system, then we can consider subsystem of this, omitting the attacker devices, where communications with attacker devices are modelled by labelled reductions of the form described above. So the attacker devices become part of the context that the devices in the subsystem interact with through a labelled transition system. Such a subsystem may also exclude some of the “honest” devices in the original system, and in that case we treat those excluded devices as attacker devices in the context (since labels on the reductions only model communication with attackers).
Proposition 1
Let where all are well-typed and is an untyped device. If , then for all subset of , there exist and (the attacker knowledge at the end of the execution) such that
[TABLE]
*where is a memory which is the union of the memories of and all with (variable names are renamed whenever there is a name conflict). *
For instance, the system , where is part of the attacker, can perform three internal reductions between and where sends its public key to . In 7, we provided the three reductions of the system .
5.3 Extended syntax with extra annotations
In this section, we add extra annotations to processes to perform a specific analysis about a given right . Since the keys in do not necessary exist in the initial process, we first annotate open processes with a reference right with the intention that this right will eventually grow to the right as keys are generated and added to this right during execution. Given a reference right , we define any rights whose all keys are in to be high (by opposition to low). The set starts with keys that exists in the initial process.
An annotated process has the syntax
[TABLE]
for attacker knowledge , reference right and process . The exact form of the annotations is explained below. The reference right only changes during a reduction of a command . In this case, there is a choice of whether or not to include the generated public key in .
Definition 6
A sequence of reductions is called standard if each time a (newPrin_S) reduction adds a key to :
[TABLE]
we have .
The next proposition ensures the existence of a standard annotation such that the rights we want to consider at the end are high and that the attacker does know any key in the set of the initial process. We will state in 3 that this implies that the attacker never knows the keys in during the whole reduction.
Proposition 2
Let be a valid initial process such that and let a set of keys defined in . If is safe then there exists a standard annotation for the reduction provided by 1: where and are such that and .
When a variable or a channel is created with some rights, the reduction rules of the operational semantics remove all information about those rights. Therefore we add annotations to devices to remember if the defined right was high or low, according to . Recall that a right is high if it contains a key in the reference right . We use as a metavariable for an annotation or .
A memory location which is created by a command , where evaluates in to a high right according to , is annotated as a high location: . Otherwise is annotated as a low location: . By convention contains only low locations. 2. 2.
A new channel name , which is created by a command that establishes a channel , is annotated as where resp. is if the evaluation of (resp. ) is , and is annotated as otherwise. 3. 3.
A value which is the result of an expression (besides an encryption expression) where one variable refers to a high location is annotated as a high value . Otherwise, it is annotated as a low value . When a value is encrypted, it becomes i.e., the initial tag is associated to the subterm. 4. 4.
A device is annotated as . There is one tag for each sub-command which can be reduced. The annotation of each thread records whether the existence of this thread was due to a high value or a high right. When , we say that the thread is high, otherwise the thread is low. For instance, the annotated device reduces to because is a high location. The other case where a thread can be set to high is in the establishment of a secure channel that is annotated with .
These annotations allow us to define technical invariants that are preserved during reduction. In the following technical definition, we formalize the idea that there exist a typing judgment for devices (Case (1) below) which is consistent with the annotations:
Variables (Case (2)) and channels (Case (3)) should have a type which corresponds to their annotation 2. 2.
The type system tracks a notion of security level for the control flow, and this level must be consistent with the thread annotation (Case (4)). 3. 3.
In addition, we express that the devices are not in a corrupted configuration, in the sense that secure channels are not being used to communicate with the attacker (Case (5)), nor are they used in a low thread (Case (6)). 4. 4.
Finally, ciphers for values (Case (7)) and principals (Case (8)) should not have high contents protected by low keys.
Definition 7
A tuple consisting of a reference right , a memory and a thread is well-annotated written “” if there exists , , , and such that
** 2. 2.
For all locations in , we have
- •
either in for some value and
- •
or and
- –
if , in ,
- –
if , in for some value . 3. 3.
*For all channels in the thread such that , the annotation of is where iff. , iff. . * 4. 4.
* if and only if .* 5. 5.
*For all , we have with and . * 6. 6.
if is , then there is no in . 7. 7.
For all values stored in memory, if a sub-term of matches then . 8. 8.
For all values in memory, if a sub-term of matches then or .
When, for a device and a set , we have for , then we use the notation .
Finally, we get the following subject-reduction result:
Proposition 3
Let an open process such that for all we have and for all we have . If then for all we have . Moreover if the reduction is standard, we have for all .
Finally, valid initial processes which only require each device to be well-typed are well-annotated.
Proposition 4
Given a valid initial process P=\Big{(}\mathcal{K}\models\langle M_{1}\blacktriangleright C_{1}\rangle_{1}|\ldots|\langle M_{n}\blacktriangleright C_{n}\rangle_{n}\Big{)}, We have for for any .
5.4 Labelled bisimilarity
The invariants expressed above are about a single process. To ensure that the attacker cannot track implicit flows, we need to compare the execution of two processes in parallel. In this section, we define a relation between processes which implies an adapted version of the bisimilarity property of the applied pi-calculus [1].
The two processes that we compare are the actual process and another one where the value of one of the high variables of the memory of some device has been changed to another one. So first, we define what is a process where a variable is modified arbitrarily.
Definition 8
Given a device , and a value , we define to be the device that updates the variable to be by assignment, . Extending this from devices to processes, given a process which contains with index , we define to be the same process except that has been replaced by .
In contrast to systems where the attacker can only observe public values in memory after reduction, here we model that the attacker can observe communications on channels that have been established with other devices. In the following examples, we stress how an attacker can distinguish between two processes even without knowing actual confidential values in the memory of those devices.
Example 8
We consider the device , where the description is parametrized by two meta-variables: is a value stored in memory and is a value that is encrypted and sent on an attacker channel. The device has a memory , and the full description of the device is
[TABLE]
The process can be distinguished from the process . In the first case, we have:
[TABLE]
On the other hand, there is no reduction with only the label and internal reductions starting from . This distinction models the fact that if the attacker receives data on , it learns that .
The processes and can also be distinguished even if both and can reduce with a label . Indeed, after reduction the attacker’s knowledge is where is (resp. ): By performing the decryption of with an untyped decryption, the attacker can compare the result to , and is only true in the first case.
Finally, the processes and are not distinguishable. In both cases:
- •
there is no test to distinguish between the two attacker’s knowledge, and
- •
the labelled reductions are the same i.e
With these examples in mind, we introduce static equivalence, an adaptation of the one used in the applied pi-calculus, which expresses that it is not possible to test some equality which would work with one memory but not with the other.
Definition 9
*We say that two memories and are statically equivalent () if they have exactly the same variable names and for all extended expressions where its variables , we have . *
Finally, we define an adaptation of the labelled bisimilarity. This recursive definition generalizes the conclusion of the example: the two memories should be statically equivalent, and a transition in one process can be mimicked in the other process, where the reduced processes should also be bisimilar.
Definition 10
Labeled bisimilarity () is the largest symmetric relation on open processes such that implies:
; 2. 2.
if , then and for some ; 3. 3.
*if , then and for some . *
The labelled bisimilarity is a strong equivalence property, and its we use it to state the security property that an attacker is unable to distinguish between two processes in the same class [2]. However this definition does not help to actually compute the processes in the same class. Therefore we define a stronger relation that can be defined from our annotated semantics.
First, we define an obfuscation function which takes a set of public key and a value or a principal and returns an obfuscated value (its syntax is like the syntax of value except that there is an additional option and that a principal value is also an option).
Definition 11
*Let a set of public key, a principal or a value . We define depending on the structure of .
[TABLE]
where is a value or a principal and a nonce.
The set aims at containing a set of keys whose private keys will never be known by the attacker. In our previous example, we have : if is never known by the attacker, the attacker will never be able to know anything about . Note that the random seed used for the cipher is not hidden. Indeed, the attacker is able to distinguish between a cipher that is sent twice and two values which are encrypted then sent: in the first case the two message are strictly identical.
We now define an equivalence on memories: given a reference right , a set of safe keys, then we say that two memories are equivalent if they differ only by the high values which aims at never been sent to the attacker and by the obfuscated terms of the low values.
Definition 12** (equivalent memories)**
Equivalent memory* is a relation on memories such that if for each low location in (resp. ), there exists in (resp. ) such that and each location in (resp. ) exists in (resp. ). *
Next, we define an equivalence relation between two well-annotated processes: two processes are equivalent if they have the same commands up to some additional high threads and have equivalent memories:
Definition 13
Two annotated processes and are annotated-equivalent () when there exists an alpha-renaming (capture-avoiding renaming of bound variables) of such that
[TABLE]
[TABLE]
and we have , for all , we have and furthermore for all , first , next, the annotation on each thread (where stands for or ) is either or such that either:
- •
**
- •
* and (the commands are syntactically identical up to renaming of bound variables).*
Finally, we prove that this relation actually implies bisimilarity.
Proposition 5
Let be and be , where both are well-annotated processes such that , Furthermore assume that for all we have . Then the processes are annotated equivalent, (with the annotations removed).
5.5 Main theorems
Our first security property grants confidentiality for each created variable in the following way. When a variable is created with a protection , this implicitly defines a set of devices which are allowed to access . If the untyped attacker is not in this set, then any collaboration of the attacker with the denied-access devices can not learn any information about the value stored by the variable: they cannot detect an arbitrary modification of the variable.
Theorem 1
Let be a valid initial process. We consider a reduction with such that for some , is . Let be the set of keys corresponding to , let and let be the unions of memories of the devices of whose indexes are not in and of device . If is safe (as stated in 3), then .
Proof. According to 1, we have where . As is a valid initial process, we get that is also a valid initial process (the verification of all conditions to be a valid initial process is immediate). Since is safe, we consider , and the standard annotated semantics from 2:
[TABLE]
where and
[TABLE]
From 4, the annotation of the initial process is well-annotated. From 3 on this annotation, we get for all : and, due to (1), for all we have . Since , and given the (assign_T) rule, we also have
[TABLE]
Therefore, we satisfy all conditions of 13:
[TABLE]
Finally, we conclude with 5.
Example 9
*From the example of Section 4, let consider the variable of the device . The Theorem 1 states that only devices from can distinguish between or . Whatever are the reductions, these devices contains for sure since its key is known from the beginning and since it creates . The only threat is that the channel has not been established with or that has not been established with the device of Bob. These threats could be removed by an additional authentication protocol or the use of a private channel (for instance if the connection between and is a direct wired connection). This means that not knowing which code is run on and is not a threat as long as and guarantee that their codes are well-typed. *
Finally, we state a standard security property in the simpler case where there is no untyped attacker. Given a process , if, at some point, we change the value that a variable is bound to in memory, where that variable has been typed with right , then new reductions will not alter values in memory locations that have been typed strictly less confidential than (i.e., those variables have rights that contain public keys not contained in ).
Theorem 2
Let be a valid initial process. We consider a reduction with such that for some , is . Let be any value of type . Let such that then there exists such that where for all memory location that have been created in any device with rights with , we have where is the value of in and the value of in .
Implementation: We have implemented an interpreter for this language in Ocaml555https://github.com/gazeaui/secure-type-for-cloud. The program strictly follows the extended semantics, it has commands to add a new device; to do an internal reduction on the selected thread(s) of the selected device(s); to perform an attacker communication, and to type-check each device of the system according to the annotations. To define a device which starts with keys in memory, two additional commands are provided which are allowed only as a preamble: and where is a number; an identical on two devices represents a shared key. This implementation demonstrates that the syntax is well-defined and effective, and allows us to test the invariance of the properties with demonstrative examples. The example of Section 4 has been tested using this implementation: we are able to reduce the process such that Bob gets the secret from Alice and we can verify that each step correctly type-checks.
6 Conclusion
We have presented a security type system that provides location based correctness results as well as a more traditional non-interference result. The key novelty of our system is to allow principal identities to be created and shared across devices in a controlled way, without the need for a global PKI or middleware layer. Hence, our correctness result states that well-typed devices can control which other devices may acquire their data, even in the presents of untyped attackers. We have illustrated our system with an example of an open cloud server that accepts new users. This server does perform some monitoring of its users but our type system proves that it does not monitor the content of their data. We argued that our framework is particularly appropriate to cloud systems where organizations will want guarantees about where their data will be stored as well as the secrecy of their data.
Acknowledgement
We would like to thank Alley Stoughton for her help with this work; her insightful comments and useful advice greatly improved this paper.
Appendix A Additional Language Rules
Reduction is defined modulo the standard equivalence rules, which may be applied in any context, e.g.
The rules not mentioned in Figure 6 can be found in Figure 15 and the types rules not mentioned in Figure 8 and Figure 9 can be found in Figure 13 and Figure 13.
Appendix B Proofs of correctness
Here we provide the proof of all propositions of Section 5.
B.1 From rights to allowed devices
No proposition in this section.
B.2 Definition of the attacker and of the open process semantics
See 1
Proof. By induction on the number of reduction steps. We assume that the result is true for reductions i.e we have which is mimicked by
[TABLE]
and we prove that if then we can also have
[TABLE]
If the last reduction involves devices which are part of , then this same reduction can be done on . 2. 2.
If the last reduction involves devices which are not part of then no reduction has to be done in the open semantics. 3. 3.
If the last reduction is (i/o_S) from outside to transferring on then the labeled reduction is possible since has been produced from values into memory not in and the attacker can derive all these values. 4. 4.
If the last reduction is a channel establishment between a device from and a one not from , the proof is similar. Note that the new channel name is annotated with instead of . 5. 5.
If the last reduction is (i/o_S) from to outside transferring on then the labeled reduction exists and we still have for all in memory of a device of which identifier is not in , .
B.3 Extended syntax with extra annotations
See 2
Proof. We assume that is the -th reduction of . We denote by , after reductions. Let for and .
We now prove by induction on the number of reduction steps that the annotation such that is standard.
Note that as the set is safe, we have . Moreover we also by definition.
where contains all the keys of which have already been created.
The only reduction we need to consider are the one which might not be standard i.e. when a reduction which creates a new principal where is in . However, by design of the function, we have : the reduction is standard.
In the following auxiliary lemma, we state that low expressions reduces to low values while high expressions reduces to high values.
Lemma 1
Let and be such that the sub-expressions of are such that with iff. . Let such that . We have: with iff .
Proof. If , we consider all constructors:
. If with , due to the typing rule or has type with , by the induction hypothesis, one of them reduced to a high value and the reduction rule states that the result has to be high. If with due to the typing rule both and has type , by hypothesis both of them reduce to a low value, the reduction rule states that the result will not be low. 2. 2.
. If with , due to the typing rule element_T or has type with label , by the induction hypothesis, one of them reduced to a high value and the reduction rule states that the result has to be high. If with , due to the typing rule both and has type , by hypothesis both of them reduce to a low value, the reduction rule states that the result will be low. 3. 3.
. According to enc_T, we have and reduces to a low value (since the new root is low). 4. 4.
, according to release_T, and reduces to a low value. 5. 5.
, according to pubKey_T, and reduces a low value.
Lemma 2
Let such that for some and where is a command with an expression such that . We have that iff. for some value .
Proof. By induction on the structure of . If is a variable, we conclude from Item 2 of 7. For the inductive case, we use 1 to conclude.
See 3
Proof. W.l.o.g. we consider that the reduction is done on (eventually and if it is a communication between two device). We write and . Let , , and such that . We prove all the conditions of well-annotated separately.
- 1.
** is well-typed for some **
By hypothesis, we know that is well-typed for some , , . If then the two new threads after reduction are well-typed with , , and due to (para_T). If then the two new threads are also well-typed due to (bang_T). Otherwise, the form of is (up to name renaming) or where or is an renaming of . All the corresponding type rules enforces that is well-typed only if is well-typed. 2. 2.
For all location in , we have
- •
either in for some value and
- •
or and
- –
if , in ,
- –
if , in for some value .
First, we consider the annotation of variables. Once a memory location is created, there is no reduction rule to change its type ( or ), never becomes more confidential ( increases only with new fresh keys) and all types rules enforce that the reduced command can be typed with a context which preserves rights for all variables. For these reasons, if the property holds when a variable is created, it holds forever. During creation, by design of the annotated rules, the location is created as high or low only depending on if for regular variable and is always low when the location contains a principal or a key. For inputs, since the input command does not specify the type, we look at the annotation of the channel, which is itself defined from the condition in the annotated semantic rules about channel creation (high open_priv), (low thread/high value open_priv), (low open_priv) for channel opening with the attacker, from the fact that for all we know that the channel has a type for values less confidential than .
For the annotation of values, we need to check that no high values are stored into low memory (the annotation always produces high values from high memory). The reductions which set new values to low memory locations are the following: (low assign), (low new), (deref_S), (low dec_true), (low i/o) and (att. input). For the first rules ((low assign), (low new) ), the stored value is the evaluation of an expression . Since the type rules associated to these commands enforces to be of a type such that , according to Lemma 2, reduces to a low term. The argument is the same for (deref_S).
For the internal communication (low i/o), the argument is the same except that the type judgment of is on the other device. For the input from the attacker att. input, the rule set that attacker’s inputs are low.
For rule (low dec_true) where one layer of encryption is removed, we have to ensure that the decrypted value is low. The typing rule requires that and which implies . From Item 7, we conclude.
The rule (high thread/low var dec_true) can never be applied. Indeed, if the cipher is high, from 2, the cipher has a high type, then the typing rule enforce the program counter to be high which means the thread has to be high. 3. 4.
** if and only if .**
First note that all types rules of commands requires that their continuation can be typed with the same program counter. The interesting cases are when the type rules requires the continuation to be typed with a more restrictive program counter and, for the other direction of the equivalence, when the reduction rules reduces a low thread into a high thread.
- •
Conditional is typed with (if_T): the program counter of the continuation if set to when at least one of the expressions is more confidential than (we assume that initially ). According to 2, one of the expression reduces to a high value. In this case, the reductions rules (high if_true) and (high if_false) sets the thread to high. Reciprocally, if the thread is set to high, on of the value is high and according to 2 the type of the continuation is .
- •
Secure channel opening is typed with (connect_2_T) and (accept_2_T). Both type rules requires the continuation to have a program counter set such that iff where is the right to see the channel. The reduction rule (high open_priv) can be applied only when and , the rights to see the channel of both threads are equal and sets both threads to high exactly with the same condition.
- •
Decryption is typed with (dec_T), the program counter of the continuation if set to when the expression to decrypt has rights with . This coincides with the reductions rules (high dec_true), (high thread/low var dec_true),(high dec_false) where the thread becomes high iff the decrypted value is high (due to 2). 4. 3.
For all in the thread such that , the annotation of is where iff. , iff. .
The annotation on channel is definitively fixed once it is created. The channel establishment rules (high open_priv), (low thread/high value open_priv), (low thread/high value open_priv) and (low open_priv) all enforce this property in their premisses. Public channels have type and are annotated with . 5. 5
For all , we have with and .
The interesting case is the creation of a new channel with the attacker. This can be done through rules (att. open_server) and (att. open_priv_client). By hypothesis, the attacker does not know any key in so it can only provides a secret key not in . Due to (connect_2_T) or (accept_2_T), and should contains . Therefore neither or is a subset of , which means and as is required to be empty. 6. 6
If is , then there is no in .
This is true before reduction. Since establishing a new channel set the thread to high, the property is also true after reduction. 7. 7
For all in memory, if a sub-term of is then .
The encryption typing rule for , (dec_T), requires that is such that . From 2, we have implies and therefore which allows to conclude. 8. 8
For all in memory, if a sub-term of matches then or .
Encapsulated principals can only be produced with a expression or by an input of the attacker. In the first key, the release reduction (release_E) always encrypts the principal with the keys of in the other case since by hypothesis we have we also have ) for any so the attacker cannot output this value.
Finally, we prove that when the annotation is standard, we have for all . The result is immediate for reductions which do not change or .
The set increases only with the reduction (att. output). The values which contains secret keys of are of type (or a combination of tuples and encryptions of this type). According to item 8, any is encrypted with . Moreover, by definition of standard annotation, all keys in are such that all keys of are also in . Since , the attacker cannot decrypt the key.
The reduction which modify is (high newPrin), but the new key is fresh so the attacker cannot derive new facts from the fact that is syntactically more powerful than .
See 4
Proof. Condition item 6 of well-annotated is true since and the tread is . Since and only contains principals and public keys, the other conditions which concerns other kind of values and variables are immediately true.
B.4 Labelled bisimilarity
Lemma 3
When two memories and only contains low locations, and for all such that and implies that, for all extended attacker expression (Figure 16), .
Proof. By induction on the structure of . This is true if is a variable or an integer. We assume that for all sub-expressions of . For , has to evaluate has an integer. Since the function returns the -th principal of a determined sequence of principals, the expression returns the same key in both memory. For , since the extended expression has to evaluate to a right it evaluate to the same right for both memories, so the result is either obfuscated with both memories or none of them. For , the result is immediate if evaluates to a cipher which is not obfuscated. Otherwise, is a cipher encrypted with rights , , by hypothesis, we have that cannot corresponds to any key in therefore the decryption fails (result NaV) in both memories.
Lemma 4
When two memories and only contain low locations, and for all such that and implies that .
Proof. Let and be two extended expressions which evaluates to the same value under (). We show that . From 3, we have . However the obfuscation function does not obfuscate the nonce of ciphers. And the attacker power does not allow the attacker to generate a cipher with a nonce which is not fresh (this is due to the fact that nonce cannot be extracted from ciphers from which the key is unknown). Therefore each nonce is uniquely associated to a unique value. As nonces are equals the obfuscated values are equals too.
Lemma 5
Given a command context , a set of public keys , two memories and , an expression and , if we have and , , and then .
Proof. By induction on the structure of .
If is a variable with then due to Item 2 of Definition 7, we have . By definition of , we conclude . 2. 2.
If is with , due to rule (low sum) and (high sum) is low implies that none of its sub-expressions is high, we conclude by induction. 3. 3.
For , the principle is the same. 4. 4.
For , the principle is also the same. 5. 5.
If with . If , due to the type rule (enc_T), with . According to 2 is low, we conclude by induction. If , then so does not appear. 6. 6.
For expressions which involves principals ( and ) we conclude as principals are not stored in high location.
Proposition 6
Let and be two well-annotated processes such that () and , we have:
if then and for some . 2. 2.
if , then and for some ,
Proof.
To avoid conflict of indexes, we consider and instead of and .
When the reduction is a labeled reduction with the attacker
- •
Case where the reduction rule applied to is (att. open_public) on a thread . From (connect_1_T), we know that . From Item 4 of 7, we know that this reduction is possible only when . Hence there exists the same thread in . The reduction on the corresponding thread of leads to through the same label up to the name of , but they are equal up to alpha-renaming of the channel name. Hence we have the condition on commands of 13. Finally the memory is unchanged and the attacker does not learn any value any both process: the condition about memories of 13 and memory equivalence of the attackers hold.
- •
Case where the reduction rule applied to is (att. open_server), on a thread , with label be the values in the label of the reduction. Since (according to our hypothesis), due to (att. open_server), and due to the type rule (accept_2_T), we have and . According to Item 4 of 7, the thread is low (). So an identical thread exists in , and since the label enforces , and to be the sames, it reduces to up to the renaming of the channel name. Finally, , , and remain the sames.
- •
The case where the reduction rule applied to is (att. open_priv_client) is similar.
- •
Case where the reduction rule applied to is (att. output) on a thread , with label . According to Item 5 of 13 the channel is of type with and . According to (output_T), we have and so . According to Item 4 of 7 the thread is low (). Therefore there exists an identical thread in where the same label reduction is possible. Moreover, according to (output_T), we have with so according to 2, for some . According to Lemma 5, we have , hence .
- •
Case where the reduction rule applied to is (att. input) on a thread with label . According to Item 5 of 13 the channel is of type with and . According to (input_T), we have and so . According to Item 4 of 7 the thread is low (). So there is an identical thread in (which reduces with the same label). According to 3 we have . Therefore, the new memories and are equivalent.
When the reduction is made on a thread where
In that case, we use the fact for any thread on some device with memory on which the reduction happens, there exists a thread on the other process with memory such that and .
We prove all the requirements sequentially: first, we prove that reduces to the same in both process. Secondly, we prove the commands reduce to equivalent commands. Thirdly, we prove the equivalence of the new memories.
We have The one reduction that changes is (high newPrin). When reference rights change in a reduction due to (high newPrin), the thread is low due to the typing rule so the reduction can also occurs in process . We conclude by renaming the new key variables of both processes to the same name. 2. 2.
Reductions rules for is the same as for : (or both new threads are high)
There are two cases for “if”, and two cases for “dec”
- •
The rule is (high if_true) or (high if_false). In these cases, according to Lemma 2, the evaluated expressions was high. So it is also true for therefore the conditional expression of also evaluates to a high value according to Lemma 2. On the other hand, the new thread is so whatever reduces to the then or the else branch, the condition about commands of 13 is satisfied.
- •
The rule is (low if_true) or (low if_false). This means the expression evaluates to a low value. In addition, the evaluation returns an integer which implies the expression can only contains , integers and variables. This implies that the results is only dependent on low values: since low values in and are the same both reduce according to the same rule.
- •
The rule is (low dec_false), (low dec_true) or (low thread/high var dec_true). Since the value to decrypt is low, the value in and differ only on high sub-terms which means that the encryption key is the same, so both processes take the same branch.
- •
The rule is (high dec_false) or (high dec_true): whatever (low dec_false) or (low dec_true) is taken the thread become high on and . 3. 3.
New values in memories are such that . First of all, because the rights are the same in and and premisses on the rules depends only on rights, the high or low rule is applied for both command.
- •
Rules that store values into high memory locations (high new), (high assign), (low thread/high var dec_true), (high dec_true): they affect high locations so condition holds whatever is the stored value.
- •
Rules that store values into low memory locations: (low new), (low assign), (low dec_true), (deref_S) according to Item 2 of Definition 7, these values are not high so, according to Lemma 5, we have .
- •
Rule (low register): is in a low memory location so and so the new principals are identical.
- •
Creates a principal (high newPrin) and (low newPrin): the new value contains fresh values and rights that are identical in and .
When the reduction is done on a thread where
Internal reductions does not change the equivalence class of the process (so the other process can simulate it with no reduction). Indeed, first, there is no reduction that reduces a high thread to a low thread: the reduced command does not impact the equivalence class of the process. Secondly, the reductions that create or change memories location are: (high new), (high dec_true) and (high assign). In each case, the type rules states that where is the right of the memory. Due to Item 4 of Definition 7, we have that is high. Hence is high too, so the rule either creates or change a memory location with flag .
When the reduction is made between two threads and where is high.
- •
If both and are high. The only possible reduction are (high i/o) and (high open_priv). In case of (high open_priv), from Item 6, we have . The types rules (connect_2_T) and (accept_2_T) enforces that and are high. Due to Item 3, the channel is annotated with . In case of (high i/o), the type rules (input_T) enforces that all received values are stored into high memory. Since high threads and high memory locations are free to differ between the two processes, the equivalence is preserved after reduction.
- •
If is low. Any annotation variant of the rule (i/o_S) is not possible when one thread is low and the other is high. Indeed, according to Item 3 of Definition 7, there is no channel on the low thread. On the other high thread, a or channel has a low type according to Item 3 of Definition 7, so according to (output_T) and (input_T) they cannot be used one this thread.
So only establishing connection is possible with rule (high open_priv) where the client is on the low thread and the server is on the high thread . Due to the reduction rule (high open_priv), is a high thread and is high too so the equivalence is preserved in both processes.
When the reduction is made on two threads which are low.
For all communication opening rules, rights in both process are identical since the public keys of the rights set are public. So both processes thread reduces either to a high thread or a low thread. Hence the two reduced processes remains in the same equivalent class.
For the input/output reduction rule (low i/o), according to Item 6 of Definition 7, the channel type can only be or since the threads are low. On the output thread, high values are sent only on channels. According to Item 3, on the input side has type with . According to (input_T), the value is stored in a high memory location so even if the values differ between and memories are still equivalent. On the other hand, if the values and sent on the channels are not high then they have to be equivalence: .
See 5
Proof. The first item is a direct consequence of 4 The two last items are a consequence of 6 with the fact that when the annotation is standard, according to 3, we have that is preserved by the reduction, so the proposition can be generalized to any number of reduction steps.
B.5 Main theorems
See 2
Proof. Since there is no untyped attacker, we can use non-standard annotations. We consider the one such that at the beginning contains all the keys of which are present in some memory, and which add a key in if and only if this key is in . From 6 on this annotated process, we get the existence of such that which imply that for all memory in and in , we have . By definition of , we conclude.
Appendix C Annotated rules
We define to be except if .
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Martín Abadi and Cédric Fournet. Mobile values, new names, and secure communication. In Proc. 28th ACM Symposium on Principles of Programming Languages (POPL’01) , London, UK, 2001. ACM.
- 2[2] Myrto Arapinis, Jia Liu, Eike Ritter, and Mark Ryan. Stateful Applied Pi Calculus . Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.
- 3[3] Aslan Askarov, Daniel Hedin, and Andrei Sabelfeld. Cryptographically-masked flows. Theor. Comput. Sci. , 402(2-3), 2008.
- 4[4] Jesper Bengtson, Karthikeyan Bhargavan, Cédric Fournet, Andrew D. Gordon, and Sergio Maffeis. Refinement types for secure implementations. ACM Trans. Program. Lang. Syst. , 33(2), 2011.
- 5[5] Bruno Blanchet. An Efficient Cryptographic Protocol Verifier Based on Prolog Rules. In Proc. of the 14th IEEE Computer Security Foundations Workshop (CSFW’01) . IEEE Computer Society Press, 2001.
- 6[6] Tom Chothia and Dominic Duggan. Type-based distributed access control vs. untyped attackers. In Formal Aspects in Security and Trust, Third International Workshop, FAST , 2005.
- 7[7] Tom Chothia, Dominic Duggan, and Jan Vitek. Type-based distributed access control. In 16th IEEE Computer Security Foundations Workshop (CSFW-16) , 2003.
- 8[8] Dorothy E. Denning and Peter J. Denning. Certification of programs for secure information flow. Commun. ACM , 20(7), July 1977.
