Revisiting Occurrence Typing
Giuseppe Castagna, Victor Lanvin, Micka\"el Laurent, Kim Nguyen

TL;DR
This paper revisits occurrence typing, enhancing it with set-theoretic types to create a unified framework that improves type inference, supports intersection types, and optimizes compilation in gradually typed languages.
Contribution
It introduces a general set-theoretic type framework for occurrence typing, unifying and extending existing approaches and enabling new applications.
Findings
Develops a comprehensive set-theoretic occurrence typing framework
Enables reconstruction of intersection types for unannotated functions
Improves compilation efficiency for gradually typed languages
Abstract
We revisit occurrence typing, a technique to refine the type of variables occurring in type-cases and, thus, capturesome programming patterns used in untyped languages. Although occurrence typing was tied from its inceptionto set-theoretic types-union types, in particular-it never fully exploited the capabilities of these types. Here weshow how, by using set-theoretic types, it is possible to develop a general typing framework that encompasses andgeneralizes several aspects of current occurrence typing proposals and that can be applied to tackle other problemssuch as the reconstruction of intersection types for unannotated or partially annotated functions and the optimizationof the compilation of gradually typed languages.
| Code | Inferred type | |
| 1 | ⬇ let basic_inf = fun (y : Int | Bool) -> if y is Int then incr y else lnot y\end{lstlisting} &\vfill $(\Int\to\Int)\land(\Bool\to\Bool)$ \\\hline 2 & \begin{lstlisting} let any_inf = fun (x : Any) -> if x is Int then incr x else if x is Bool then lnot x else x |
|
| 3 | ⬇ let is_int = fun (x : Any) -> if x is Int then true else false let is_bool = fun (x : Any) -> if x is Bool then true else false let is_char = fun (x : Any) -> if x is Char then true else false |
|
| 4 | ⬇ let not_ = fun (x : Any) -> if x is True then false else true | |
| 5 | ⬇ let or_ = fun (x : Any) -> fun (y: Any) -> if x is True then true else if y is True then true else false |
|
| 6 | ⬇ let and_ = fun (x : Any) -> fun (y : Any) -> if not_ (or_ (not_ x) (not_ y)) is True then true else false |
|
| 7 | ⬇ let f = fun (x : Any) -> fun (y : Any) -> if and_ (is_int x) (is_bool y) is True then 1 else if or_ (is_char x) (is_int y) is True then 2 else 3 |
(two other redundant cases omitted) |
| ⬇ let test_1 = f 3 true let test_2 = f (42,42) 42 let test_3 = f nil nil |
1
2 3 |
|
| 8 | ⬇ atom nil type Document = { nodeType=9 ..} and Element = { nodeType=1, childNodes=NodeList ..} and Text = { nodeType=3, isElementContentWhiteSpace=Bool ..} and Node = Document | Element | Text and NodeList = Nil | (Node, NodeList) let is_empty_node = fun (x : Node) -> if x.nodeType is 9 then false else if x is { nodeType=3 ..} then x.isElementContentWhiteSpace else if x.childNodes is Nil then true else false |
(omitted redundant arrows) |
| 9 | ⬇ let xor_ = fun (x : Any) -> fun (y : Any) -> if and_ (or_ x y) (not_ (and_ x y)) is True then true else false |
|
| 10 | ⬇ (* f, g have type: (Int->Int) & (Any->Bool) *) let example10 = fun (x : Any) -> if (f x, g x) is (Int, Bool) then 1 else 2 |
Warning: line 4, 39-40: unreachable expression |
| 11 | ⬇ let typeof = fun (x:Any) -> if x is Int then "number" else if x is Char then "string" else if x is Bool then "boolean" else "object" let test = fun (x:Any) -> if typeof x is "number" then incr x else if typeof x is "string" then charcode x else if typeof x is "boolean" then int_of_bool x else 0\end{lstlisting} &\vfill\smallskip $(\Int\to\textsf{"number"}) \wedge$\newline $(\Char\to\textsf{"string"})\wedge$\newline $(\Bool\to\textsf{"boolean"})\wedge$\newline $(\lnot(\Bool{\vee}\Int{\vee}\Char)\to\textsf{"object"})\wedge \ldots$\newline (two other redundant cases omitted) \newline~\newline $(\Int \to \Int) \wedge (\Char \to \Int) \wedge (\Bool \to \Int) \wedge $\newline $(\lnot(\Bool{\vee} \Int {\vee} \Char) \to 0)\wedge \ldots$\newline (two other redundant cases omitted) \\\hline 12 & \begin{lstlisting} atom null type Object = Null | { prototype = Object ..} type ObjectWithPropertyL = { l = Any ..} | { prototype = ObjectWithPropertyL ..} let has_property_l = fun (o:Object) -> if o is ObjectWithPropertyL then true else false let has_own_property_l = fun (o:Object) -> if o is { l=Any ..} then true else false let get_property_l = fun (self:Object->Any) o -> if has_own_property_l o is True then o.l else if o is Null then null else self (o.prototype) |
|
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Revisiting Occurrence Typing
Giuseppe Castagna
Institut de Recherche en Informatique Fondamentale (IRIF), CNRS - Université de Paris, France
Victor Lanvin
Mickaël Laurent
Kim Nguyen
Laboratoire de Méthodes Formelles (LMF), CNRS - Université Paris-Saclay, France
Abstract
We revisit occurrence typing, a technique to refine the type of variables occurring in type-cases and, thus, capture some programming patterns used in untyped languages. Although occurrence typing was tied from its inception to set-theoretic types—union types, in particular—it never fully exploited the capabilities of these types. Here we show how, by using set-theoretic types, it is possible to develop a general typing framework that encompasses and generalizes several aspects of current occurrence typing proposals and that can be applied to tackle other problems such as the reconstruction of intersection types for unannotated or partially annotated functions and the optimization of the compilation of gradually typed languages.
keywords:
occurrence typing , type inference , union types , intersection types , TypeScript , Flow language , dynamic languages , type case , gradual typing.
1 Introduction
TypeScript and Flow are extensions of JavaScript that allow the programmer to specify in the code type annotations used to statically type-check the program. For instance, the following function definition is valid in both languages
function foo(x : number | string) { return (typeof(x) === "number")? x+1 : x.trim(); (1) }
Apart from the type annotation (in red) of the function parameter, the above is standard JavaScript code defining a function that checks whether its argument is an integer; if it is so, then it returns the argument’s successor (x+1), otherwise it calls the method trim() of the argument. The annotation specifies that the parameter is either a number or a string (the vertical bar denotes a union type). If this annotation is respected and the function is applied to either an integer or a string, then the application cannot fail because of a type error (trim() is a string method of the ECMAScript 5 standard that trims white-spaces from the beginning and end of the string) and both the type-checker of TypeScript and the one of Flow rightly accept this function. This is possible because both type-checkers implement a specific type discipline called occurrence typing or flow typing:111TypeScript calls it “type guard recognition” while Flow uses the terminology “type refinements”. as a matter of fact, standard type disciplines would reject this function. The reason for that is that standard type disciplines would try to type every part of the body of the function under the assumption that x has type number | string and they would fail, since the successor is not defined for strings and the method trim() is not defined for numbers. This is so because standard disciplines do not take into account the type test performed on x. Occurrence typing is the typing technique that uses the information provided by the test to specialize—precisely, to refine—the type of the occurrences of x in the branches of the conditional: since the program tested that x is of type number, then we can safely assume that x is of type number in the “then” branch, and that it is not of type number (and thus deduce from the type annotation that it must be of type string) in the “else” branch.
Occurrence typing was first defined and formally studied by THF08 to statically type-check untyped Scheme programs,222According to Sam Tobin-Hochstadt, the terminology occurrence typing was first used in a simplistic form by Komon05, although he and Felleisen were not aware of it the at the moment of the writing of [THF08]. and later extended by THF10 yielding the development of Typed Racket. From its inception, occurrence typing was intimately tied to type systems with set-theoretic types: unions, intersections, and negation of types. Union was the first type connective to appear, since it was already used by THF08 where its presence was needed to characterize the different control flows of a type test, as our foo example shows: one flow for integer arguments and another for strings. Intersection types appear (in limited forms) combined with occurrence typing both in TypeScript and in Flow and serve to give, among other, more precise types to functions such as foo. For instance, since x + 1 evaluates to an integer and x.trim() to a string, then our function foo has type (number|string)(number|string). But it is clear that a more precise type would be one that states that foo returns a number when it is applied to a number and returns a string when it is applied to a string, so that the type deduced for, say, foo(42) would be number rather than number|string. This is exactly what the intersection type
[TABLE]
states (intuitively, an expression has an intersection of types, noted &, if and only if it has all the types of the intersection) and corresponds in Flow to declaring foo as follows:
var foo : (number => number) & (string => string) = x => { return (typeof(x) === "number")? x+1 : x.trim(); (3) }
For what concerns negation types, they are pervasive in the occurrence typing approach, even though they are used only at meta-theoretic level,333At the moment of writing there is a pending pull request to add negation types to the syntax of TypeScript, but that is all. in particular to determine the type environment when the type case fails. We already saw negation types at work when we informally typed the “else” branch in foo, for which we assumed that did not have type number—i.e., it had the (negation) type number—and deduced from it that then had type string—i.e., (number|string)&number which is equivalent to the set-theoretic difference (number|string)\ number and, thus, to string.
The approaches cited above essentially focus on refining the type of variables that occur in an expression whose type is being tested. They do it when the variable occurs at top-level in the test (i.e., the variable is the expression being tested) or under some specific positions such as in nested pairs or at the end of a path of selectors. In this work we aim at removing this limitation on the contexts and develop a general theory to refine the type of variables that occur in tested expressions under generic contexts, such as variables occurring in the left or the right expressions of an application. In other words, we aim at establishing a formal framework to extract as much static information as possible from a type test. We leverage our analysis on the presence of full-fledged set-theoretic types connectives provided by the theory of semantic subtyping. Our analysis will also yield two important byproducts. First, to refine the type of the variables we have to refine the type of the expressions they occur in and we can use this information to improve our analysis. Therefore our occurrence typing approach will refine not only the types of variables but also the types of generic expressions–i.e., any expression whatever form it has—bypassing usual type inference. Second, and most importantly, the result of our analysis can be used to infer intersection types for functions, even in the absence of precise type annotations such as the one in the definition of foo in (1): to put it simply, we are able to infer the type (2) for the unannotated pure JavaScript code of foo (i.e., no type annotation at all), while in TypeScript and Flow (and any other formalism we are aware of) this requires an explicit and full type annotation as the one given in (1).
Finally, the natural target for occurrence typing are languages with dynamic type tests, in particular, dynamic languages. To type such languages occurrence typing is often combined not only, as discussed above, with set-theoretic types, but also with extensible record types (to type objects) and gradual type system (to combine static and dynamic typing) two features that we study in Section 3 as two extensions of our core formalism. Of particular interest is the latter. Gre19 singles out occurrence typing and gradual typing as the two “lineages” that partition the research on combining static and dynamic typing: he identifies the former as the “pragmatic, implementation-oriented dynamic-first” lineage and the latter as the “formal, type-theoretic, static-first” lineage. Here we demonstrate that these two “lineages” are not orthogonal or mutually independent, and we combine occurrence and gradual typing showing, in particular, how the former can be used to optimize the compilation of the latter.
1.1 Motivating examples
We focus our study on conditionals that test types and consider the following syntax: (e.g., in this syntax the body of foo in (1) and (1) is rendered as ). In particular, in this introduction we concentrate on applications, since they constitute the most difficult case and many other cases can be reduced to them. A typical example is the expression
[TABLE]
where ’s denote variables, is some type, and ’s are generic expressions. Depending on the actual and on the static types of and , we can make type assumptions for , for , and for the application when typing that are different from those we can make when typing . For instance, suppose is bound to the function foo defined in (1). Thus has type (we used the syntax of the types of Section 2 where unions and intersections are denoted by and and have priority over and , but not over ). Then, it is not hard to see that if , then the expression444This and most of the following expressions are just given for the sake of example. Determining the type in each branch of expressions other than variables is interesting for constructors but less so for destructors such as applications, projections, and selections: any reasonable programmer would not repeat the same application twice, (s)he would store its result in a variable. This becomes meaningful with constructor such as pairs, as we do for instance in the expression in (12).
[TABLE]
is well typed with type Int: when typing the branch “then” we know that the test succeeded and that, therefore, not only is of type Int, but also that is of type Int: the other possibility, , would have made the test fail. For (5) we reasoned only on the type of the variables in the “then” branch but we can do the same on the “else” branch as shown by the following expression, where @ denotes string concatenation
[TABLE]
If the static type of is then is well typed only if the static type of is (a subtype of) and from that it is not hard to deduce that (6) has type . Let us see this in detail. The expression in (6) is typed in the following type environment: . All we can deduce, then, is that the application has type , which is not enough to type either the “then” branch or the “else” branch. In order to type the “then” branch we must be able to deduce that both and are of type Int. Since we are in the “then” branch, then we know that the type test succeeded and that, therefore, has type Int. Thus we can assume in typing this branch that has both its static type and type Int and, thus, their intersection: , that is Int. For what concerns we use the static type of , that is , and notice that this function returns an Int only if its argument is of type Int. Reasoning as above we thus deduce that in the “then” branch the type of is the intersection of its static type with Int: that is Int. To type the “else” branch we reason exactly in the same way, with the only difference that, since the type test has failed, then we know that the type of the tested expression is not Int. That is, the expression can produce any possible value barring an Int. If we denote by the type of all values (i.e., the type any of TypeScript and Flow) and by the set difference, then this means that in the else branch we know that has type —written —, that is, it can return values of any type barred Int. Reasoning as for the “then” branch we then assume that has type (i.e., , that is, String), that must be of type String for the application to have type and therefore we assume that has type (i.e., again String).
We have seen that we can specialize in both branches the type of the whole expression , the type of the argument , but what about the type of the function ? Well, this depends on the type of itself. In particular, if instead of an intersection type is typed by a union type (e.g., when the function bound to is the result of a branching expression), then the test may give us information about the type of the function in the various branches. So for instance if in the expression in (4) is of type, say, , then we can assume for the expression (4) that has type in the branch “then” and in the branch “else”. As a more concrete example, if and is well-typed, then we can deduce for
[TABLE]
the type : in the “then” branch has type and is of type Int; in the “else” branch has type and is of type Bool.
Let us recap. If is an expression of type and we are trying to type , then we can assume that has type when typing and type when typing . If furthermore is of the form , then we may also be able to specialize the types for (in particular if its static type is a union of arrows) and for (in particular if the static type of is an intersection of arrows). Additionally, we can repeat the reasoning for all subterms of and as long as they are applications, and deduce distinct types for all subexpressions of that form applications. How to do it precisely—not only for applications, but also for other terms such as pairs, projections, records etc—is explained in the rest of the paper but the key ideas are pretty simple and are presented next.
1.2 Key ideas
First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values. Imagine we are testing whether the result of an application is of type or not, and suppose we know that the static types of and are and respectively. If the application is well typed, then there is a lot of useful information that we can deduce from it: first, that is a functional type (i.e., it denotes a set of well-typed -abstractions, the values of functional type) whose domain, denoted by , is a type denoting the set of all values that are accepted by any function in ; second that must be a subtype of the domain of ; third, we also know the type of the application, that is the type that denotes all the values that may result from the application of a function in to an argument in , type that we denote by . For instance, if and , then and . Notice that, introducing operations such as and is redundant when working with simple types, but becomes necessary in the presence of set-theoretic types. If for instance is the type of (1), that is, , then , that is the union of all the possible input types, while the precise return type of such a function depends on the type of the argument the function is applied to: either an integer, or a string, or both (i.e., the union type ). So we have , , and (see Section 2.6.1 for the formal definition of ).
What we want to do is to refine the types of and (i.e., and ) for the cases where the test that has type succeeds or fails. Let us start with refining the type of for the case in which the test succeeds. Intuitively, we want to remove from all the values for which the application will surely return a result not in , thus making the test fail. Consider and let be the largest subtype of such that
[TABLE]
In other terms, contains all the legal arguments that make any function in return a result not in . Then we can safely remove from all the values in or, equivalently, keep in all the values of that are not in . Let us implement the second viewpoint: the set of all elements of for which an application does not surely give a result in is denoted (read, “ worra ”) and defined as : it is easy to see that according to this definition is the largest subset of satisfying (8). Then we can refine the type of for when the test is successful by using the type : we intersect all the possible results of , that is , with the elements of the domain that may yield a result in , that is . When the test fails, the type of can be refined in a similar way just by replacing by : we get the refined type . To sum up, to refine the type of an argument in the test of an application, all we need is to define , the set of arguments that when applied to a function of type may return a result in ; then we can refine the type of as t_{2}^{+}\hbox{\;\;=\raise 5.0pt\hbox{\rm\tiny def}\hskip 4.2679pt}t_{2}\wedge(t_{1}\mathop{\,\sqdot\,}t) in the “then” branch (we call it the positive branch) and as t_{2}^{-}\hbox{\;\;=\raise 5.0pt\hbox{\rm\tiny def}\hskip 4.2679pt}t_{2}\setminus(t_{1}\mathop{\,\sqdot\,}t) in the “else” branch (we call it the negative branch). As a side remark note††margin: that the set is different from the set of elements that return a result in (though it is a supertype of it). To see that, consider for the type String and for the type , that is, the type of functions that when applied to a Boolean return a Boolean and when applied to an integer return either an integer or a string; then we have that and , but there is no (non-empty) type that ensures that an application of a function in will surely yield a String result.
Once we have determined , it is then not very difficult to refine the type for the positive branch, too. If the test succeeded, then we know two facts: first, that the function was applied to a value in and, second, that the application did not diverge and returned a result in . Therefore, we can exclude from all the functions that, when applied to an argument in , yield a result not in . It can be obtained simply by removing from the functions in , that is, we refine the type of in the “then” branch as . Note that this also removes functions diverging on arguments. In particular, the interpretation of a type is the set of all functions that when applied to an argument of type either diverge or return a value in . As such the interpretation of contains all the functions that diverge (at least) on . Therefore removing from a type removes from not only all the functions that when applied to a argument return a result in , but also all the functions that diverge on . Ergo removes, among others, all functions in that diverge on . Let us see all this on our example (7), in particular, by showing how this technique deduces that the type of in the positive branch is (a subtype of) . Take the static type of , that is and intersect it with , that is, . Since intersection distributes over unions we obtain
[TABLE]
and since is empty (because contains ), then what we obtain is the left summand, a strict subtype of , namely the functions of type minus those that diverge on all String arguments.
This is essentially what we formalize in Section 2, in the type system by the rule [PAppL] and in the typing algorithm with the case (20) of the definition of the function Constr.
1.3 Technical challenges
In the previous section we outlined the main ideas of our approach to occurrence typing. However, the devil is in the details. So the formalization we give in Section 2 is not so smooth as we just outlined: we must introduce several auxiliary definitions to handle some corner cases. This section presents by tiny examples the main technical difficulties we had to overcome and the definitions we introduced to handle them. As such it provides a kind of road-map for the technicalities of Section 2.
Typing occurrences
As it should be clear by now, not only variables but also generic expressions are given different types in the “then” and “else” branches of type tests. For instance, in (6) the expression has type Int in the positive branch and type Bool in the negative one. In this specific case it is possible to deduce these typings from the refined types of the variables (in particular, thanks to the fact that has type Int the positive branch and Bool in the negative one), but this is not possible in general. For instance, consider , , and the expression
[TABLE]
It is not possible to specialize the type of the variables in the branches. Nevertheless, we want to be able to deduce that has type Int in the positive branch and type Bool in the negative one. In order to do so in Section 2 we will use special type environments that map not only variables but also generic expressions to types. So to type, say, the positive branch of (9) we extend the current type environment with the hypothesis that the expression has type Int.
When we test the type of an expression we try to deduce the type of some subexpressions occurring in it. Therefore we must cope with subexpressions occurring multiple times. A simple example is given by using product types and pairs as in . It is easy to see that the positive branch is selected only if has type and type and deduce from that that must be typed in by their intersection, . To deal with multiple occurrences of a same subexpression the type inference system of Section 2 will use the classic rule for introducing intersections [Inter], while the algorithmic counterpart will use the operator that intersects the static type of an expression with all the types deduced for the multiple occurrences of it.
Type preservation
We want our type system to be sound in the sense of Wright1994, that is, that it satisfies progress and type preservation. The latter property is challenging because, as explained just above, our type assumptions are not only about variables but also about expressions. Two corner cases are particularly difficult. The first is shown by the following example
[TABLE]
If is an expression of type , then, as discussed before, the positive branch will have type . If furthermore the negative branch is of the same type (or of a subtype), then this will also be the type of the whole expression in (10). Now imagine that the application reduces to a Boolean value, then the whole expression in (10) reduces to ; but this has type which, in general, is not a subtype of , and therefore type is not preserved by the reduction. To cope with this problem, the proof of type preservation (see LABEL:app:subject-reduction) resorts to type schemes, a technique introduced by Frisch2008 to type expressions by sets of types, so that the expression in (10) will have both the types at issue.
The second corner case is a modification of the example above where the positive branch is , e.g., . In this case the type deduced for the whole expression is Bool, while after reduction we would obtain the expression which is not of type Bool but of type (even though it will eventually reduce to a Bool). This problem will be handled in the proof of type preservation by considering parallel reductions (e.g, if reduces in a step to, say, false, then reduces in one step to ): see LABEL:app:parallel.
Interdependence of checks
The last class of technical problems arise from the mutual dependence of different type checks. In particular, there are two cases that pose a problem. The first can be shown by two functions and both of type , of type and the test:
[TABLE]
If we independently check against Int and against Bool we deduce Int for the first occurrence of and for the second. Thus we would type the positive branch of (11) under the hypothesis that is of type Int. But if we use the hypothesis generated by the test of , that is, that is of type Int, to check against Bool, then the type deduced for is —i.e., the branch is never selected. In other words, we want to produce type environments for occurrence typing by taking into account all the available hypotheses, even when these hypotheses are formulated later in the flow of control. This will be done in the type systems of Section 2 by the rule [Path] and will require at algorithmic level to look for a fix-point solution of a function, or an approximation thereof.
Finally, a nested check may help refining the type assumptions on some outer expressions. For instance, when typing the positive branch of
[TABLE]
we can assume that the expression is of type and put it in the type environment. But if in there is a test like \texttt{(}x\in\text{{Int}}\texttt{)?}{\color[rgb]{0.8,0.2,0.2}(x,y)}\texttt{:}(...) then we do not want use the assumption in the type environment to type the expression occurring in the inner test (in red). Instead we want to give to that occurrence of the expression the type . This will be done by temporarily removing the type assumption about from the type environment and by retyping the expression without that assumption (see rule [EnvA] in Section 2.6.3).
Outline
In Section 2 we formalize the ideas we just presented: we define the types and expressions of our system, their dynamic semantics and a type system that implements occurrence typing together with the algorithms that decide whether an expression is well typed or not. Section 3 extends our formalism to record types and presents two applications of our analysis: the inference of arrow types for functions and a static analysis to reduce the number of casts inserted by a compiler of a gradually-typed language. Practical aspects are discussed in Section 4 where we give several paradigmatic examples of code typed by our prototype implementation, that can be interactively tested at https://occtyping.github.io/. Section LABEL:sec:related presents related work. A discussion of future work concludes this presentation. To ease the presentation all the proofs are omitted from the main text and can be found in the appendix.
Contributions
The main contributions of our work can be summarized as follows:
- •
We provide a theoretical framework to refine the type of expressions occurring in type tests, thus removing the limitations of current occurrence typing approaches which require both the tests and the refinement to take place on variables.
- •
We define a type-theoretic approach alternative to the current flow-based approaches. As such it provides different results and it can be thus profitably combined with flow-based techniques.
- •
We use our analysis for defining a formal framework that reconstructs intersection types for unannotated or partially-annotated functions, something that, in our ken, no other current system can do.
- •
We prove the soundness of our system. We define algorithms to infer the types that we prove to be sound and show different completeness results which in practice yield the completeness of any reasonable implementation.
- •
We show how to extend our approach to records with field addition, update, and deletion operations.
- •
We show how occurrence typing can be extended to and combined with gradual typing and apply our results to optimize the compilation of the latter.
We end this introduction by stressing the practical implications of our work: a perfunctory inspection may give the wrong impression that the only interest of the heavy formalization that follows is to have generic expressions, rather than just variables, in type cases: this would be a bad trade-off. The important point is, instead, that our formalization is what makes analyses such as those presented in Section 3 possible (e.g., the reconstruction of the type (2) for the unannotated pure JavaScript code of foo), which is where the actual added practical value and potential of our work resides.
2 Language
In this section we formalize the ideas we outlined in the introduction. We start by the definition of types followed by the language and its reduction semantics. The static semantics is the core of our work: we first present a declarative type system that deduces (possibly many) types for well-typed expressions and then the algorithms to decide whether an expression is well typed or not.
2.1 Types
Definition 2.1** (Types).**
The set of types Types is formed by the terms coinductively produced by the grammar:
[TABLE]
and that satisfy the following conditions
- •
(regularity) every term has a finite number of different sub-terms;
- •
(contractivity) every infinite branch of a term contains an infinite number of occurrences of the arrow or product type constructors.
We use the following abbreviations: t_{1}\land t_{2}\hbox{\;\;=\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 2.84526pt}\neg(\neg t_{1}\vee\neg t_{2}), t_{1}\setminus t_{2}\hbox{\;\;=\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 2.84526pt}t_{1}\wedge\neg t_{2}, \MyMathBb{1}\hbox{\;\;=\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 2.84526pt}\neg\MyMathBb{0}. ranges over basic types (e.g., Int, Bool), and respectively denote the empty (that types no value) and top (that types all values) types. Coinduction accounts for recursive types and the condition on infinite branches bars out ill-formed types such as (which does not carry any information about the set denoted by the type) or (which cannot represent any set). It also ensures that the binary relation defined by , , is Noetherian. This gives an induction principle on Types that we will use without any further explicit reference to the relation.555In a nutshell, we can do proofs by induction on the structure of unions and negations—and, thus, intersections—but arrows, products, and basic types are the base cases for the induction. We refer to , , and as type constructors and to , , , and as type connectives.
The subtyping relation for these types, noted , is the one defined by Frisch2008 and detailed description of the algorithm to decide this relation can be found in [Cas15]. For the reader’s convenience we succinctly recall the definition of the subtyping relation in the next subsection but it is possible to skip this subsection at first reading and jump directly to Subsection 2.3, since to understand the rest of the paper it suffices to consider that types are interpreted as sets of values (i.e., either constants, -abstractions, or pairs of values: see Section 2.3 right below) that have that type, and that subtyping is set containment (i.e., a type is a subtype of a type if and only if contains all the values of type ). In particular, contains all -abstractions that when applied to a value of type , if their computation terminates, then they return a result of type (e.g., is the set of all functions666Actually, for every type , all types of the form are equivalent and each of them denotes the set of all functions. and is the set of functions that diverge on every argument). Type connectives (i.e., union, intersection, negation) are interpreted as the corresponding set-theoretic operators (e.g., is the union of the values of the two types). We use to denote the symmetric closure of : thus (read, is equivalent to ) means that and denote the same set of values and, as such, they are semantically the same type. All the above is formalized as follows.
2.2 Subtyping
Subtyping is defined by giving a set-theoretic interpretation of the types of Definition 2.1 into a suitable domain :
Definition 2.2** (Interpretation domain [Frisch2008]).**
The interpretation domain is the set of finite terms produced inductively by the following grammar
[TABLE]
where ranges over the set of constants and where is such that .
The elements of correspond, intuitively, to (denotations of) the results of the evaluation of expressions. In particular, in a higher-order language, the results of computations can be functions which, in this model, are represented by sets of finite relations of the form , where (which is not in ) can appear in second components to signify that the function fails (i.e., evaluation is stuck) on the corresponding input. This is implemented by using in the second projection the meta-variable which ranges over (we reserve to range over , thus excluding ). This constant is used to ensure that is not a supertype of all function types: if we used instead of , then every well-typed function could be subsumed to and, therefore, every application could be given the type , independently from its argument as long as this argument is typable (see Section 4.2 of [Frisch2008] for details). The restriction to finite relations corresponds to the intuition that the denotational semantics of a function is given by the set of its finite approximations, where finiteness is a restriction necessary (for cardinality reasons) to give the semantics to higher-order functions.
We define the interpretation of a type so that it satisfies the following equalities, where denotes the restriction of the powerset to finite subsets and denotes the function that assigns to each basic type the set of constants of that type, so that for every constant we have (we use to denote the basic type of the constant ):
[TABLE]
We cannot take the equations above directly as an inductive definition of because types are not defined inductively but coinductively. However, recall that the contractivity condition of Definition 2.1 ensures that the binary relation defined by , , is Noetherian which gives an induction principle on Types that we use combined with structural induction on to give the following definition which validates these equalities.
Definition 2.3** (Set-theoretic interpretation of types [Frisch2008]).**
We define a binary predicate (“the element belongs to the type ”), where and , by induction on the pair ordered lexicographically. The predicate is defined as follows:
[TABLE]
We define the set-theoretic interpretation as .
Finally, we define the subtyping preorder and its associated equivalence relation as follows.
Definition 2.4** (Subtyping relation [Frisch2008]).**
We define the subtyping relation and the subtyping equivalence relation as t_{1}\leq t_{2}\hbox{\;\;\iff\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 11.38109pt}\llbracket t_{1}\rrbracket\subseteq\llbracket t_{2}\rrbracket and t_{1}\simeq t_{2}\hbox{\;\;\iff\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 11.38109pt}(t_{1}\leq t_{2})\mathrel{\mathsf{and}}(t_{2}\leq t_{1})\>.
2.3 Syntax
The expressions and values of our language are inductively generated by the following grammars:
[TABLE]
for . In (13), ranges over constants (e.g., true, false, 1, 2, …) which are values of basic types; ranges over variables; denotes pairs and their projections; denotes the type-case expression that evaluates either or according to whether the value returned by (if any) has the type or not; denotes the function of parameter and body annotated with the type . An expression has an intersection type if and only if it has all the types that compose the intersection. Therefore, intuitively, is a well-typed expression if for all the hypothesis that is of type implies that the body has type , that is to say, it is well typed if has type for all .
2.4 Dynamic semantics
The dynamic semantics is defined as a classic left-to-right call-by-value weak reduction for a -calculus with pairs, enriched with specific rules for type-cases. We have the following notions of reduction:
[TABLE]
where denotes, intuitively, the set of values that have type . Formally, where is inductively defined as: \textsf{{typeof}}_{\mathcal{V}}(c)\hbox{\;\;=\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 2.84526pt}\{\text{b}_{c}\}, \textsf{{typeof}}_{\mathcal{V}}(\lambda^{\wedge_{i\in I}s_{i}\to t_{i}}x.e)\hbox{\;\;=\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 2.84526pt}\{t~{}|~{}t\simeq(\wedge_{i\in I}s_{i}\to t_{i})\wedge(\wedge_{j\in J}s_{j}^{\prime}\to t_{j}^{\prime}),t\not\leq\MyMathBb{0}\}, \textsf{{typeof}}_{\mathcal{V}}((v_{1},v_{2}))\hbox{\;\;=\raise 5.0pt\hbox{\rm\scriptsize def}\hskip 2.84526pt}\textsf{{typeof}}_{\mathcal{V}}(v_{1})\times\textsf{{typeof}}_{\mathcal{V}}(v_{2}) 777This definition may look complicated but it is necessary to handle some corner cases for negated arrow types (cf. rule [Abs-] in Section 2.5). For instance, it states that ..
Contextual reductions are defined by the following evaluation contexts:
[TABLE]
As usual we denote by the term obtained by replacing for the hole in the context and we have that implies .
2.5 Static semantics
While the syntax and reduction semantics are, on the whole, pretty standard, for what concerns the type system we will have to introduce several unconventional features that we anticipated in Section 1.3 and are at the core of our work. Let us start with the standard part, that is the typing of the functional core and the use of subtyping, given by the following typing rules:
[TABLE]
[TABLE]
These rules are quite standard and do not need any particular explanation besides those already given in Section 2.3. Just notice subtyping is embedded in the system by the classic [Subs] subsumption rule. Next we focus on the unconventional aspects of our system, from the simplest to the hardest.
The first unconventional aspect is that, as explained in Section 1.3, our type assumptions are about expressions. Therefore, in our rules the type environments, ranged over by , map expressions—rather than just variables—into types. This explains why the classic typing rule for variables is replaced by a more general [Env] rule defined below:
[TABLE]
The [Env] rule is coupled with the standard intersection introduction rule [Inter] which allows us to deduce for a complex expression the intersection of the types recorded by the occurrence typing analysis in the environment with the static type deduced for the same expression by using the other typing rules. This same intersection rule is also used to infer the second unconventional aspect of our system, that is, the fact that -abstractions can have negated arrow types, as long as these negated types do not make the type deduced for the function empty:
[TABLE]
In Section 1.3 we explained that in order for our system to satisfy the property of type preservation, the type system must be able to deduce negated arrow types for functions—e.g. the type for . We demonstrated this with the expression in equation (10), for which type preservation holds only if we are able to deduce for this expression the type , that is, . But the sole rule [Abs+] above does not allow us to deduce negations of arrows for -abstractions: the rule [Abs-] makes this possible. This rule ensures that given a function (where is an intersection type), for every type , either can be obtained by subsumption from or can be added to the intersection . In turn this ensures that, for any function and any type either the function has type or it has type (see Pet19phd for a thorough discussion on this rule). As an aside, note that this kind of deduction is already present in the system by Frisch2008 though in that system this presence was motivated by the semantics of types rather than, as in our case, by the soundness of the type system.
Rules [Abs+] and [Abs-] are not enough to deduce for -abstractions all the types we wish. In particular, these rules alone are not enough to type general overloaded functions. For instance, consider this simple example of a function that applied to an integer returns its successor and applied to anything else returns true:
Clearly, the expression above is well typed, but the rule [Abs+] alone is not enough to type it. In particular, according to [Abs+] we have to prove that under the hypothesis that is of type Int the expression is of type Int, too. That is, that under the hypothesis that has type (we apply occurrence typing) the expression is of type Int (which holds) and that under the hypothesis that has type , that is (we apply once more occurrence typing), true is of type Int (which does not hold). The problem is that we are trying to type the second case of a type-case even if we know that there is no chance that, when is bound to an integer, that case will be ever selected. The fact that it is never selected is witnessed by the presence of a type hypothesis with type. To avoid this problem (and type the term above) we add the rule [Efq] (ex falso quodlibet) that allows the system to deduce any type for an expression that will never be selected, that is, for an expression whose type environment contains an empty assumption:
[TABLE]
Once more, this kind of deduction was already present in the system by Frisch2008 to type full fledged overloaded functions, though it was embedded in the typing rule for the type-case. Here we need the rule [Efq], which is more general, to ensure the property of subject reduction.
Finally, there remains one last rule in our type system, the one that implements occurrence typing, that is, the rule for the type-case:
[TABLE]
The rule [Case] checks whether the expression , whose type is being tested, is well-typed and then performs the occurrence typing analysis that produces the environments ’s under whose hypothesis the expressions ’s are typed. The production of these environments is represented by the judgments . The intuition is that when is provable then is a version of extended with type hypotheses for all expressions occurring in , type hypotheses that can be deduced assuming that the test succeeds. Likewise, (notice the negation on ) extends with the hypothesis deduced assuming that , that is, for when the test fails.
All it remains to do is to show how to deduce judgments of the form . For that we first define how to denote occurrences of an expression. These are identified by paths in the syntax tree of the expressions, that is, by possibly empty strings of characters denoting directions starting from the root of the tree (we use for the empty string/path, which corresponds to the root of the tree).
Let be an expression and a path; we denote the occurrence of reached by the path , that is (for , and undefined otherwise)
[TABLE]
To ease our analysis we used different directions for each kind of term. So we have [math] and for the function and argument of an application, and for the eft and ight expressions forming a pair, and and for the argument of a irst or of a econd projection. Note also that we do not consider occurrences under ’s (since their type is frozen in their annotations) and type-cases (since they reset the analysis). The judgments are then deduced by the following two rules:
[TABLE]
These rules describe how to produce by occurrence typing the type environments while checking that an expression has type . They state that we can deduce from all the hypothesis already in (rule [Base]) and that if we can deduce a given type for a particular occurrence of the expression being checked, then we can add this hypothesis to the produced type environment (rule [Path]). The rule [Path] uses a (last) auxiliary judgement to deduce the type of the occurrence when checking against under the hypotheses . This rule [Path] is subtler than it may appear at first sight, insofar as the deduction of the type for may already use some hypothesis on (in ) and, from an algorithmic viewpoint, this will imply the computation of a fix-point (see Section 2.6.2). The last ingredient for our type system is the deduction of the judgements of the form where is a path to an expression occurring in . This is given by the following set of rules.
[TABLE]
[TABLE]
These rules implement the analysis described in Section 1.2 for functions and extend it to products. Let us comment each rule in detail. [PSubs] is just subsumption for the deduction . The rule [PInter] combined with [PTypeof] allows the system to deduce for an occurrence the intersection of the static type of (deduced by [PTypeof]) with the type deduced for by the other rules. The rule [PEps] is the starting point of the analysis: if we are assuming that the test succeeds, then we can assume that (i.e., ) has type (recall that assuming that the test fails corresponds to having at the index of the turnstyle). The rule [PAppR] implements occurrence typing for the arguments of applications, since it states that if a function maps arguments of type in results of type and an application of this function yields results (in ) that cannot be in (since ), then the argument of this application cannot be of type . [PAppL] performs the occurrence typing analysis for the function part of an application, since it states that if an application has type and the argument of this application has type , then the function in this application cannot have type . Rules [PPair_] are straightforward since they state that the -th projection of a pair that is of type must be of type . So are the last two rules that essentially state that if (respectively, ) is of type , then the type of must be of the form (respectively, ).
This concludes the presentation of all the rules of our type system (they are summarized for the reader’s convenience in LABEL:sec:declarative), which satisfies the property of safety, deduced, as customary, from the properties of progress and subject reduction (cf. LABEL:app:soundness).
Theorem 2.5** (type safety).**
For every expression such that either diverges or there exists a value of type such that .
2.6 Algorithmic system
The type system we defined in the previous section implements the ideas we illustrated in the introduction and it is safe. Now the problem is to decide whether an expression is well typed or not, that is, to find an algorithm that given a type environment and an expression decides whether there exists a type such that is provable. For that we need to solve essentially two problems: how to handle the fact that it is possible to deduce several types for the same well-typed expression and how to compute the auxiliary deduction system for paths.
. Multiple types have two distinct origins each requiring a distinct technical solution. The first origin is the presence of structural rules888In logic, logical rules refer to a particular connective (here, a type constructor, that is, either , or , or ), while identity rules (e.g., axioms and cuts) and structural rules (e.g., weakening and contraction) do not.
such as [Subs] and [Inter]. We handle this presence in the classic way: we define an algorithmic system that tracks the minimum type of an expression; this system is obtained from the original system by removing the two structural rules and by distributing suitable checks of the subtyping relation in the remaining rules. To do that in the presence of set-theoretic types we need to define some operators on types, which are given in Section 2.6.1. The second origin is the rule [Abs-] by which it is possible to deduce for every well-typed lambda abstraction infinitely many types, that is the annotation of the function intersected with as (finitely) many negations of arrow types as possible without making the type empty. We do not handle this multiplicity directly in the algorithmic system but only in the proof of its soundness by using and adapting the technique of type schemes defined by Frisch2008. Type schemes are canonical representations of the infinite sets of types of -abstractions which can be used to define an algorithmic system that can be easily proved to be sound. The simpler algorithm that we propose in this section implies (i.e., it is less precise than) the one with type schemes (cf. Lemma B.20) and it is thus sound, too. The algorithm of this section is not only simpler but, as we discuss in Section 2.6.4, is also the one that should be used in practice. This is why we preferred to present it here and relegate the presentation of the system with type schemes to B.2.1.
. For what concerns the use of the auxiliary derivation for the and judgments, we present in Section 2.6.2 an algorithm that is sound and satisfies a limited form of completeness. All these notions are then used in the algorithmic typing system given in Section 2.6.3.
2.6.1 Operators for type constructors
In order to define the algorithmic typing of expressions like applications and projections we need to define the operators on types we used in Section 1.2. Consider the classic rule [App] for applications. It essentially does three things: it checks that the expression in the function position has a functional type; it checks that the argument is in the domain of the function, and it returns the type of the application. In systems without set-theoretic types these operations are quite straightforward: corresponds to checking that the expression has an arrow type, corresponds to checking that the argument is in the domain of the arrow deduced for the function, and corresponds to returning the codomain of that same arrow. With set-theoretic types things get more difficult, since a function can be typed by, say, a union of intersection of arrows and negations of types. Checking that the function has a functional type is easy since it corresponds to checking that it has a type subtype of . Determining its domain and the type of the application is more complicated and needs the operators and we informally described in Section 1.2 where we also introduced the operator . These three operators are used by our algorithm and formally defined as:
[TABLE]
In short, is the largest domain of any single arrow that subsumes , is the smallest codomain of an arrow type that subsumes and has domain and was explained before.
We need similar operators for projections since the type of in may not be a single product type but, say, a union of products: all we know is that must be a subtype of . So let be a type such that , then we define:
[TABLE]
All the operators above but are already present in the theory of semantic subtyping: the reader can find how to compute them in [Frisch2008, Section 6.11] (see also [Cas15, §4.4] for a detailed description). Below we just show our new formula that computes for a subtype of . For that, we use a result of semantic subtyping that states that every type is equivalent to a type in disjunctive normal form and that if furthermore , then with for all in . For such a and any type then we have:
[TABLE]
The formula considers only the positive arrows of each summand that forms and states that, for each summand, whenever you take a subset of its positive arrows that cannot yield results in (since does not overlap the intersection of the codomains of these arrows), then the success of the test cannot depend on these arrows and therefore the intersection of the domains of these arrows—i.e., the values that would precisely select that set of arrows—can be removed from . The proof that this type satisfies (16) is given in the LABEL:app:worra.
2.6.2 Type environments for occurrence typing
The second ingredient necessary to the definition of our algorithmic systems is the algorithm for the deduction of , that is an algorithm that takes as input , , and , and returns an environment that extends with hypotheses on the occurrences of that are the most general that can be deduced by assuming that succeeds. For that we need the notation which denotes the type deduced for under the type environment in the algorithmic type system of Section 2.6.3. That is, if and only if is provable.
We start by defining the algorithm for each single occurrence, that is for the deduction of . This is obtained by defining two mutually recursive functions Constr and Intertype:
[TABLE]
All the functions above are defined if and only if the initial path is valid for (i.e., is defined) and is well-typed (which implies that all in the definition are defined).999Note that the definition is well-founded. This can be seen by analyzing the rule [CaseA] of Section 2.6.3: the definition of and use , and this is defined for all since the first premisses of [CaseA] states that (and this is possible only if we were able to deduce under the hypothesis the type of every occurrence of .)
Each case of the definition of the Constr function corresponds to the application of a logical rule (cf. definition in Footnote 8) in the deduction system for : case (19) corresponds to the application of [PEps]; case (20) implements [Pappl] straightforwardly; the implementation of rule [PAppR] is subtler: instead of finding the best to subtract (by intersection) from the static type of the argument, (21) finds directly the best type for the argument by applying the operator to the static type of the function and the refined type of the application. The remaining (22–25) cases are the straightforward implementations of the rules [PPairL], [PPairR], [PFst], and [PSnd], respectively.
The other recursive function, Intertype, implements the two structural rules [PInter] and [PTypeof] by intersecting the type obtained for by the logical rules, with the static type deduced by the type system for the expression occurring at . The remaining structural rule, [Psubs], is accounted for by the use of the operators and in the definition of Constr.
It remains to explain how to compute the environment produced from by the deduction system for . Alas, this is the most delicate part of our algorithm. In a nutshell, what we want to do is to define a function that takes a type environment , an expression and a type and returns the best type environment such that holds. By the best environment we mean the one in which the occurrences of are associated to the largest possible types (type environments are hypotheses so they are contravariant: the larger the type the better the hypothesis). Recall that in Section 1.3 we said that we want our analysis to be able to capture all the information available from nested checks. If we gave up such a kind of precision then the definition of Refine would be pretty easy: it must map each subexpression of to the intersection of the types deduced by (i.e., by Intertype) for each of its occurrences. That is, for each expression occurring in , would be the type environment that maps into . As we explained in Section 1.3 the intersection is needed to apply occurrence typing to expressions such as where some expressions—here —occur multiple times.
In order to capture most of the type information from nested queries the rule [Path] allows the deduction of the type of some occurrence to use a type environment that may contain information about some suboccurrences of . On the algorithm this would correspond to applying the Refine defined above to an environment that already is the result of Refine, and so on. Therefore, ideally our algorithm should compute the type environment as a fixpoint of the function . Unfortunately, an iteration of Refine may not converge. As an example, consider the (dumb) expression . If , then when refining the “then” branch, every iteration of Refine yields for a type strictly more precise than the type deduced in the previous iteration (because of the case).
The solution we adopt in practice is to bound the number of iterations to some number . This is obtained by the following definition of Refine
[TABLE]
Note in particular that extends with hypotheses on the expressions occurring in , since .
In other terms, we try to find a fixpoint of but we bound our search to iterations. Since is monotone (w.r.t. the subtyping pre-order extended to type environments pointwise), then every iteration yields a better solution. While this is unsatisfactory from a formal point of view, in practice the problem is a very mild one. Divergence may happen only when refining the type of a function in an application: not only such a refinement is meaningful only when the function is typed by a union type, but also we had to build the expression that causes the divergence in quite an ad hoc way which makes divergence even more unlikely: setting an twice the depth of the syntax tree of the outermost type case should be more than enough to capture all realistic cases. For instance, all examples given in Section 4 can be checked (or found to be ill-typed) with .
2.6.3 Algorithmic typing rules
We now have all the definitions we need for our typing algorithm, which is defined by the following rules.
[TABLE]
[TABLE]
The side conditions of the rules ensure that the system is syntax directed, that is, that at most one rule applies when typing a term: priority is given to [EqfA] over all the other rules and to [EnvA] over all remaining logical rules. The subsumption rule is no longer in the system; it is replaced by: using a union type in [CaseA], checking in [AbsA] that the body of the function is typed by a subtype of the type declared in the annotation, and using type operators and checking subtyping in the elimination rules [AppA,ProjA]. In particular, for [AppA] notice that it checks that the type of the function is a functional type, that the type of the argument is a subtype of the domain of the function, and then returns the result type of the application of the two types. The intersection rule is (partially) replaced by the rule [EnvA] which intersects the type deduced for an expression by occurrence typing and stored in with the type deduced for by the logical rules: this is simply obtained by removing any hypothesis about from , so that the deduction of the type for cannot but end by a logical rule. Of course, this does not apply when the expression is a variable, since an hypothesis in is the only way to deduce the type of a variable, which is why the algorithm reintroduces the classic rule for variables. Finally, notice that there is no counterpart for the rule [Abs-] and that therefore it is not possible to deduce negated arrow types for functions. This means that the algorithmic system is not complete as we discuss in details in the next section.
2.6.4 Properties of the algorithmic system
In what follow we will use to stress the fact that the judgment is provable in the algorithmic system where is defined as ; we will omit the index —thus keeping it implicit—whenever it does not matter in the context.
The algorithmic system above is sound with respect to the deductive one of Section 2.5
Theorem 2.6** (Soundness).**
For every , , , , if , then .
The proof of this theorem (see B.5) is obtained by defining an algorithmic system that uses type schemes, that is, which associates each typable term with a possibly infinite set of types (in particular a -expression will be associated to a set of types of the form ) and proving that, if then with : the soundness of follows from the soundness of .
Completeness needs a more detailed explanation. The algorithmic system is not complete w.r.t. the language presented in Section 2.3 because it cannot deduce negated arrow types for functions. However, no practical programming language with structural subtyping would implement the full language of Section 2.3, but rather restrict all expressions of the form so that the type tested in them is either non functional (e.g., products, integer, a record type, etc.) or it is (i.e., the expression can just test whether returns a function or not).101010Of course, there exist languages in which it is possible to check whether some value has a type that has functional subcomponents—e.g., to test whether an object is of some class that possesses some given methods, but that is a case of nominal rather than structural subtyping, which in our framework corresponds to testing whether a value has some basic type.
There are multiple reasons to impose such a restriction, the most important ones can be summarized as follows:
For explicitly-typed languages it may yield conterintutive results, since for instance should fail despite the fact that identity functions maps Booleans to Booleans. 2. 2.
For implicitly-typed languages it yields a semantics that depends on the inference algorithm, since may either fail or not according to whether the type deduced for the result of the expression is either or (which are both valid but incomparable). 3. 3.
For gradually-typed languages it would yield a problematic system as we explain in Section 3.3.
Now, if we apply this restriction to the language of Section 2.3, then the algorithmic system of section 2.6.3 is complete. Let say that an expression is positive if it never tests a functional type more precise than (see B.5 for the formal definition). Then we have:
Theorem 2.7** (Completeness for Positive Expressions).**
For every type environment and positive expression , if , then there exist and such that .
We can use the algorithmic system defined for the proof of Theorem 2.6 to give a far more precise characterization than the above of the terms for which our algorithm is complete: positivity is a practical but rough approximation. The system copes with negated arrow types, but it still is not complete essentially for two reasons: the recursive nature of rule [Path] and the use of nested [PAppL] that yields a precision that the algorithm loses by using type schemes in defining of Constr (case (20) is the critical one). Completeness is recovered by limiting the depth of the derivations and forbidding nested negated arrows on the left-hand side of negated arrows.
Definition 2.8** (Rank-0 negation).**
A derivation of is rank-0 negated if [Abs–] never occurs in the derivation of a left premise of a [PAppL] rule.
The use of this terminology is borrowed from the ranking of higher-order types, since, intuitively, it corresponds to typing a language in which in the types used in dynamic tests, a negated arrow never occurs on the left-hand side of another negated arrow.
Theorem 2.9** (Rank-0 Completeness).**
For every , , , if is derivable by a rank-0 negated derivation, then there exists such that and .
This last result is only of theoretical interest since, in practice, we expect to have only languages with positive expressions. This is why for our implementation we use the library of CDuce [BCF03] in which type schemes are absent and functions are typed only by intersections of positive arrows. We present the implementation in Section 4, but before we study some extensions.
3 Extensions
As we recalled in the introduction, the main application of occurrence typing is to type dynamic languages. In this section we explore how to extend our work to encompass three features that are necessary to type these languages.
First, we consider record types and record expressions which, in dynamic languages, are used to implement objects. In particular, we extend our system to cope with typical usage patterns of objects employed in these languages such as adding, modifying, or deleting a field, or dynamically testing its presence to specify different behaviors.
Second, in order to precisely type applications in dynamic languages it is crucial to refine the type of some functions to account for their different behaviors with specific input types. But current approaches are bad at it: they require the programmer to explicitly specify a precise intersection type for these functions and, even with such specifications, some common cases fail to type (in that case the only solution is to hard-code the function and its typing discipline into the language). We show how we can use the work developed in the previous sections to infer precise intersection types for functions. In our system, these functions do not require any type annotation or just an annotation for the function parameters, whereas some of them fail to type in current alternative approaches even when they are given the full intersection type specification.
Finally, to type dynamic languages it is often necessary to make statically-typed parts of a program coexist with dynamically-typed ones. This is the aim of gradually typed systems that we explore in the third extension of this section.
3.1 Record types
The previous analysis already covers a large gamut of realistic cases. For instance, the analysis already handles list data structures, since products and recursive types can encode them as right-associative nested pairs, as it is done in the language CDuce (e.g., is the type of the lists of integers): see Code 8 in Table 4.2 of Section 4 for a concrete example. Even more, thanks to the presence of union types it is possible to type heterogeneous lists whose content is described by regular expressions on types as proposed by hosoya00regular. However, this is not enough to cover records and, in particular, the specific usage patterns in dynamic languages of records, whose field are dynamically tested, deleted, added, and modified. This is why we extend here our work to records, building on the record types as they are defined in CDuce.
The extension we present in this section is not trivial. Although we use the record types as they are defined in CDuce we cannot do the same for CDuce record expressions. The reasons why we cannot use the record expressions of CDuce and we have to define and study new ones are twofold. On the one hand we want to capture the typing of record field extension and field deletion, two operation widely used in dynamic language; on the other hand we need to have very simple expressions formed by elementary sub-expressions, in order to limit the combinatorics of occurrence typing. For this reason we build our records one field at a time, starting from the empty record and adding, updating, or deleting single fields.
Formally, CDuce record types can be embedded in our types by adding the following two type constructors:
where ranges over an infinite set of labels Labels and Undef is a special singleton type whose only value is a constant undef which is not in (for that it is a constant akin to ): as a consequence Undef and are distinct types, the interpretation of the former being the constant undef while the interpretation of the latter being the set of all the other values. The type is a quasi-constant function that maps every to the type and every other to the type (all the ’s must be distinct). Quasi constant functions are the internal representation of record types in CDuce. These are not visible to the programmer who can use only two specific forms of quasi constant functions, open record types and closed record types (as for OCaml object types), provided by the following syntactic sugar:111111Note that in the definitions “” is meta-syntax to denote the presence of other fields while in the open records “..” is the syntax that distinguishes them from closed ones.
- •
for (closed records).
- •
for (open records).
plus the notation to denote optional fields, which corresponds to using in the quasi-constant function notation the field .
For what concerns expressions, we cannot use CDuce record expressions as they are, but instead we must adapt them to our analysis. So as anticipated, we consider records that are built starting from the empty record expression {} by adding, updating, or removing fields:
[TABLE]
in particular deletes the field from , adds the field to the record (deleting any existing field), while is field selection with the reduction: .
To define record type subtyping and record expression type inference we need three operators on record types: which returns the type of the field in the record type , which returns the record type formed by all the fields in and those in that are not in , and which returns the type in which the field is undefined. They are formally defined as follows (see alainthesis for more details):
[TABLE]
Then two record types and are in subtyping relation, , if and only if for all we have . In particular { ..} is the largest record type.
Expressions are then typed by the following rules (already in algorithmic form).
[TABLE]
To extend occurrence typing to records we add the following values to paths: , with , , and and add the following rules for the new paths:
[TABLE]
Deriving the algorithm from these rules is then straightforward:
\begin{array}[]{llll}\textsf{{Constr}}_{\Gamma,e,t}(\varpi.a_{\ell})=\boldsymbol{\texttt{\{}}\ell:\textsf{{Intertype}}_{\Gamma,e,t}(\varpi)\ {\large\textbf{..}}\boldsymbol{\texttt{\}}}&\textsf{{Constr}}_{\Gamma,e,t}(\varpi.r_{\ell})=(\textsf{{Intertype}}_{\Gamma,e,t}(\varpi))\mathtt{\setminus}\ell+\boldsymbol{\texttt{\{}}\ell\operatorname{\texttt{=?}}\MyMathBb{1}\boldsymbol{\texttt{\}}}\\ \textsf{{Constr}}_{\Gamma,e,t}(\varpi.u_{\ell}^{2})=(\textsf{{Intertype}}_{\Gamma,e,t}(\varpi)).\ell&\textsf{{Constr}}_{\Gamma,e,t}(\varpi.u_{\ell}^{1})=(\textsf{{Intertype}}_{\Gamma,e,t}(\varpi))\mathtt{\setminus}\ell+\boldsymbol{\texttt{\{}}\ell\operatorname{\texttt{=?}}\MyMathBb{1}\boldsymbol{\texttt{\}}}\\[5.12149pt] \end{array}
Notice that the effect of doing corresponds to setting the field of the (record) type to the type , that is, to the type of all undefined fields in an open record. So [PDel] and [PUpd1] mean that if we remove, add, or redefine a field in an expression then all we can deduce for is that its field is undefined: since the original field was destroyed we do not have any information on it apart from the static one. For instance, consider the test:
By —i.e., by [Ext1], [PTypeof], and [PInter]—the type for in the positive branch is . It is equivalent to the type , and thus we can deduce that has the type Bool.
3.2 Refining function types
As we explained in the introduction, both TypeScript and Flow deduce for the first definition of the function foo in (1) the type (numberstring) (numberstring), while the more precise type
[TABLE]
can be deduced by these languages only if they are instructed to do so: the programmer has to explicitly annotate foo with the type (36): we did it in (1) using Flow—the TypeScript annotation for it is much heavier. But this seems like overkill, since a simple analysis of the body of foo in (1) shows that its execution may have two possible behaviors according to whether the parameter x has type number or not (i.e., or (numberstring)number, that is string), and this is should be enough for the system to deduce the type (36) even in the absence the annotation given in (1). In this section we show how to do it by using the theory of occurrence typing we developed in the first part of the paper. In particular, we collect the different types that are assigned to the parameter of a function in its body, and use this information to partition the domain of the function and to re-type its body. Consider a more involved example in a pseudo TypeScript that uses our syntax for type-cases function (x : ) { return (x Real) ? ((x Int) ? x+1 : sqrt(x)) : !x; (37) }
where we assume that Int is a subtype of Real. When is RealBool we want to deduce for this function the type . When is , then the function must be rejected (since it tries to type !x under the assumption that x has type ). Notice that typing the function under the hypothesis that is , allows us to capture user-defined discrimination as defined by THF10 since, for instance let is_int x = (xInt)? true : false in if is_int z then z+1 else 42
is well typed since the function is_int is given type . We propose a more general approach than the one by THF10 since we allow the programmer to hint a particular type for the argument and let the system deduce, if possible, an intersection type for the function.
We start by considering the system where -abstractions are typed by a single arrow and later generalize it to the case of intersections of arrows. First, we define the auxiliary judgement where is a typing environement, an expression and a mapping from variables to sets of types. Intuitively denotes the set that contains the types of all the occurrences of in . This judgement can be deduced by the following deduction system that collects type information on the variables that are -abstracted (i.e., those in the domain of , since lambdas are our only binders):
[TABLE]
Where is the function defined as but undefined on and denotes component-wise union, that is :
[TABLE]
All that remains to do is to replace the rule [Abs+] with the following rule
[TABLE]
Note the invariant that the domain of is always conatined in the domain of restricted to variables. Simply put, this rule first collects all possible types that are deduced for a variable during the typing of the body of the and then uses them to re-type the body under this new refined hypothesis for the type of . The re-typing ensures that the type safety property carries over to this new rule.
This system is enough to type our case study (3.2) for the case defined as RealBool. Indeed, the analysis of the body yields for the branch (x Int) ? x+1 : sqrt(x) and, since , yields for the branch !x. So the function will be checked for the input types Int, , and Bool, yielding the expected result.
It is not too difficult to generalize this rule when the lambda is typed by an intersection type:
[TABLE]
For each arrow declared in the interface of the function, we first typecheck the body of the function as usual (to check that the arrow is valid) and collect the refined types for the parameter . Then we deduce all possible output types for this refined set of input types and add the resulting arrows to the type deduced for the whole function (see Section 4 for an even more precise rule).
In summary, in order to type a function we use the type-cases on its parameter to partition the domain of the function and we type-check the function on each single partition rather than on the union thereof. Of course, we could use much a finer partition: the finest (but impossible) one is to check the function against the singleton types of all its inputs. But any finer partition would return, in many cases, not a much better information, since most partitions would collapse on the same return type: type-cases on the parameter are the tipping points that are likely to make a difference, by returning different types for different partitions thus yielding more precise typing.
Even though type cases in the body of a function are tipping points that may change the type of the result of the function, they are not the only ones: applications of overloaded functions play exactly the same role. We therefore add to our deduction system a last further rule:
[OverApp] \displaystyle\displaystyle{\hbox{\hskip 63.56432pt\vbox{\hbox{\hskip-63.5643pt\hbox{\hbox{\displaystyle\displaystyle\Gamma\vdash e:\textstyle\bigvee\bigwedge_{i\in I}t_{i}\to{}s_{i}}\hskip 10.0pt\hbox{\hbox{\displaystyle\displaystyle\Gamma\vdash x:t}\hskip 10.0pt\hbox{\hbox{\displaystyle\displaystyle\Gamma\vdash e\triangleright\psi_{1}}\hskip 10.0pt\hbox{\hbox{\displaystyle\displaystyle\Gamma\vdash x\triangleright\psi_{2}}}}}}}\vbox{}}}\over\hbox{\hskip 35.919pt\vbox{\vbox{}\hbox{\hskip-35.91899pt\hbox{\hbox{\displaystyle\displaystyle\Gamma\vdash\textstyle{e}{~{}x}\triangleright\psi_{1}\cup\psi_{2}\cup\bigcup_{i\in I}{x\mapsto t\wedge t_{i}}}}}}}}
Whenever a function parameter is the argument of an overloaded function, we record as possible types for this parameter all the domains of the arrows that type the overloaded function, restricted (via intersection) by the static type of the parameter and provided that the type is not empty (). We show the remarkable power of this rule on some practical examples in Section 4.
3.3 Integrating gradual typing
Gradual typing is an approach proposed by siek2006gradual to combine the safety guarantees of static typing with the programming flexibility of dynamic typing. The idea is to introduce an unknown (or dynamic) type, denoted , used to inform the compiler that some static type-checking can be omitted, at the cost of some additional runtime checks. The use of both static typing and dynamic typing in a same program creates a boundary between the two, where the compiler automatically adds—often costly [takikawa2016sound]—dynamic type-checks to ensure that a value crossing the barrier is correctly typed.
Occurrence typing and gradual typing are two complementary disciplines which have a lot to gain to be integrated, although we are not aware of any study in this sense. We explore this integration for the formalism of Section 2 for which the integration of gradual typing was first defined by CL17 and sucessively considerably improved by castagna2019gradual (see Lanvin21phd for a comprehensive presentation).
In a sense, occurrence typing is a discipline designed to push forward the frontiers beyond which gradual typing is needed, thus reducing the amount of runtime checks needed. For instance, the JavaScript code of (1) and (1) in the introduction can also be typed by using gradual typing: function foo(x : ) { return (typeof(x) === "number")? x+1 : x.trim(); (38) }
“Standard” or “safe” gradual typing inserts two dynamic checks since it compiles the code above into: function foo(x) { return (typeof(x) === "number")? (xnumber)+1 : (xstring).trim(); }
where e$$\langle$$t$$\rangle is a type-cast that dynamically checks whether the value returned by has type .121212Intuitively, e$$\langle$$t$$\rangle is syntactic sugar for (typeof()==="") ? : (throw "Type error"). Not exactly though, since to implement compilation à la sound gradual typing it is necessary to use casts on function types that need special handling. We already saw that thanks to occurrence typing we can annotate the parameter x by number|string instead of and avoid the insertion of any cast. But occurrence typing can be used also on the gradually typed code above in order to statically detect the insertion of useless casts. Using occurrence typing to type the gradually-typed version of foo in (3.3), allows the system to avoid inserting the first cast xnumber since, thanks to occurrence typing, the occurrence of x at issue is given type number (but the second cast is still necessary though). But removing only this cast is far from being satisfactory, since when this function is applied to an integer there are some casts that still need to be inserted outside the function. The reason is that the compiled version of the function has type \mathbbm{\qm}$$\tonumber, that is, it expects an argument of type , and thus we have to apply a cast (either to the argument or to the function) whenever this is not the case. In particular, the application foo(42) will be compiled as foo(42\langle$$\mathbbm{\qm}$$\rangle). Now, the main problem with such a cast is not that it produces some unnecessary overhead by performing useless checks (a cast to can easily be detected and safely ignored at runtime). The main problem is that the combination of such a cast with type-cases will lead to unintuitive results under the standard operational semantics of type-cases and casts. Indeed, consider the standard semantics of the type-case (typeof()==="") which consists in reducing to a value and checking whether the type of the value is a subtype of . In standard gradual semantics, 42\langle$$\mathbbm{\qm}$$\rangle is a value. And this value is of type , which is not a subtype of number. Therefore the check in foo would fail for 42\langle$$\mathbbm{\qm}$$\rangle, and so would the whole function call. Although this behavior is type safe, this violates the gradual guarantee [siek2015refined] since giving a more precise type to the parameter x (such as number) would make the function succeed, as the cast to would not be inserted. A solution is to modify the semantics of type-cases, and in particular of typeof, to strip off all the casts in values, even nested ones. While this adds a new overhead at runtime, this is preferable to losing the gradual guarantee, and the overhead can be mitigated by having a proper representation of cast values that allows to strip all casts at once.
However, this problem gets much more complex when considering functional values. In fact, as we hinted in Section 2.6, there is no way to modify the semantics of type cases to preserve both the gradual guarantee and the soundness of the system in the presence of arbitrary type cases. For example, consider the function f=\lambda^{(\text{{Int}}\to\text{{Int}})\to\text{{Int}}}g.(g{\in}(\text{{Int}}\to\text{{Int}}))\,\texttt{{?}}\,g\ 1\,\texttt{{:}}\,\texttt{\color[rgb]{0,0.2,0.4}true}. This function is well-typed since the type of the parameter guarantees that only the first branch can be taken, and thus that only an integer can be returned. However, if we apply this function to , the type case strips off the cast around (to preserve the gradual guarantee), then checks if has type . Since is not a subtype of , the check fails and the application returns true, which is unsound. Therefore, to preserve soundness in the presence of gradual types, type cases should not test functional types other than , which is the same restriction as the one presented by siek2016recursive.
While this solves the problem of the gradual guarantee, it is clear that it would be much better if the application foo(42) were compiled as is, without introducing the cast 42\langle$$\mathbbm{\qm}$$\rangle, thus getting rid of the overhead associated with removing this cast in the type case. This is where the previous section about refining function types comes in handy. To get rid of all superfluous casts, we have to fully exploit the information provided to us by occurrence typing and deduce for the function in (3.3) the type (numbernumber)((\number)string), so that no cast is inserted when the function is applied to a number. To achieve this, we simply modify the typing rule for functions that we defined in the previous section to accommodate for gradual typing. Let and range over gradual types, that is the types produced by the grammar in Definition 2.1 to which we add as basic type (see castagna2019gradual for the definition of the subtyping relation on these types). For every gradual type , define as the (non gradual) type obtained from by replacing all covariant occurrences of by and all contravariant ones by . The type can be seen as the maximal interpretation of , that is, every expression that can safely be cast to is of type . In other words, if a function expects an argument of type but can be typed under the hypothesis that the argument has type , then no casts are needed, since every cast that succeeds will be a subtype of . Taking advantage of this property, we modify the rule for functions as:
[TABLE]
The main idea behind this rule is the same as before: we first collect all the information we can into by analyzing the body of the function. We then retype the function using the new hypothesis for every . Furthermore, we also retype the function using the hypothesis : as explained before the rule, whenever this typing suceeds it eliminates unnecessary gradual types and, thus, unecessary casts. Let us see how this works on the function foo in (3.3). First, we deduce the refined hypothesis \psi(\texttt{\color[rgb]{0,0.2,0.4}x})=\{\,\texttt{\color[rgb]{0,0.2,0.4}number}{\land}\mathbbm{\qm}\;,\;\mathbbm{\qm}\textbackslash\texttt{\color[rgb]{0,0.2,0.4}number}\,\}. Typing the function using this new hypothesis but without considering the maximal interpretation would yield (\mathbbm{\qm}\to\texttt{\color[rgb]{0,0.2,0.4}number}\vee\texttt{\color[rgb]{0,0.2,0.4}string})\land((\texttt{\color[rgb]{0,0.2,0.4}number}\land\mathbbm{\qm})\to\texttt{\color[rgb]{0,0.2,0.4}number})\land((\mathbbm{\qm}\textbackslash\texttt{\color[rgb]{0,0.2,0.4}number})\to\texttt{\color[rgb]{0,0.2,0.4}string}). However, as we stated before, this would introduce an unnecessary cast if the function were to be applied to an integer.131313Notice that considering \texttt{\color[rgb]{0,0.2,0.4}number}\land\mathbbm{\qm}\simeq\texttt{\color[rgb]{0,0.2,0.4}number} is not an option, since it would force us to choose between having the gradual guarantee or having, say, \texttt{\color[rgb]{0,0.2,0.4}number}\land\texttt{\color[rgb]{0,0.2,0.4}string} be more precise than \texttt{\color[rgb]{0,0.2,0.4}number}\land\mathbbm{\qm}.
Hence the need for the second part of Rule [AbsInf+]: the maximal interpretation of \texttt{\color[rgb]{0,0.2,0.4}number}\land\mathbbm{\qm} is number, and it is clear that, if x is given type number, the function type-checks, thanks to occurrence typing. Thus, after some routine simplifications, we can actually deduce the desired type (\texttt{\color[rgb]{0,0.2,0.4}number}\to\texttt{\color[rgb]{0,0.2,0.4}number})\land((\mathbbm{\qm}\textbackslash\texttt{\color[rgb]{0,0.2,0.4}number})\to\texttt{\color[rgb]{0,0.2,0.4}string}).
4 Implementation
We present in this section preliminary results obtained by our implementation. After giving some technical highlights, we focus on demonstrating the behavior of our typing algorithm on meaningful examples. We also provide an in-depth comparison with the fourteen examples of [THF10].
4.1 Implementation details
We have implemented the algorithmic system we presented in Section 2.6.3. Besides the type-checking algorithm defined on the base language, our implementation supports the record types and expressions of Section 3.1 and the refinement of function types described in Section 3.2. Furthermore, our implementation uses for the inference of arrow types the following improved rule:
[TABLE]
instead of the simpler [AbsInf+] given in Section 3.2. The difference of this new rule with respect to [AbsInf+] is that the typing of the body is made under the hypothesis , that is, the domain of the function minus all the input types determined by the -analysis. This yields an even better refinement of the function type that makes a difference for instance with the inference for the function xor_ (see Code 3 in Table 4.2): the old rule would have returned a less precise type. The rule above is defined for functions annotated by a single arrow type: the extension to annotations with intersections of multiple arrows is similar to the one we did in the simpler setting of Section 3.2.
The implementation is rather crude and consists of 2000 lines of OCaml code, including parsing, type-checking of programs, and pretty printing of types. CDuce is used as a library to provide set-theoretic types and semantic subtyping. The implementation faithfully transcribes in OCaml the algorithmic system as well as all the type operations defined in this work. One optimization that our implementation features (with respect to the formal presentation) is the use of a memoization environment in the code of the function, which allows the inference to avoid unnecessary traversals of . Lastly, while our prototype allows the user to specify a particular value for the parameter we introduced in Section 2.6.2, a value of for is sufficient to check all examples we present in the rest of the section.
4.2 Experiments
We demonstrate the output of our type-checking implementation in Table 4.2 and Table LABEL:tab:implem2. Table 4.2 lists some examples, none of which can be typed by current systems. Even though some systems such as Flow and TypeScript can type some of these examples by adding explicit type annotations, the code 6, 7, 9, and 10 in Table 4.2 and, even more, the and_ and xor_ functions given in (4.2) and (4.2) later in this section are out of reach of current systems, even when using the right explicit annotations.
It should be noted that for all the examples we present, the time for the type inference process is less than 5ms, hence we do not report precise timings in the table. These and other examples can be tested in the online toplevel available at https://occtyping.github.io/
