
TL;DR
This paper reviews Corrado B"ohm's key contributions to computing, including self-compiling compilers, structured and functional programming, and foundational models of computability, highlighting their historical and theoretical significance.
Contribution
It presents a comprehensive overview of B"ohm's pioneering ideas and algorithms that have shaped modern programming and computability theory.
Findings
Development of a self-compiling compiler
Introduction of structured programming eliminating 'goto'
Early implementation of functional programming
Abstract
The main scientific heritage of Corrado B\"ohm consists of ideas about computing, concerning concrete algorithms, as well as models of computability. The following will be presented. 1. A compiler that can compile itself. 2. Structured programming, eliminating the 'goto' statement. 3. Functional programming and an early implementation. 4. Separability in {\lambda}-calculus. 5. Compiling combinators without parsing. 6. Self-evaluation in {\lambda}-calculus.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\lmcsdoi
16315 \lmcsheadingLABEL:LastPageDec. 07, 2018Sep. 02, 2020
Gems of Corrado Böhm
Henk Barendregt
Faculty of Science, Radboud University Nijmegen
Box 9010
6500GL Nijmegen, The Netherlands
Abstract.
The main scientific heritage of Corrado Böhm consists of ideas about computing, concerning concrete algorithms, as well as models of computability. The following will be presented. 1. A compiler that can compile itself. 2. Structured programming, eliminating the ‘goto’ statement. 3. Functional programming and an early implementation. 4. Separability in -calculus. 5. Compiling combinators without parsing. 6. Self-evaluation in -calculus.
Key words and phrases:
self-compiling, structured programming, functional programming, lambda calculus, combinators, self-evaluation
To the memory of Corrado Böhm (1923-2017)
Introduction
As a tribute to Corrado Böhm this paper presents six brilliant results of his and also discusses some of their later developments. Most of the papers are written by Böhm with co-authors. The result on elimination of the goto, Section 2, is written by Giuseppe Jacopini alone in the joint paper [BJ66], but one may assume that Böhm as supervisor had influenced the research involved, and therefore this result is included here. This paper is written such that computer science freshmen can read and understand it.
1. Self-compilation
In his PhD thesis [Böh54] at the ETH Zürich, Corrado Böhm constructed one of the first higher programming languages together with a compiler for it. The compiler has the particular feature that it is written in the language itself. This sounds like magic, but it is not: if a programming language is capable of expressing any computational process, then it should also be able to ‘understand itself’ (i.e. perform the computational task to translate it into machine language). Later this property gave rise to ‘bootstrapping’: dramatically increasing efficiency and reliability of computer programs, that seems as impossible as to pull oneself over a fence by pulling one’s bootstraps111In Europe the hyperbole for impossibility is the story of Baron (von) Münchhausen, who could get himself (and the horse on which he was seated) out of a swamp by pulling up his own hair.. This gave rise to the term ‘booting a computer’. The mechanism will be explained in this section.
1.1. Algorithms, computers, and imperative programming
An algorithm is a recipe to compute an output from a given input. Executing such a recipe basically consists in putting down pebbles222The word ‘pebble’ in Latin is ‘calculus’. in a fixed array of boxes and ‘replacing’ these pebbles step by step. That is, a pebble may be moved from one box to another one, be taken away, or new ones may be added. Such a process is called a calculation or computation. As shown in [Tur37b], all computational tasks, like “What is the square of 29?”, “Put the following list of words in alphabetical order”, or “What does Wikipedia say about the concept ‘bootstrap’?”, can be put in the format of shuffling pebbles in boxes.
This view on computing holds for computations on an abacus, but also for programmed computers. A computer is a, usually electronic, device with memory, that performs computations. The pebbles are represented in this memory and the shuffling is done by making stepwise changes. A simple conceptual computer is the Turing Machine (TM). It consists of an infinite333Actual computers only have a finite amount of memory. Turing apparently didn’t want to be technology dependent and conceived the Turing Machine with an idealized memory of infinitely many cells. But at any given moment in a computation only finitely many cells contain a 1. tape of discrete cells that can be numbered by the integers . At every moment in the computation only a finite number of these cells contain information, either a 1 or nothing: the original TM was a 0-bit444In 0-bit machines counting happens in the -ary, i.e. unary, system. In modern computers the cells are replaced by registers that contain a sequence of 64 or more bits that can be read or overwritten in parallel; moreover, the registers do not need to be looked up linearly, like on the tape of the TM, but there is fast access to each of them; one speaks of ‘random access memory’ (RAM). machine. The machine can be in one of a finite number of states. For a computational problem the input, coded as a list of the symbols, is written on the tape. There is a read/write (R/W) head positioned on one of the cells of the tape. Depending on the symbol that is read, and the present state , one of the following three actions is performed: a (possibly different) symbol is written on the cell under the R/W-head, a (possibly different) state is assumed, and finally the head moves (: one position to the right, : one position to the left, : no moving). When finally no action can be performed any longer, the resulting information on the tape represents the output of the computation. Each Turing Machine is determined by a finite table consisting of 5-tuples like that determine the changes.
Turing showed that there exists a particular kind of machine, called a universal machine , that suffices to make arbitrary computations. Such a is conceptually easy. The set of 5-tuples of a particular machine is presented as a table ‘in its silicon’. A universal machine that imitates , needs this table as extra input in coded form, including the collection of all states of (that may be more extensive than that of ) and the present state of , stored in a dedicated part of the memory as the program (nowadays known as the ‘app’) for . The instruction table of stipulates that 1. it has to look in in order to see what is the present state of , and to know what to do next; and 2. to do this. The possibility of a universal machine provides a model of computation in which a single machine , using programming language , can perform any computational job. The nature of the actions of Turing machines, described in their action tables, is rather imperative: overwrite information, change state, move. For this reason the resulting computational model is called imperative programming.
In this paper we will consider a fixed universal machine . Around 1950, when Corrado Böhm worked on his PhD, computers were rare. Indeed, in 1954, in a country like the Netherlands there were only three computers (at the Mathematical Center, the Royal Meteorological Institute, and the National Phone Company) and no more were deemed to be necessary! Nowadays a standard car often has on board in the order of (universal) computers in the form of microprocessor chips.
A program in a given language for consists of a sequence of statements in that the machine ‘understands’: it performs intended changes on data represented in the memory of . Such programs are denoted by , the optional superscript indicating that the program is written in the language .
{defi}
- (1)
There is a non-specified set (for data) consisting of the intended objects on which computations take place. 2. (2)
The process of running program on input in is denoted by 555Compound expressions like make sense and will be used. But an expressions like we will avoid, as one is forced to evaluate first the , which may be undefined; therefore even if , one doesn’t always have . See [Bar84, Exercise 9.5.13] and [Bar75, Bar96].. If this process terminates with end result (the output, again in ), then we write
[TABLE] 3. (3)
It may be the case that doesn’t terminate. Then there is no output, and we write . 4. (4)
The (operational) semantics of is the partial map defined as follows.
[TABLE]
For and , that depend on , we sometimes write , , respectively.
The difference between and is that the former is an identity, like that holds by definition, whereas the latter requires a computation, like . The sign ‘’ indicates that a computation has to be performed that takes time, consisting of a sequence of a few or more steps that transform information.
Proposition 1**.**
If terminates, then
[TABLE]
Proof 1.1**.**
By definition.
1.2. Programming languages and compilers
A human, having to write a correct and efficient program, better does this in an understandable way, rather than in the form of recipes for shuffling pebbles. One can use a programming language for this, in which computational tasks can be described more intuitively. In [Böh54] an early example of such a language is constructed.
{defi}
- (1)
A programming language consists of programs that describe computations according to (2). 2. (2)
comes with a (denotational) semantic function . That is, to each it assigns a (possibly partial) function .
Technically speaking is also a programming language, the machine language, with its denotational semantics , by definition equal to the operational one . By contrast other programming languages are called higher programming languages, that are intended to make the construction of programs more easy. When one has a program described in a higher programming language we want to have machine help from a universal machine to obtain from input the output . We succeed if one can translate in the ‘right way’ into the machine language . This translating is called compiling.
{defi}
A function , is called a compiling function if
[TABLE]
In this paper, we will usually consider only compilers into .
Proposition 2**.**
If is a compiling function, then
[TABLE]
Proof 1.2**.**
One has by Proposition 1 and Definition 1.2
[TABLE]
This shows that an intended computation using a , intended to compute , can in principle be replaced by a computation using a , for which there is the support from the machine . We say: the computational task becomes executable (by ). In modern compilers the translation , is often divided in literally hundreds of steps, using many intermediate languages666For example one may have a long series of translations: . For example, the first step is the so called lexing that examines where every meaningful unit starts and ends777Every student of a foreign language has to master this also: a stream of sounds ‘papafumeunepipe’ has to be separated into words as follows ‘papa fume une pipe’; only then one can translate further, into ‘father smokes a pipe’.. At the end of the long translation process one arrives at the language . No need for further translation occurs: in the programs in machine language are run by the laws of physics (electrical engineering).
Compiling functions are notably useful if the translated program in in turn is executable. Translating is a computational task and in principle determining can be done by hand. But since many programs, also in a higher order programming language, may consist of several million instructions, the computational task of compiling much better be performed by a machine as well. A program that performs this translation is called a compiler. If such an automated translation process is of any use, the compiler needs to be written either in machine language , or in another language for which there is already another compiler from to .
{defi}
Let be a compiling function. A compiler for written in language is a program such that
[TABLE]
This is useful only if programs in are also executable. This is the case if or if there is already a compiler from into . Two cases will be important in this paper. (1.) and (2.) .
1.3. Compilers written in machine language
First consider a compiler written in machine language .
Proposition 3**.**
Let be a compiler for a compiling function .
- (1)
For all programs written in one has 2. (2)
A computational job can be fully automated as follows.
[TABLE]
Proof 1.3**.**
- (1)
By Definition 1.2 we have . Hence by Proposition 1
[TABLE] 2. (2)
It follows that
[TABLE]
{defi}
Let be a compiler written in .
- (1)
By Proposition 3(2) there are two computation phases towards :
[TABLE]
The first computation 1, that is , takes place in a time interval that is called compile-time; the second computation 2, that is , takes place in a time-interval that is called run-time. 2. (2)
If for programs and inputs (that interest us) the run-time is short (for our purposes), then the compiler is said to produce efficient code. Note that this pragmatic definition depends only on the compiling function , and not on its program, the compiler itself. 3. (3)
If for programs (that interest us) the compile-time is short (for our purposes), then the compiler is said to be fast. Note that this notion does depend on the compiler , and not on the compiling function .
Proposition 4**.**
For a programming language , in which every program is a sequence of statements consisting of a computable step, there exists a simple compiler written in for a compiling function , mimicking the steps in as steps in . Such a compiler is called a (simple) interpreter.
Proof 1.4** (Sketch).**
Let . Define where mimics the statement by a (small) program in .
For complex computational problems using a large program both the compile-time and run-time consume considerable amounts of time. Often these are bottlenecks for the feasibility of executing a program. Moreover, interpreters usually produce less efficient code than compilers, for reasons to be discussed next.
1.4. Compilers written in higher programming languages
Now we consider the task of writing a compiler . A compiler more complex than a simple interpreter is able to look at the input program in its totality and can ‘reflect’ (act) on it, enabling optimizations for the run-time of the resulting code . Such a compiler improves efficiency888Software engineering studies ways to develop new versions of programs and compilers, in order to improve time performance and also to correct bugs (errors)., using the power and flexibility of . With the right effort a compiler can be developed that produces efficient code, so that to use such a compiler the run-time performance of the translated programs are optimized. This doesn’t apply to the compile-time if compiler is written in , for which it is hard to achieve optimizations.
In his PhD thesis (1951) of just 50 pages Corrado Böhm designed a programming language and constructed a compiler , in itself. This later made bootstrapping possible: producing not only efficient programs, but also making the compilation process itself efficient. We will explain how this is achieved. Suppose one has a compiler that produces efficient code (efficiently running programs). Here ‘efficient’ is used in a non-technical intuitive sense. In order to run one needs a simple interpreter , written in . Now we will describe three ways of computing , that is, finding the intended result that program has acting on input .
1. Computing using the simple interpreter :
[TABLE]
This has both inefficient compile-time and run-time.
2. Better efficiency using , run by the interpreter. Define , the interpreter applied to the compiler written in . This can be precompiled
[TABLE]
as the code of in the sense that . One now has
[TABLE]
Computing is a one time job and, as the result can be stored, it doesn’t count in measuring efficiency. The first computation counts as the compile time of . But it is also the run-time of (with compiling function ) and doesn’t need to be efficient. The second computation is the run time of (with compiling function ) and was assumed to be efficient. Therefore this computation has an efficient run-time, but not necessarily an efficient compile-time.
- Best efficiency using : define , the compiler applied to itself. This can be precompiled as follows.
[TABLE]
just requiring a one time computation. Then again , but now
[TABLE]
with both efficient compile and run-time, as both codes have been generated by .
[TABLE] \textstyle{c_{I}^{L,M}c_{B}^{L,L}c_{B}^{L,L}p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces\rule{0.0pt}{18.00005pt}}$$\scriptstyle{1.2}$$\textstyle{{[\![{c_{I}^{L,M}}]\!]}_{M}(c_{B}^{L,L})p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.6}$$\textstyle{c_{I}^{L,M}c_{B}^{L,L}p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.2}$$\textstyle{C_{B}^{L}(c_{B}^{L,L})p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.5}$$\textstyle{{[\![{c_{I}^{L,M}}]\!]}_{M}(c_{B}^{L,L})p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.6}$$\textstyle{{[\![{c_{B}^{L,L}}]\!]}_{L}(c_{B}^{L,L})p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.6}$$\textstyle{{c_{I}^{L,M}p^{L}x}\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{\rm slow\ compiling\ 1.2\hskip 35.70012pt}$$\textstyle{\underline{C_{I}^{L}(c_{B}^{L,L})}p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{\rm slow\ compiling\ 1.5\hskip 35.70012pt}$$\textstyle{\underline{C_{B}^{L}(c_{B}^{L,L})}p^{L}x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{\rm\hskip 35.00008pt1.5\ efficient\ compiling}$$\textstyle{{[\![{c_{I}^{L,M}}]\!]}_{M}(p^{L})x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.6}$$\textstyle{{[\![{c_{B}^{L,L}}]\!]}_{L}(p^{L})x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{1.6}$$\textstyle{C_{I}^{L}(p^{L})x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{\rm slow\ running\ 1.5\hskip 32.2001pt}$$\textstyle{C_{B}^{L}(p^{L})x\ignorespaces\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\scriptstyle{\rm\hskip 39.20012pt1.5\ efficient\ running}$$\textstyle{{[\![{p^{L}}]\!]}_{L}(x)}
Figure 1. Bootstrapping: precompiled , provide efficient run time alone, or both run time and compile time, respectively.
In the language of combinatory logic, so much admired by Corrado Böhm, one writes , or simply , for , and for , etcetera (association to the left). Then the three ways of compiling and computing a job can be rendered as in Figure 1. The underlined expressions denote the codes of the Böhm compiler that are obtained by precompilation, respectively using the interpreter and using itself. So in the steps above these do not require time. This bootstrapping process wasn’t discussed in Böhm’s PhD thesis, but it was made possible by his invention and implementation of self-compilation. In Figure 1.4 the bootstrap process is presented in a slightly different way.
After having obtained his PhD in Zürich, Böhm did succeed registering a patent on compilers. But, unexpectedly, a few years later (1955) IBM came with its FORTRAN compiler. It turned out that Böhm’s patent was valid only in Switzerland!
1.5. Compiler configurations
In this section we treat compilers in greater generality, translating a language into . Only one machine is used for the translation, but this easily can be generalized. We settle the question whether it is necessary to have self-compiling, in order to make compile-time and run-time both efficient.
{defi}
- (1)
We define the language of compiler configurations by the following context free grammar.
[TABLE]
Actually is a symbol for a language , but we identify the two. 2. (2)
Let . The language of , in notation , is defined as follows.
[TABLE] 3. (3)
Correctness of is defined as follows.
is correct;
is correct if is a program in programming language ,
are correct and
is a compiling function.
{exa}
The three situations in Subsection 1.4 can be described as compiler configurations. We use and instead of and , respectively.
[TABLE]
{defi}
A compiler configuration can be drawn as a labeled tree .
[TABLE]
Compiler configurations and their trees are more convenient to use than the more rigid T-diagrams introduced in [MHW70], since there is more flexibility to draw languages that still need to be translated. For example, is the compiler configuration employed by Böhm and its tree explains well the magic trick.
{defi}
A compiler configuration is inductively defined to be executable as follows.
[TABLE]
{exa}
- (1)
The three compiler configurations considered before are executable.
[TABLE] 2. (2)
The following compiler configurations, drawn as trees, are not executable:
[TABLE]
because an evaluation function for neither nor is given.
{defi}
To each we assign a function that maps a program and value to a value , also written .
[TABLE]
Exercise 1.5**.**
For all correct and executable , , one has
[TABLE]
{exa}
In the following evaluations we leave out parenthesis, like in lambda calculus and combinatory logic.
[TABLE]
Do we absolutely need self-compilation in order to obtain efficient compilation? The answer is negative. Suppose one has the following:
- (1)
a compiler , producing fast code, written in ; 2. (2)
a compiler , producing fast code written in ; 3. (3)
a simple interpreter , written in .
Then one can form the following correct and executable compiler configuration:
[TABLE]
with tree
[TABLE]
Again one obtains a compiler with fast compile-time that produces efficient code
[TABLE]
In the magic trick of Böhm, compiler (3) in Subsection 1.4 above, he took and . This saves work: only one language and one compiler need to be developed.
2. Structured programming
In a Turing machine transition a state can be followed by any other state. Therefore many programming languages naturally contain the ‘goto’ statement. When these are used in a mindless way, the meaning of a program is not obvious, hence its correctness is much more difficult to warrant. The first half of [BJ66] is dedicated to eliminate goto statements, as a first step towards structured programs. That part of the paper is stated to be written by Jacopini, but I think we may suppose that Böhm, the supervisor of Jacopini, has contributed to it.
2.1. Imperative programming
The Universal Turing Machine, or an improved version, immediately gives rise to a language with goto statements: the machine, being in state changes (under the right conditions) into state . This is expressed by a statement very much like a 5-tuple of a Turing Machine , that in the presence of named registers looks like
[TABLE]
Here the meaning is as follows: the machine checks whether the content of register equals 1 and then it overwrites the 1 by a 0 as the content of register x, after which it jumps to state . In the presence of addressable registers like , there is no longer a need to use the small step local movements indicated by . A more extended example is the following.
[TABLE]
Apart from branching, leading naturally to a flow-chart as a representation of such a program, we also see the for imperative programming typical statement , meaning that the content of register is overwritten by the old content augmented by one. Many such components can form nice-looking but hard to understand diagrams. One can imagine that the idea arose to create more understandable diagrams and as a first step to eliminate the goto statements.
2.2. Eliminating the ‘goto’
In this subsection it is shown that the result of eliminating the go to statement can be seen in the light of Kleene’s analysis of computability, as was pointed out in [Coo67] and also in [Har80].
Theorem 5** (Kleene Normal Form Theorem).**
There are functions that are primitive computable such that every computable function has a code number such that for all one has
[TABLE]
If is a predicate on , then denotes the least number such that , if this exists, otherwise the expression is undefined. In (NFT) it is assumed that for all there exists a such that holds999The formula (NFT) also holds for partial functions , in which case iff ..
Proof 2.1** (Sketch).**
The value of the function can be computed by the Universal Turing Machine using, say, as program. Then there is a computation
[TABLE]
where , ‘’ is the first configuration, ‘’ is the last one that is terminating, and . Furthermore, is the characteristic function ( when true, when false) of the primitive computable predicate , that holds if is (the code of) the computation (comp). After a search (by ) for this (coded sequence) , the is easily obtainable from it, which is done by the primitive computable function .
{thmC}
[[BJ66]] A program built up from statements of the form
\left.\begin{tabular}[]{l}x:=x+1\\ x:=x-1\\ if B, then S_{1}S_{2}\\ goto q\end{tabular}\right\}L_{1}
can be replaced by an equivalent one built up from statements of the form
\rule{0.0pt}{18.99995pt}\left.\begin{tabular}[]{l}x:=x+1\\ x:=x-1\\ if B, then S_{1}S_{2}\\ for k:=0 to n do A(k)\\ while x>0 do A(x)\end{tabular}\right\}L_{2}
Proof 2.2** (Sketch).**
A function with program from will be computable by the universal Turing Machine by program, say, . Therefore by Theorem 5 one has . The functions are primitive computable, hence expressible by the ‘’ statements. Only for the a statement is needed. (Actually this happens only a single time.)
Corollary 6** (Folk Theorem).**
Programs in can be replaced by an equivalent one in using the while construct only a single time.
Proof 2.3**.**
By the parenthetical remark in the proof of 2.1.
2.3. Evaluation
After the goto was shown to be eliminable, in Dijkstra’s note [Dij68] a polemics was started ‘goto statement considered harmful’. In the book [DDH72] structured programming was turned into an art. In [Knu74] it is argued that eliminating the goto as in the above proof of Theorem 2.1 may produce unstructured programs, unrelated to the original program. The original proof in [BJ66] does preserve the structure of the program in a better way. See [Mil72] for a discussion. An even better way to eliminate the goto statements, while preserving the structure of a program, is described in [AM72]. An example of a program in which a goto statement does improve its structure is also given in [Knu74].
In [Har80] the paper [BJ66] was taken as an example of how a ‘Folk Theorem’ appears. The result attributed to these authors often is Corollary 6, rather than Theorem 2.1 itself.
As remarked in [BJ66] it seems necessary to use an extra variable to obtain a program without a goto, but the authors couldn’t find a proof of this conjecture. A proof was given in [KF71], also in [AM72] and in [KT08].
Although the Böhm-Jacopini result started a discussion towards structured programming, a new idea was needed to obtain even better structured programs. As we will see in the next section, actually it was an old idea: functional programming based on lambda calculus.
3. Functional programming and the CUCH machine
It was Wolf Gross, colleague of Corrado Böhm, who introduced the latter to functional programming based on type-free lambda calculus, in which unlimited self-application is possible. As can be imagined, knowing the construction of a self-applicative compiler, it had a deep impact on the sequel of Böhm’s professional life. We restrict ourselves and give some historical and conceptual background.
3.1. Functional programming
Alonzo Church introduced lambda calculus as a way to mathematically characterize the intuitive notion of computability. I seem to remember that he told me the following story. Church’s thesis supervisor, Oswald Veblen, gave him the problem to compute the Betti numbers of an algebraic surface given by a polynomial equation. Church did not succeed in this task and was stuck developing his PhD thesis. He then did what other mathematicians do in similar circumstances: solve a different but related problem. Church wondered what the notion ‘computable’ actually means. Perhaps determining the Betti number of a surface from its description is not a computable task.
Church then introduced a formal system for mathematical deduction and computation [Chu32, Chu33]. In [KR35] his students Kleene and Rosser found an inconsistency101010The proof of a contradiction in Church’s system was beautifully simplified in [Cur42]. in Church’s original system. In [Chu36] the system was stripped from the deductive part obtaining the (pure) lambda calculus, which turned out to be provably consistent [CR36]. See [Bar84] for an extensive exposition of the lambda calculus.
To formally define the notion of computability, Church introduced numerals representing natural numbers as lambda terms. Rosser found ways to add, multiply and exponentiate: that is, he found terms such that , and similarly for multiplication and exponentiation. This way these three functions were seen to be lambda definable. Here ‘’ denotes many-step rewriting, the transitive reflexive closure of one-step rewriting ‘’ introduced below. At first neither Church nor his students could find a way to lambda define the predecessor function. At the dentist’s office Kleene did see how to simulate recursion by iteration and could in that way construct a term lambda defining the predecessor function, [Cro75]. (I believe Kleene told me it was under the influence of laughing gas, , used as anesthetic.) When Church saw that result he stated “Then all intuitively computable functions must be lambda definable.” That was the first formulation of Church’s thesis and the functional model of computation was born. At the same time Church gave an example of a function that was non-computable in this model.
In [Tur37a] it was proved that the imperative and functional models of computation have the same power: they can compute exactly the same partial functions, on say the natural numbers. The way these computations are performed, however, differs considerably. In both cases computations traverse a sequence of configurations, starting essentially from the input leading to the output. But here the common ground ends.
3.2. Comparing imperative and functional programming
In functional programming the argument(s) (or ) for a computation in the form of a function that has to be applied to them form one single expression (respectively ). Such expressions are subject to rewriting. If the expression cannot be rewritten any further, then the so called normal form has been reached and this is the intended output. The intermediate results all have the same meaning as the original expression and as the output. A basic example of this is
[TABLE]
where is the function that assigns to the value . In more complex expressions there is a choice of how to rewrite, that is, which subexpression to choose as focus of attention for elementary steps as above. For example not all choices will lead to a normal form. There are reduction strategies that always will find a normal form if it exists. Normal forms, if they are reached, are unique, the result is independent of choices how to rewrite. However performance, both time and space, is sensitive to the steps employed.
In the imperative model a computation the configurations at each moment of a computation sequence of a Turing Machine consist of the momentaneous memory content on the tape, the state of , and position of its head: . Each terminating computation runs as follows:
[TABLE]
where is a halting state (and is irrelevant). The transitions depend on the set of instructions of the Turing Machine . In the case of non-termination the configurations never reach one with a terminal state. This description already shows that, wanting to combine Turing Machines to form one that is performing a more complex task, requires some choices of e.g. making the final state of the first machine fit with the initial one of the second machine.
In the functional model of computation the sequence of configurations is as follows:
[TABLE]
All of these configurations are -terms and the transitions are according to the single -rule of reduction, which is quite different. In order to make a more fair comparison between the imperative and functional computation, one could change (IP) and denote it as
[TABLE]
where is the code (program) that makes the universal machine imitate the machine . This makes (IP*′*) superficially similar to (FP), that nevertheless is superior.
Advantages of functional programming
In the sequence (FP) the expressions are words in a language more complex than the simple strings in (IP) or (IP*′*).
- (i)
The -terms expressing functional programs have the possibility of making abstraction upon abstraction, arbitrarily often. This means that ‘components’ of functions can be also functions (of functions), enabling flexible procedures. 2. (ii)
In FP there is no mention of state and position, hence there is no need to deal with the bureaucracy of these when combining programs. Hence FP has easy compositionality. 3. (iii)
In the sequence (FP) the meaning of each configuration remains the same, from the first to the last expression. This can be seen clearly in the sequence (1) above.
Features (i) and (ii) of functional programs makes them transparent and compact. Feature (iii) makes it easier to prove them correct: reasoning with mathematical induction, substitution and abstraction often suffice; no need to learn new logical formalisms that are used to analyze imperative programs. It can be expected that FP will become more and more important. The lack of side-effects makes it more easy to make parallel versions of programs.
Implementations of functional programming
Functional Programming has been developed much more slowly than Imperative Programming. The reason is that imperative programs can be implemented rather directly on a Turing Machine or modern computer. This is not the case for functional programs. Attempts to construct specialized hardware for Functional Programming have not been successful. But compilers from functional languages into ordinary CPU’s using imperative programs have been successfully developed.
One of the early examples is the SECD machine in [Lan64], soon followed by work on the CUCH machine, [BG66], [Böh66]. After fifty years of research on the use and implementation of functional programming the field has come of age. There exist fast compilers producing efficient code. One can focus on the mathematical definition of the functions involved and the correctness of these can be proved with relatively simple tools, like substitution, abstraction and induction. A functional program is automatically structured. There are for example no ‘goto’ statements. See [BMP13] for a short description, [Hug89] for an extensive motivation, and [PJ87] for implementing functional programming languages.
Challenges for functional programming
There are two main challenges for FP. 1. The lack of state makes writing code for input/output more complex. 2. The evaluation result, the output, doesn’t depend on the way reduction takes place, but it is not always easy to reason about space and time efficiency. These issues are beyond the scope of this paper111111Well known functional languages are LISP (later called Lisp), [MAE*+*62] (with many modern versions starting with Scheme [Sch]), and ML [MTHM90] (with modern version OCaml [OCa]). ML is loosely characterized as ‘Lisp with types’ coming from the simply types lambda calculus, see [Chu40, Cur34], with a rich mathematical structure [BDS13], Part I. However, Lisp and ML are not pure functional programming languages, in that they have assignment statements that can be used for input and output, making it also possible to write unstructured programs. In the pure functional languages, Haskell [Has] and Clean [Cle], at present the most developed ones, the I/O problem is solved by respectively monads and uniqueness typing. But using these features, in both cases it is still possible to write incomprehensible code when dealing with I/O..
4. Separability in -calculus
A mathematician is interested in numbers, not because these may represent the amount of money in one’s bank account (almost offensive to mention), but for their properties definable from the basic arithmetical operations and , such as primality. Such a love for numbers is not shared by most people. In the same way Corrado Böhm became interested in -terms, not because they represent programs that one can sell, but for their properties definable from the basic lambda calculus operations: application and abstraction. This is somewhat different from another form of fascination, that of Donald Knuth for imperative programs that is obvious from his volumes [Knu18], driven by the challenge to write clear, elegant, and efficient algorithms that perform relevant computational tasks. We assume elementary knowledge of lambda calculus and recall the following notations.
{nota}
- (1)
The set of all lambda terms is denoted by . The set of free variables of is denoted by . The set consists of the closed lambda terms without free variables, like , but not . 2. (2)
‘’ denotes equality up to renaming bound variables, e.g. . 3. (3)
‘=’ denotes -convertibility on -terms, often denoted by ‘’ to be explicit. 4. (4)
is generated by -reduction , as in . 5. (5)
is generated by -reduction , as in . 6. (6)
is in normal form (-nf) if no (nor ) step is possible. 7. (7)
For write , with a fresh variable, i.e. . 8. (8)
Write . Note that , for . 9. (9)
Write
[TABLE]
Separability of two normal forms
{defi}
Terms are called separable if for all there exists an such that
[TABLE]
This is equivalent to requiring that there is a lambda definable bijection
[TABLE]
with lambda definable inverse, in which case we write .
In result 7 the principal step was proved in [Böh68] with the following result.
{thmC}
[[Böh68]] Let be two different -terms in -nf. Then for all there exist such that
[TABLE]
Proof 4.1** (Sketch).**
A full proof (in English) is in [Bar84, Theorem 10.4.2] and an intuitive proof with applications in [GPDC09]. Idea: give the arguments separating the two. As we do not know in advance which arguments will work, we use variables as unknowns and substitute for them later. It suffices to reach two distinct variables, as they can be replaced by . We present some examples.
*Example 1. . *
{\scriptsize\begin{array}[]{l}\rule{8.53581pt}{0.0pt}xy\rule{39.83385pt}{0.0pt}{x{:=}{\sf K}\!{\sf K}}\rule{28.45274pt}{0.0pt}{zvw}\rule{85.35826pt}{0.0pt}{z{:=}{\sf I}}\end{array}}
\begin{array}[c]{l|l|l|l|l}{\sf I}&{\sf I}xy=xy&{\sf K}{\sf K}y={\sf K}&{\sf K}zvw=zw&w\\ {\sf K}&{\sf K}xy=x&{\sf K}{\sf K}&{\sf K}{\sf K}zvw={\sf K}vw=v&v\end{array}* * Hence \begin{array}[t]{rcl}{\sf I}({\sf K}{\sf K}){\sf I}P_{1}P_{0}&=&P_{0};\\ {\sf K}({\sf K}{\sf K}){\sf I}P_{1}P_{0}&=&P_{1}.\end{array}
Example 2. .
{\scriptsize\begin{array}[]{l}\rule{8.53581pt}{0.0pt}x\rule{39.83385pt}{0.0pt}{x{:=}{\sf K}_{*}}\rule{68.2866pt}{0.0pt}{xyz}\rule{34.1433pt}{0.0pt}{x{:=}{\sf K}_{*},y:=Ku}\end{array}}
\begin{array}[l]{l|l|l|l|l}{\sf I}&{\sf I}x=x&{\sf K}_{*}&{\sf K}_{*}xyz=yz&x\\ {\omega}&{\omega}x=xx&{\sf K}_{*}{\sf K}_{*}={\sf K}{\sf I}{\sf K}_{*}={\sf I}&{\sf I}xyz=xyz&z\end{array}* * Hence \begin{array}[t]{rcl}{\sf I}{\sf K}_{*}{\sf K}_{*}({\sf K}P_{0})P_{1}&=&P_{0};\\ {\omega}{\sf K}_{*}{\sf K}_{*}({\sf K}P_{0})P_{1}&=&P_{1}.\end{array} * *
Example 3. . Consider these as trees:
\textstyle{\lambda xy.x\rule{14.22636pt}{0.0pt}}$$\textstyle{y\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\textstyle{{\sf I}\rule{14.22636pt}{0.0pt}\ignorespaces\ignorespaces\ignorespaces\ignorespaces} * * \textstyle{\lambda xy.x\rule{14.22636pt}{0.0pt}}$$\textstyle{y\ignorespaces\ignorespaces\ignorespaces\ignorespaces}$$\textstyle{{\omega}\rule{8.53581pt}{0.0pt}\ignorespaces\ignorespaces\ignorespaces\ignorespaces}
In order to separate these, we zoom in on the difference and , via giving respectively, and we know how to separate these by Example 2.
Example 4. . Consider their trees:*
[TABLE]
Again we like to zoom in on the difference and . Dilemma: one cannot make the choose both the left and right branch. Solution: applying the ‘Böhm transformation’ , postpones the choice and yields trees
**
[TABLE]
after which one can zoom in by application to , obtaining and ; then we are back to Example 2. Note that the dilemma was solved essentially by replacing by , enabling to make postponed choices: first (going right), then (going left).
It is clear that one needs to require that the terms have different -nfs, not just -nfs. The terms and are different -nfs, but cannot be separated: and would imply
[TABLE]
from which any equation can be derived, contradicting that the -calculus is consistent.
Corollary 7**.**
For all having a -nf the following are equivalent.
- (1)
For all there exist such that
[TABLE] 2. (2)
* are separable, i.e. for all there exists an such that*
[TABLE] 3. (3)
There exists an such that
[TABLE] 4. (4)
The equation is inconsistent with . 5. (5)
The equation is inconsistent with 121212Dropping the requirement that both have a -nf, the result no longer holds: the equation is consistent with , but not with , as follows from considerations similar to those in [Bar20, Theorem 3.2.24].. 6. (6)
The terms have distinct -nfs.
Proof 4.2**.**
[SB05]**
(1)(2) By (1) there are such that . Take .
(2)(3) Take , for .
(3)(4) From the equation one can by (3) derive , from which one can derive any equation; all derivations using just .
(4)(5) Trivial.
(5)(6) By the assumption that have -nfs and [Bar84], Corollary 15.1.5, it follows that have -nfs. If these were equal, then and hence would be consistent, contradicting (5).
(6)(1) By Theorem 4.
Separability of finite sets of normal forms
Together with his students Böhm generalized Theorem 4 from two to terms.
{defi}
A finite set is called separable if for some
[TABLE]
{thmC}
[[BDCPR79]] Let be terms having different -nfs. Then is separable. One even has for all terms there exist terms such that
[TABLE]
Proof 4.3**.**
For a proof see [BDCPR79] or [Bar84, proof of Corollary 10.4.14.].
Corollary 8**.**
Let be a finite set of terms all having a -nf. Then
[TABLE]
Separability of finite sets of general terms
A characterization of separability for finite , possibly containing terms without normal form, is due to [CDCRdR78], see also [Bar84], Theorem 10.4.13. To taste a flavor of that theorem we give some of its consequences collected in [SB05].
- (1)
The set \left\{\begin{array}[c]{l}\lambda x.x{{{\mathbf{c}}}_{0}}{\Omega},\\ \lambda x.x{{{\mathbf{c}}}_{1}}{\Omega}\end{array}\right\} is separable; so is \left\{\begin{array}[c]{l}\lambda xy.xx{\Omega},\\ \lambda xy.xy{\Omega}\end{array}\right\}. 2. (2)
\left\{\begin{array}[c]{l}\lambda x.x(\lambda y.y{\Omega}),\\ \lambda x.x(\lambda y.y{{{\mathbf{c}}}_{0}})\end{array}\right\} is not separable; neither is \left\{\begin{array}[c]{l}\lambda x.x,\\ \lambda xy.xy\end{array}\right\}. 3. (3)
\left\{\begin{array}[c]{l}\lambda x.x(\lambda y.y{{{\mathbf{c}}}_{0}}{\Omega}(\lambda z.z{\Omega})),\\ \lambda x.x(\lambda y.y{{{\mathbf{c}}}_{1}}{\Omega}(\lambda z.z{{{\mathbf{c}}}_{1}})),\\ \lambda x.x(\lambda y.y{{{\mathbf{c}}}_{1}}{\Omega}(\lambda z.z{{{\mathbf{c}}}_{2}}))\end{array}\right\} is separable. 4. (4)
\left\{\begin{array}[c]{l}\lambda x.x{{{\mathbf{c}}}_{0}}{{{\mathbf{c}}}_{0}}{\Omega},\\ \lambda x.x{{{\mathbf{c}}}_{1}}{\Omega}{{{\mathbf{c}}}_{1}},\\ \lambda x.x{\Omega}{{{\mathbf{c}}}_{2}}{{{\mathbf{c}}}_{2}}\end{array}\right\} is not separable, although each proper subset is.
Separability of infinite sets of general terms
In [SB05] for infinite sets separability is defined and characterized. Here we give a slightly alternative formulation.
{nota}
Let . Write for
[TABLE]
{defi}
Let be an infinite set. Then
- (1)
is called special if there are combinators such that modulo one has
[TABLE] 2. (2)
is called separable if , that is, there is a lambda definable bijection with lambda definable inverse.
Remark 9**.**
If only has a -definable injection, then doesn’t need to be special. Indeed, let be re but not recursive, so that its complement is not re. Define . Then is an injection. For this there is no -definable surjection , for otherwise
[TABLE]
contradicting that is not re.
{defi}
is called an adequate numeral system if there are terms (zero, successor, predecessor, test for zero) such that, writing for , one has
[TABLE]
Proposition 10**.**
*Let be infinite. If is special, then there is a lambda definable bijection . *
Proof 4.4**.**
Let combinators be given as required in Definition 4. Define by primitive recursion
[TABLE]
In () ‘’ stands for ‘the least number such that’, which in this case always exists since is infinite and surjective. That is -definable follows from the existence of : indeed, for one has*
[TABLE]
where is the decidable equality predicate on Church numerals, so that also
[TABLE]
is decidable.
Claim. For all one has
[TABLE]
The claim follows by induction on . Case . By definition .
Case . Assume (induction hypothesis), towards . If , then we are done. Otherwise . For one has which is a subset of by the induction hypothesis. Therefore by definition , and the conclusion holds again. This proves the claim.
By clause () in the definition above is injective. That it is also surjective follows from the claim and the surjectivity of .*
Corollary 11** ([SB05]).**
*Let be infinite. Then the following are equivalent.
- (1)
* is special.* 2. (2)
* is separable. * 3. (3)
* is an adequate numeral system.*
Proof 4.5**.**
(1) (2). If is separable, via and , then by Proposition 10 there exists a -definable that is a bijection. We need to show that has a -definable inverse. This can be defined by
[TABLE]
(2) (3). By the set inherits the structure of an adequate numeral system from .
(3) (1). Let give the structure of an adequate numeral system. Then the computable functions can be -defined w.r.t. the . By primitive recursion on the and numerals, respectively, one can define -definable and satisfying and , making separable.
5. Translating without parsing
Combinatory terms, built-up from with just application, with reduction rules
[TABLE]
suffice to represent arbitrary computations. We write all parenthesis. For example is such a term. It was noticed by Böhm and Dezani that the meaning of such a term can be found by interpreting it symbol by symbol, including the two parentheses. One doesn’t need to parse the combinator to display its tree-like structure. The method also applies to combinatory terms build from different combinators, including for example corresponding to the -term .
{defi}
Define for -terms
[TABLE]
It is easy to see that and are associative modulo -equality of the -calculus; moreover, for one has
[TABLE]
{defi}
Combinatory terms are built up over alphabet by the following context-free grammar
[TABLE]
{defi}
Given its translation into closed terms of the -calculus is defined recursively as follows:
[TABLE]
For this translation the needs to be parsed. For example if , we need to know where the string ends and similarly where starts. The following translation avoids this need for parsing.
{defi}
- (1)
The symbols of are translated into as follows.
[TABLE] 2. (2)
A word in is translated into as follows.
[TABLE]
{propC}
[[BD73]]
- (1)
For all one has . 2. (2)
For all one has .
Proof 5.1**.**
- (1)
Since , we may use induction over terms in . If or , the result holds by definition of . If , then
[TABLE] 2. (2)
*By (1): *
Proposition 5(2) shows that the meaning of can be obtained without parsing.
6. A simple self-evaluator
To one assigns computably a Gödel-number .
{defi}
For its code is defined as the Church numeral corresponding to
[TABLE]
Note that the code of satisfies 1. is in normal form; 2. syntactic operations on are lambda definable on , by the computability of . An evaluator is constructed by Stephen Cole Kleene in [Kle35] such that for all one has
[TABLE]
A technical problem to define and show this is caused by the fact that the lambda terms are inductively defined via open terms containing free variables. But the decoding only holds for closed terms. The way Kleene dealt with this (basically the problem of representing the binding effect of ), was to translate closed -terms first to combinators and then representing these as numerals. The term was reconstructed by McCarthy for the programming language LISP under the name ‘eval’, and baptized in [Rey72] as the ‘meta-circular’ self-interpreter.
During lectures at Radboud University on Kleene’s self-evaluator and constructing this term via the combinators, the student Peter de Bruin came with an improvement. He suggested to use the intuition of denotational semantics of -calculus. First the meaning of an open term (containing possibly free variables) is given, in notation , using a valuation assigning values to the code of a free variable .
{thmC}
[[Kle35]] There is a term such that
[TABLE]
Proof 6.1**.**
(P. de Bruin) By the effectiveness of the Gödel-numbering there exists an satisfying
[TABLE]
where with
[TABLE]
Then one can prove that for with one has
[TABLE]
Therefore
[TABLE]
and one can take .
Corollary 12**.**
The term enumerates the closed -terms
[TABLE]
Remark 13**.**
In [Bar95] it is proved (constructively) that any enumerator of the closed terms is reducing in the following sense.
[TABLE]
The construction of Peter de Bruin inspired [Mog94] to a higher order encoding of -terms, see [PE88], in which a is interpreted by itself.
{defiC}
[[Mog94]] An open lambda term can be interpreted as an open lambda term with the same free variables as follows.
[TABLE]
This can be seen as first using three unspecified constructors as follows
[TABLE]
and then taking
[TABLE]
{thmC}
[[Mog94]] There is an evaluator such that for all
[TABLE]
Proof 6.2**.**
Using Turing’s fixed point combinator one can construct a term such that
[TABLE]
where , and : take . Then by induction on the structure of it follows that .
[TABLE]
Remark 14**.**
- (1)
Using Mogensen’s translation, decoding is possible for all terms possibly containing free variables. On the other hand not all syntactic operations are possible on the coded terms. Equality test for variables is possible for , but not for . 2. (2)
In spite of this, the lambda definability of equality discrimination for coded closed terms is proved in **[Bar01]**. 3. (3)
In **[Mog94]** it is also proved that there is a normalizer acting on coded terms.
*There is a term such that for all
if has a normal form , then ;
if has a no normal form, then has no nf.
Berarducci and Böhm constructed a very simple self-evaluator, based on Mogensen’s construction above, but using different choices for var, app, abs. These are based on unpublished work of Böhm and Piperno, who represented algebraic data structures in such a way that primitive recursive (computable) functions are representable by terms in normal form, avoiding the fixed point operator that was used in the proof of Theorem 13.
{thmC}
[[BB93]] There is a coding of -terms with a short closed normal form as evaluator.
Proof 6.3**.**
Define
[TABLE]
where
[TABLE]
By induction on the structure of we show that .
Case . Then
[TABLE]
Case . Then
[TABLE]
Case . Then
[TABLE]
Therefore for all one has . It follows that is a self-evaluator: for all
[TABLE]
It is a remarkable coincidence that the term abbreviates the name “Kleene, Stephen Cole”, the full name of the inventor of self-evaluation in -calculus. Corrado Böhm was fond of such tricks and had for this and other reasons the nickname ‘il miracolo’.
Coda
At a symposium in honor of Corrado Böhm’s ninety’s birthday, January 2013, at Sapienza University, Rome, the jubilee treated the audience with an open problem. Actually it is more a ‘Koan’ (not precisely stated) than a Problem (with a precisely stated space of answers). But Koans are often the more interesting problems in mathematics and computer science.
Problem/Koan 15** ((C. Böhm, 2013)).**
Given -normal forms , and . By writing and similarly for , these terms can be made unary. Trying to find closed terms from solutions of the equation ? (Define a deed to be a closed nf of the form . The are deeds up to .)
Acknowledgments
The author thanks Marko van Eekelen for explaining him many years ago the method of bootstrapping (Section 1), Mariangiola Dezani for comments on the paper, and Rinus Plasmeijer for discussions about Section 3. The referees provided very useful remarks, improving the paper.
To the family of Corrado Böhm I am grateful for letting me spend wonderful times with them, besides for fully enabling us to enjoy the combinators.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[AM 72] E. Ashcroft and Z. Manna. The translation of goto programs into while programs. In C.V. Freiman, J.E. Griffith, and J.L. Rosenfeld, editors, Proceedings of IFIP Congress 71 , volume 1, pages 250–255, Amsterdam, 1972. North-Holland.
- 2[Bar 75] H. P. Barendregt. Normed uniformly reflexive structures. In Proceedings of the Symposium on Lambda-Calculus and Computer Science Theory , pages 272–286, Berlin, Heidelberg, 1975. Springer-Verlag.
- 3[Bar 84] H. P. Barendregt. The Lambda Calculus: its Syntax and Semantics . North-Holland, revised edition, 1984.
- 4[Bar 95] H. P. Barendregt. Enumerators of lambda terms are reducing constructively. Annals of Pure and Applied Logic , 73:3–9, 1995.
- 5[Bar 96] H. P. Barendregt. Kreisel, lambda calculus, a windmill and a castle , pages 3–14. Peters, Wellesley, Mass., 1996.
- 6[Bar 01] H. P. Barendregt. Discriminating coded lambda terms. In A. Anderson and M. Zeleny, editors, Logic, Meaning and Computation: Essays in Memory of Alonzo Church , pages 275–285. Kluwer, 2001.
- 7[Bar 20] H. P. Barendregt. Some extensional term models for λ 𝜆 \lambda -calculi and combinatory logics . Ph D thesis, Utrecht University, 1971/2020. Kindle Desktop Publishing. Extended re-edition 2020.
- 8[BB 93] A. Berarducci and C. Böhm. A self-interpreter of lambda calculus having a normal form. In E. Börger, G. Jäger, H. Kleine Büning, S. Martini, and M. M. Richter, editors, Computer Science Logic , pages 85–99, Berlin, Heidelberg, 1993. Springer Berlin Heidelberg.
