Gems of Corrado Böhm

Henk P. Barendregt

arXiv:1812.02243·cs.LO·June 22, 2023

TL;DR

This paper reviews Corrado Böhm's key contributions to computing, including self-compiling compilers, structured and functional programming, and foundational models of computability, highlighting their historical and theoretical significance.

Contribution

It presents a comprehensive overview of Böhm's pioneering ideas and algorithms that have shaped modern programming and computability theory.

Findings

01 Development of a self-compiling compiler

02 Introduction of structured programming eliminating 'goto'

03 Early implementation of functional programming

Abstract

The main scientific heritage of Corrado Böhm consists of ideas about computing, concerning concrete algorithms, as well as models of computability. The following will be presented. 1. A compiler that can compile itself. 2. Structured programming, eliminating the 'goto' statement. 3. Functional programming and an early implementation. 4. Separability in $\lambda$-calculus. 5. Compiling combinators without parsing. 6. Self-evaluation in $\lambda$-calculus.

Equations

\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}y.

\begin{array}[]{rcll}{[\![{p^{M}}]\!]}(x)&=&y,&\mbox{ if $\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}y$};\\ &=&\uparrow,&\mbox{ if $\{p^{M}\}(x)\!\uparrow$}.\end{array}

\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{M}}]\!]}(x).

{[\![{C(p^{L_{1}})}]\!]}_{L_{2}}={[\![{p^{L_{1}}}]\!]}_{L_{1}}.

\{C(p^{L})\}_{M}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{L}}]\!]}_{L}(x).

\{C(p^{L})\}_{M}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{C(p^{L})}]\!]}_{M}(x)={[\![{p^{L}}]\!]}_{L}(x).

{[\![{c^{L_{1},L_{2}}}]\!]}_{L_{2}}=C^{L_{1}}.

\{\{c^{L}\}(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}\{C(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{L}}]\!]}_{L}(x).

\{c^{L}\}(p^{L})\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{c^{L}}]\!]}_{M}(p^{L})=C(p^{L}).

\begin{array}[]{rcll}\{\{c^{L}\}(p^{L})\}(x)&\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&\{C(p^{L})\}(x),&\mbox{ by (1)},\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&{[\![{p^{L}}]\!]}(x),&\mbox{ by Proposition 2}.\end{array}

\{\{c^{L}\}(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}^{1}\{C(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}^{2}{[\![{p^{L}}]\!]}(x).

\begin{array}[]{rcll}\{\{c_{I}^{L,M}\}(p^{L})\}(x)&\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&\{C_{I}^{L}(p^{L})\}(x),&\mbox{by phase 1 (compile-time),}\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&{[\![{p^{L}}]\!]}_{L}(x),&\mbox{by phase 2 (run-time).}\end{array}

c_{B}^{L,M}=C_{I}^{L}(c_{B}^{L,L})\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{c_{I}^{L,M}\}(c_{B}^{L,L}),

\begin{array}[]{rcll}\{\{c_{B}^{L,M}\}(p^{L})\}(x)&=&\{\{C_{I}^{L}(c_{B}^{L,L})\}(p^{L})\}(x),&\mbox{by definition,}\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&\{{[\![{c_{B}^{L,L}}]\!]}_{L}(p^{L})\}(x),&\mbox{Prop.\ 2 applied to $C_{I}^{L}(c_{B}^{L,L})$,}\\ &=&\{C_{B}^{L}(p^{L})\}(x),&\mbox{as ${[\![{c_{B}^{L,L}}]\!]}_{L}=C_{B}^{L}$ by definition,}\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&{[\![{p^{L}}]\!]}_{L}(x),&\mbox{Prop.\ 2 applied to $C_{B}^{L}(p^{L})$.}\end{array}

c_{B^{\prime}}^{L,M}=C_{B}^{L}(c_{B}^{L,L})\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{c_{B}^{L,M}\}(c_{B}^{L,L})\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{\{c_{I}^{L,M}\}(c_{B}^{L,L})\}(c_{B}^{L,L}),

\begin{array}[]{rcll}\{\{c^{L,M}_{B^{\prime}}\}(p^{L})\}(x)&=&\{\{C_{B}^{L}(c_{B}^{L,L})\}(p^{L})\}(x),&\mbox{by definition,}\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&\{{[\![{c^{L,L}_{B}}]\!]}_{L}(p^{L})\}(x),&\mbox{Prop.\ 2 applied to $C_{B}^{L}(c^{L,L}_{B})$,}\\ &=&\{C_{B}^{L}(p^{L})\}(x),&\mbox{as $C_{B}^{L}={[\![{c_{B}^{L,L}}]\!]}_{L}$ by definition,}\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&{[\![{p^{L}}]\!]}(x),&\mbox{Prop.\ 2 applied to $C_{B}^{L}(p^{L})$.}\end{array}

c_{I}^{L,M}\,c_{B}^{L,L}\,c_{B}^{L,L}\,p^{L}\,x

{\mathcal{C}}::=L\mid(L,{\bf C}_{1},c,{\bf C}_{2}),\mbox{ where $c{\in}L$ and ${\bf C}_{1},{\bf C}_{2}{\in}{\mathcal{C}}$.}

[Table residue: definitions of $|L|$ and $|(L,{\bf C}_{1},c,{\bf C}_{2})|$; the right-hand sides were not captured.]

[Figure residue: the schemes ${\bf C}_{1}$, ${\bf C}_{2}$, ${\bf C}_{3}$ and the trees $T_{L}$, $T_{(L,{\bf C}_{1},c,{\bf C}_{2})}$, $T_{{\bf C}_{1}}$, $T_{{\bf C}_{2}}$, $T_{{\bf C}_{3}}$; the drawings were not captured.]

[Diagram: the tree $T_{\bf C}$ has root $L$, an edge labelled $c$, and leaves $L_{1}$ and $M$; the tree $T_{{\bf C}^{\prime}}$ has root $L$, an edge labelled $c$, and leaves $M$ and $L_{2}$.]

[Table residue: definitions of $\Phi_{L}\,p\,x$ and $\Phi_{L,{\bf C}_{1},c,{\bf C}_{2}}\,p\,x$; the right-hand sides were not captured.]

\Phi_{\bf C}\,p\,x={[\![{p}]\!]}_{|{\bf C}|}(x).

\begin{array}[]{rclll}\Phi_{M}p^{M}x&=&{[\![{p^{M}}]\!]}_{M}x&\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{p^{M}\}x&=p^{M}x.\\ \Phi_{{\bf C}_{1}}p^{L}x&=&{[\![{{[\![{c_{I}}]\!]}_{M}p^{L}}]\!]}_{M}x&\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{\{c_{I}\}p^{L}\}x&=c_{I}p^{L}x.\\ \Phi_{{\bf C}_{2}}p^{L}x&=&{[\![{{[\![{{[\![{c_{I}}]\!]}_{M}c_{B}}]\!]}_{M}p^{L}}]\!]}_{M}x&\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{\{\{c_{I}\}c_{B}\}p^{L}\}x&=c_{I}c_{B}p^{L}x.\\ \Phi_{{\bf C}_{3}}p^{L}x&=&{[\![{{[\![{{[\![{{[\![{c_{I}}]\!]}_{M}c_{B}}]\!]}_{M}c_{B}}]\!]}_{M}p^{L}}]\!]}_{M}x&\mathrel{\leftarrow\!\!\!\!\!\leftarrow}\{\{\{\{c_{I}\}c_{B}\}c_{B}\}p^{L}\}x&=c_{I}c_{B}c_{B}p^{L}x.\end{array}

Full text

Logical Methods in Computer Science, Volume 16, Issue 3, Paper 15. Submitted Dec. 07, 2018. Published Sep. 02, 2020.

Gems of Corrado Böhm

Henk Barendregt

Faculty of Science, Radboud University Nijmegen

Box 9010

6500GL Nijmegen, The Netherlands

Abstract.

The main scientific heritage of Corrado Böhm consists of ideas about computing, concerning concrete algorithms, as well as models of computability. The following will be presented. 1. A compiler that can compile itself. 2. Structured programming, eliminating the ‘goto’ statement. 3. Functional programming and an early implementation. 4. Separability in $\lambda$-calculus. 5. Compiling combinators without parsing. 6. Self-evaluation in $\lambda$-calculus.

Key words and phrases:

self-compiling, structured programming, functional programming, lambda calculus, combinators, self-evaluation

To the memory of Corrado Böhm (1923-2017)

Introduction

As a tribute to Corrado Böhm this paper presents six brilliant results of his and also discusses some of their later developments. Most of the papers are written by Böhm with co-authors. The result on elimination of the goto, Section 2, is written by Giuseppe Jacopini alone in the joint paper [BJ66], but one may assume that Böhm as supervisor had influenced the research involved, and therefore this result is included here. This paper is written such that computer science freshmen can read and understand it.

1. Self-compilation

In his PhD thesis [Böh54] at the ETH Zürich, Corrado Böhm constructed one of the first higher programming languages $L$, together with a compiler for it. The compiler has the particular feature that it is written in the language $L$ itself. This sounds like magic, but it is not: if a programming language is capable of expressing any computational process, then it should also be able to ‘understand itself’ (i.e. perform the computational task of translating itself into machine language). Later this property gave rise to ‘bootstrapping’: dramatically increasing the efficiency and reliability of computer programs, which seems as impossible as pulling oneself over a fence by one’s bootstraps. This gave rise to the term ‘booting a computer’. The mechanism will be explained in this section.

Footnote 1. In Europe the hyperbole for impossibility is the story of Baron (von) Münchhausen, who could get himself (and the horse on which he was seated) out of a swamp by pulling up his own hair.

1.1. Algorithms, computers, and imperative programming

An algorithm is a recipe to compute an output from a given input. Executing such a recipe basically consists in putting down pebbles in a fixed array of boxes and ‘replacing’ these pebbles step by step. That is, a pebble may be moved from one box to another one, be taken away, or new ones may be added. Such a process is called a calculation or computation. As shown in [Tur37b], all computational tasks, like “What is the square of 29?”, “Put the following list of words in alphabetical order”, or “What does Wikipedia say about the concept ‘bootstrap’?”, can be put in the format of shuffling pebbles in boxes.

Footnote 2. The word ‘pebble’ in Latin is ‘calculus’.

This view on computing holds for computations on an abacus, but also for programmed computers. A computer $M$ is a device, usually electronic, with memory, that performs computations. The pebbles are represented in this memory and the shuffling is done by making stepwise changes. A simple conceptual computer is the Turing Machine (TM). It consists of an infinite tape of discrete cells that can be numbered by the integers ${\mathbb{Z}}=\{\cdots,-2,-1,0,1,2,\cdots\}$. At every moment in the computation only a finite number of these cells contain information, either a 1 or nothing: the original TM was a 0-bit machine. The machine can be in one of a finite number of states. For a computational problem the input, coded as a list of symbols, is written on the tape. There is a read/write (R/W) head positioned on one of the cells of the tape. Depending on the symbol $a$ that is read and the present state $s$, the following three actions are performed: a (possibly different) symbol $a^{\prime}$ is written on the cell under the R/W head, a (possibly different) state $s^{\prime}$ is assumed, and finally the head makes a move from $\{R,L,N\}$ ($R$: one position to the right, $L$: one position to the left, $N$: no move). When finally no action can be performed any longer, the resulting information on the tape represents the output of the computation. Each Turing Machine is determined by a finite table consisting of 5-tuples like $\langle a,s;a^{\prime},s^{\prime},\{R,L,N\}\rangle$ that determine the changes.

Footnote 3. Actual computers only have a finite amount of memory. Turing apparently didn’t want to be technology dependent and conceived the Turing Machine with an idealized memory of infinitely many cells. But at any given moment in a computation only finitely many cells contain a 1.

Footnote 4. In 0-bit machines counting happens in the $2^{0}$-ary, i.e. unary, system. In modern computers the cells are replaced by registers that contain a sequence of 64 or more bits that can be read or overwritten in parallel; moreover, the registers do not need to be looked up linearly, like on the tape of the TM, but there is fast access to each of them; one speaks of ‘random access memory’ (RAM).

Turing showed that there exists a particular kind of machine, called a universal machine ${\mathcal{U}}$, that suffices to make arbitrary computations. Such a ${\mathcal{U}}$ is conceptually easy. The set of 5-tuples of a particular machine ${\mathcal{M}}$ is presented as a table $T_{\mathcal{M}}$ ‘in its silicon’. A universal machine ${\mathcal{U}}$ that imitates ${\mathcal{M}}$ needs this table $T_{\mathcal{M}}$ as extra input in coded form, including the collection of all states of ${\mathcal{M}}$ (which may be more extensive than that of ${\mathcal{U}}$) and the present state of ${\mathcal{M}}$, stored in a dedicated part of the memory as the program (nowadays known as the ‘app’) for ${\mathcal{M}}$. The instruction table $T_{\mathcal{U}}$ of ${\mathcal{U}}$ stipulates that it has to (1) look in $T_{\mathcal{M}}$ in order to see what the present state of ${\mathcal{M}}$ is and what to do next, and (2) do this. The possibility of a universal machine provides a model of computation in which a single machine ${\mathcal{M}}$, using programming language $M=L_{\mathcal{M}}$, can perform any computational job. The nature of the actions of Turing machines, described in their action tables, is rather imperative: overwrite information, change state, move. For this reason the resulting computational model is called imperative programming.
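The table of 5-tuples, and the idea that a table can be handed to one fixed simulator as ordinary input, can be made concrete in a few lines. The following Python sketch is only an illustration: the names `run_tm`, `SUCCESSOR` and `unary`, the state names, the blank marker and the step limit are assumptions made here and do not come from [Tur37b].

```python
from collections import defaultdict

BLANK = " "                          # an empty cell; the only symbol is "1"
MOVES = {"R": 1, "L": -1, "N": 0}    # head movements from the 5-tuples

def run_tm(table, tape, state, head=0, max_steps=10_000):
    """Run a machine given as {(symbol, state): (symbol', state', move)}.

    Halts when no 5-tuple applies and returns the non-blank cells.
    The step limit is an assumption to keep the sketch from looping forever.
    """
    cells = defaultdict(lambda: BLANK, tape)
    for _ in range(max_steps):
        key = (cells[head], state)
        if key not in table:         # no action possible: the output is on the tape
            return {i: s for i, s in cells.items() if s != BLANK}
        symbol, state, move = table[key]
        cells[head] = symbol         # overwrite, change state, move
        head += MOVES[move]
    raise RuntimeError("step limit exceeded")

# Hypothetical example table: unary successor (skip the 1s, append one more).
SUCCESSOR = {
    ("1", "scan"): ("1", "scan", "R"),
    (BLANK, "scan"): ("1", "done", "N"),
}

def unary(n):
    """The number n coded as n consecutive cells holding a 1."""
    return {i: "1" for i in range(n)}

print(run_tm(SUCCESSOR, unary(3), "scan") == unary(4))
```

In this sketch `run_tm` plays the role of the universal machine: the table of the imitated machine is data, not part of the simulator.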

In this paper we will consider a fixed universal machine ${\mathcal{M}}$. Around 1950, when Corrado Böhm worked on his PhD, computers were rare. Indeed, in 1954, in a country like the Netherlands there were only three computers (at the Mathematical Center, the Royal Meteorological Institute, and the National Phone Company) and no more were deemed necessary! Nowadays a standard car often has on board on the order of $150$ (universal) computers in the form of microprocessor chips.

A program in a given language $M$ for ${\mathcal{M}}$ consists of a sequence of statements in $M$ that the machine ‘understands’: it performs the intended changes on data represented in the memory of ${\mathcal{M}}$. Such programs are denoted by $p=p^{M}$, the optional superscript indicating that the program is written in the language $M$.

Definition.

(1) There is a non-specified set $D$ (for data) consisting of the intended objects on which computations take place.

(2) The process of running program $p^{M}$ on input $x$ in $D$ is denoted by $\{p^{M}\}(x)$. If this process terminates with end result $y$ (the output, again in $D$), then we write

$$\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}y.$$

(3) It may be the case that $\{p^{M}\}(x)$ doesn’t terminate. Then there is no output, and we write $\{p^{M}\}(x)\!\uparrow$.

(4) The (operational) semantics of $p^{M}$ is the partial map ${[\![{p^{M}}]\!]}\colon D\rightharpoondown D$ defined as follows.

$$\begin{array}[]{rcll}{[\![{p^{M}}]\!]}(x)&=&y,&\mbox{ if $\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}y$};\\ &=&\uparrow,&\mbox{ if $\{p^{M}\}(x)\!\uparrow$}.\end{array}$$

Footnote 5. Compound expressions like $\{\{c\}(p)\}(x)$ make sense and will be used. But expressions like $\{q\}(\{p\}(x))$ we will avoid, as one is forced to evaluate first the $\{p\}(x)$, which may be undefined; therefore even if $\forall y.\{q\}(y)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}0$, one doesn’t always have $\{q\}(\{p\}(x))\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}0$. See [Bar84, Exercise 9.5.13] and [Bar75, Bar96].

For $\{\ \}$ and ${[\![{\ }]\!]}$, which depend on $M$, we sometimes write $\{\ \}_{M}$ and ${[\![{\ }]\!]}_{M}$, respectively.

The difference between ${[\![{p^{M}}]\!]}(x)=y$ and $\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}y$ is that the former is an identity, like $36^{2}=36\times 36$ that holds by definition, whereas the latter requires a computation, like $36\times 36\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}1296$. The sign ‘$\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}$’ indicates that a computation has to be performed that takes time, consisting of a sequence of a few or more steps that transform information.
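The distinction between the process $\{p\}(x)$, which takes steps, and the partial map ${[\![{p}]\!]}$ can be illustrated with a small step-counting runner. This is a minimal sketch under assumptions made here: programs are modelled as Python step functions, the names `run`, `halve_until_odd` and `loop_forever` are hypothetical, and the `fuel` budget is only a practical stand-in, since non-termination cannot be detected in general.

```python
def run(step, x, fuel):
    """The process {p}(x): perform steps one by one, at most `fuel` of them.

    Returns (y, steps_used) if the process halts within the budget and
    None otherwise. None only means 'no result yet', not a proof of divergence.
    """
    config = x
    for used in range(1, fuel + 1):
        tag, config = step(config)
        if tag == "halt":
            return config, used
    return None

def halve_until_odd(n):
    # Hypothetical program: strip factors of 2 from a positive integer.
    return ("halt", n) if n % 2 else ("next", n // 2)

def loop_forever(n):
    # Hypothetical program whose process never terminates.
    return ("next", n)

print(run(halve_until_odd, 40, fuel=100))  # (5, 4)
print(run(loop_forever, 40, fuel=100))     # None
```

The value 5 is what the semantics assigns to the input 40, while the count 4 records that reaching it required a computation of four steps.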

Proposition 1.

If $\{p^{M}\}(x)$ terminates, then

$$\{p^{M}\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{M}}]\!]}(x).$$

Proof 1.1.

By definition.

1.2. Programming languages and compilers

A human who has to write a correct and efficient program had better do so in an understandable way, rather than in the form of recipes for shuffling pebbles. One can use a programming language $L$ for this, in which computational tasks can be described more intuitively. In [Böh54] an early example of such a language $L$ is constructed.

Definition.

(1) A programming language $L$ consists of programs $p$ that describe computations according to (2).

(2) $L$ comes with a (denotational) semantic function ${[\![{\ }]\!]}_{L}\colon L\mathrel{\rightarrow}(D\rightharpoondown D)$. That is, to each $p^{L}{\in}L$ it assigns a (possibly partial) function ${[\![{p^{L}}]\!]}_{L}\colon D\rightharpoondown D$.

Technically speaking $M$ is also a programming language, the machine language, with its denotational semantics ${[\![{-}]\!]}_{M}$, by definition equal to the operational one $\{-\}_{M}$. By contrast, other programming languages are called higher programming languages; they are intended to make the construction of programs easier. When one has a program $p^{L}$ described in a higher programming language $L$, one wants machine help from a universal machine to obtain from input $x$ the output ${[\![{p^{L}}]\!]}_{L}(x)$. This succeeds if one can translate $p^{L}$ in the ‘right way’ into the machine language $M$. This translating is called compiling.

Definition.

A function $C\colon L_{1}\mathrel{\rightarrow}L_{2}$ is called a compiling function if

$${[\![{C(p^{L_{1}})}]\!]}_{L_{2}}={[\![{p^{L_{1}}}]\!]}_{L_{1}}.$$

In this paper, we will usually consider only compilers into $L_{2}=M$.
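The defining property of a compiling function, that the translated program has the same meaning as the original, can be checked on a toy pair of languages. In the Python sketch below everything is an assumption chosen for illustration: the source language (expressions over an input `x`), the target language (stack-machine instruction lists), and the names `sem_L`, `sem_M` and `C`. Testing finitely many inputs supports the equation but does not prove it.

```python
# Source language L (assumed): "x", integer constants, ("+", e1, e2), ("*", e1, e2).
# Target language M (assumed): instruction lists for a small stack machine.

def sem_L(p):
    """Denotational semantics [[p]]_L, returned as a function of the input x."""
    def meaning(x):
        if p == "x":
            return x
        if isinstance(p, int):
            return p
        op, left, right = p
        a, b = sem_L(left)(x), sem_L(right)(x)
        return a + b if op == "+" else a * b
    return meaning

def C(p):
    """Compiling function C: L -> M, producing postfix stack code."""
    if p == "x":
        return [("LOAD",)]
    if isinstance(p, int):
        return [("PUSH", p)]
    op, left, right = p
    return C(left) + C(right) + [("ADD",) if op == "+" else ("MUL",)]

def sem_M(code):
    """Semantics [[code]]_M: run the stack machine on input x."""
    def meaning(x):
        stack = []
        for instr in code:
            if instr[0] == "LOAD":
                stack.append(x)
            elif instr[0] == "PUSH":
                stack.append(instr[1])
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if instr[0] == "ADD" else a * b)
        return stack.pop()
    return meaning

p = ("+", ("*", "x", "x"), 1)   # the program x*x + 1
print(all(sem_M(C(p))(x) == sem_L(p)(x) for x in range(-5, 6)))
```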

Proposition 2.

If $C\colon L\mathrel{\rightarrow}M$ is a compiling function, then

$$\{C(p^{L})\}_{M}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{L}}]\!]}_{L}(x).$$

Proof 1.2.

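One has by Proposition 1 and Definition 1.2

$$\{C(p^{L})\}_{M}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{C(p^{L})}]\!]}_{M}(x)={[\![{p^{L}}]\!]}_{L}(x).$$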

This shows that an intended computation using a $p^{L}{\in}L$, meant to compute ${[\![{p^{L}}]\!]}_{L}(x)$, can in principle be replaced by a computation using a $p^{M}{\in}M$, for which there is support from the machine ${\mathcal{M}}$. We say: the computational task ${[\![{p^{L}}]\!]}_{L}(x)$ becomes executable (by ${\mathcal{M}}$). In modern compilers the translation $L\mathrel{\rightarrow}M$ is often divided into literally hundreds of steps, using many intermediate languages. For example, the first step is the so-called lexing, which determines where every meaningful unit starts and ends. At the end of the long translation process one arrives at the language $M$. No further translation is needed: in ${\mathcal{M}}$ the programs in machine language are run by the laws of physics (electrical engineering).

Footnote 6. For example one may have a long series of translations: $L\mathrel{\rightarrow}L_{1}\mathrel{\rightarrow}L_{2}\mathrel{\rightarrow}\cdots\mathrel{\rightarrow}L_{n}\mathrel{\rightarrow}M$.

Footnote 7. Every student of a foreign language has to master this also: a stream of sounds ‘papafumeunepipe’ has to be separated into words as follows ‘papa fume une pipe’; only then one can translate further, into ‘father smokes a pipe’.
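Lexing, the first translation step mentioned above, can be sketched with a regular expression that finds where each meaningful unit starts and ends. The token classes, the toy assignment syntax and the name `lex` are assumptions made for this illustration; real lexers distinguish many more classes.

```python
import re

# Assumed token classes for a toy language: numbers, identifiers, the
# assignment sign ":=", and any other single character as an operator.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(:=|.))")

def lex(source):
    """Split source text into (kind, text) pairs, skipping whitespace."""
    tokens = []
    for number, name, other in TOKEN.findall(source):
        if number:
            tokens.append(("NUM", number))
        elif name:
            tokens.append(("ID", name))
        elif other.strip():
            tokens.append(("OP", other))
    return tokens

print(lex("total:=total+42*x"))
```

Only after this separation into units can the later steps, such as parsing and code generation, translate further.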

Compiling functions $C\colon L_{1}\mathrel{\rightarrow}L_{2}$ are notably useful if the translated program $C(p^{L_{1}})$ in $L_{2}$ in turn is executable. Translating is a computational task and in principle determining $C(p^{L})$ can be done by hand. But since many programs, also in a higher programming language, may consist of several million instructions, the computational task of compiling is much better performed by a machine as well. A program that performs this translation is called a compiler. For such an automated translation process to be of any use, the compiler needs to be written either in machine language $M$, or in another language $L$ for which there is already another compiler from $L$ to $M$.

Definition.

Let $C^{L_{1}}\colon L_{1}\mathrel{\rightarrow}M$ be a compiling function. A compiler for $C^{L_{1}}$ written in language $L_{2}$ is a program $c^{L_{1},L_{2}}$ such that

$${[\![{c^{L_{1},L_{2}}}]\!]}_{L_{2}}=C^{L_{1}}.$$

This is useful only if programs in $L_{2}$ are also executable. This is the case if $L_{2}=M$ or if there is already a compiler from $L_{2}$ into $M$. Two cases will be important in this paper: (1) $L_{2}=M$ and (2) $L_{2}=L_{1}$.

1.3. Compilers written in machine language $M$

First consider a compiler $c^{L}\colon L\mathrel{\rightarrow}M$ written in machine language $M$.

Proposition 3.

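Let $c^{L}\colon L\mathrel{\rightarrow}M$ be a compiler for a compiling function $C$.

(1) For all programs $p^{L}$ written in $L$ one has $\{c^{L}\}_{M}(p^{L})\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}C(p^{L}).$

(2) A computational job ${[\![{p^{L}}]\!]}_{L}(x)$ can be fully automated as follows.

$$\{\{c^{L}\}(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}\{C(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{L}}]\!]}_{L}(x).$$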

Proof 1.3.

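(1) By Definition 1.2 we have ${[\![{c^{L}}]\!]}_{M}=C$. Hence by Proposition 1

$$\{c^{L}\}(p^{L})\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{c^{L}}]\!]}_{M}(p^{L})=C(p^{L}).$$

(2) It follows that

$$\begin{array}[]{rcll}\{\{c^{L}\}(p^{L})\}(x)&\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&\{C(p^{L})\}(x),&\mbox{ by (1)},\\ &\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}&{[\![{p^{L}}]\!]}(x),&\mbox{ by Proposition 2}.\end{array}$$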

Definition.

Let $c^{L}\colon L\mathrel{\rightarrow}M$ be a compiler written in $M$.

(1) By Proposition 3(2) there are two computation phases towards ${[\![{p^{L}}]\!]}(x)$:

$$\{\{c^{L}\}(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}^{1}\{C(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}^{2}{[\![{p^{L}}]\!]}(x).$$

The first computation 1, that is $\{c^{L}\}(p^{L})\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}C(p^{L})$, takes place in a time interval that is called compile-time; the second computation 2, that is $\{C(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{L}}]\!]}(x)$, takes place in a time interval that is called run-time.

(2) If for programs $p^{L}$ and inputs $x$ (that interest us) the run-time $\{C(p^{L})\}(x)\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{[\![{p^{L}}]\!]}(x)$ is short (for our purposes), then the compiler $c^{L}$ is said to produce efficient code. Note that this pragmatic definition depends only on the compiling function $C={[\![{c^{L}}]\!]}$, and not on its program, the compiler itself.

(3) If for programs $p^{L}$ (that interest us) the compile-time is short (for our purposes), then the compiler is said to be fast. Note that this notion does depend on the compiler $c^{L}$ itself, and not merely on the compiling function $C={[\![{c^{L}}]\!]}$.

Proposition 4.

For a programming language $L$ in which every program $p^{L}$ is a sequence of statements, each consisting of a computable step, there exists a simple compiler $c_{I}^{L,M}\colon L\mathrel{\rightarrow}M$ written in $M$ for a compiling function $C_{I}^{L}$, mimicking the steps in $L$ as steps in $M$. Such a compiler is called a (simple) interpreter.

Proof 1.4 (Sketch).

Let $p^{L}=s_{1};s_{2};\ldots;s_{n}$ . Define $C_{I}^{L}(p^{L})=I(s_{1});I(s_{2});\ldots;I(s_{n}),$ where $I(s)$ mimics the statement $s$ by a (small) program in $M$ .
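The construction $C_{I}^{L}(s_{1};\ldots;s_{n})=I(s_{1});\ldots;I(s_{n})$ can be sketched for a toy statement language. The statement forms, the register-machine instructions and the names `I`, `C_I` and `run_M` below are assumptions made for this illustration, not part of the paper.

```python
def I(statement):
    """Mimic one L statement by a small M program (a list of instructions)."""
    kind = statement[0]
    if kind == "set":                      # v := k
        _, v, k = statement
        return [("LOADI", v, k)]
    if kind == "add":                      # d := d + s
        _, d, s = statement
        return [("ADD", d, s)]
    if kind == "triple":                   # v := 3 * v, needs three M steps
        _, v = statement
        return [("MOV", "tmp", v), ("ADD", v, "tmp"), ("ADD", v, "tmp")]
    raise ValueError(f"unknown statement: {statement!r}")

def C_I(program):
    """Translate statement by statement, never looking at the whole program."""
    return [instr for s in program for instr in I(s)]

def run_M(code, registers):
    """Execute M instructions on a copy of the given register contents."""
    regs = dict(registers)
    for op, a, b in code:
        if op == "LOADI":
            regs[a] = b
        elif op == "MOV":
            regs[a] = regs[b]
        elif op == "ADD":
            regs[a] = regs[a] + regs[b]
    return regs

p_L = [("set", "y", 2), ("add", "y", "x"), ("triple", "y")]
print(run_M(C_I(p_L), {"x": 5})["y"])  # 21
```

Because each statement is translated in isolation, this scheme has no opportunity to optimize across statements.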

For complex computational problems using a large program both the compile-time and run-time consume considerable amounts of time. Often these are bottlenecks for the feasibility of executing a program. Moreover, interpreters usually produce less efficient code than compilers, for reasons to be discussed next.

1.4. Compilers written in higher programming languages

Now we consider the task of writing a compiler $c=c^{L,M}\colon L\mathrel{\rightarrow}M$ . A compiler more complex than a simple interpreter is able to look at the input program $p^{L}$ in its totality and can ‘reflect’ (act) on it, enabling optimizations for the run-time of the resulting code $p^{M}$ . Such a compiler improves efficiency⁸, using the power and flexibility of $L$ . With the right effort a compiler can be developed that produces efficient code, so that the run-time performance of the translated programs is optimized. This doesn’t apply to the compile-time if the compiler $c$ is written in $M$ , for which it is hard to achieve optimizations.

⁸ Software engineering studies ways to develop new versions of programs and compilers, in order to improve time performance and also to correct bugs (errors).

In his PhD thesis (1951) of just 50 pages Corrado Böhm designed a programming language $L$ and constructed a compiler $c_{B}=c_{B}^{L,L}$ , in $L$ itself. This later made bootstrapping possible: producing not only efficient programs, but also making the compilation process itself efficient. We will explain how this is achieved. Suppose one has a compiler $c_{B}^{L,L}{\in}L$ that produces efficient code (efficiently running programs). Here ‘efficient’ is used in a non-technical intuitive sense. In order to run $c_{B}^{L,L}$ one needs a simple interpreter $c_{I}^{L,M}\colon L\mathrel{\rightarrow}M$ , written in $M$ . Now we will describe three ways of computing ${[\![{p^{L}}]\!]}_{L}(x)$ , that is, finding the intended result that program $p^{L}{\in}L$ has acting on input $x$ .

1. Computing ${[\![{p^{L}}]\!]}_{L}(x)$ using the simple interpreter $c_{I}^{L,M}$ :

[TABLE]

This has both inefficient compile-time and run-time.

2. Better efficiency using $c_{B}^{L,L}$ , run by the interpreter. Define $c_{B}^{L,M}=C_{I}^{L}(c_{B}^{L,L})$ , the interpreter applied to the compiler written in $L$ . This can be precompiled

[TABLE]

as the code of $C_{B}^{L}$ in the sense that ${[\![{c_{B}^{L,M}}]\!]}_{M}=C_{B}^{L}$ . One now has

[TABLE]

Computing $c_{B}^{L,M}$ is a one time job and, as the result can be stored, it doesn’t count in measuring efficiency. The first computation $\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}$ counts as the compile time of $c_{B}^{L,M}$ . But it is also the run-time of $c_{I}^{L,M}$ (with compiling function $C_{I}^{L}$ ) and doesn’t need to be efficient. The second computation $\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}$ is the run time of $c^{L}_{B}$ (with compiling function $C_{B}^{L}$ ) and was assumed to be efficient. Therefore this computation has an efficient run-time, but not necessarily an efficient compile-time.

3. Best efficiency using $c_{B}^{L,L}$ : define $c_{B^{\prime}}^{L,M}=C_{B}^{L}(c_{B}^{L,L})$ , the compiler applied to itself. This can be precompiled as follows.

[TABLE]

just requiring a one time computation. Then again ${[\![{c_{B^{\prime}}^{L,M}}]\!]}_{M}=C_{B}^{L}$ , but now

[TABLE]

with both efficient compile and run-time, as both codes have been generated by $C_{B}^{L}$ .
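The three ways of computing ${[\![{p^{L}}]\!]}_{L}(x)$ can be mimicked by a small Python model. This is a sketch under strong simplifying assumptions: a program is represented only by the function it denotes, and ‘efficient’ is reduced to a boolean flag on the generated code. The class names `Source` and `Code` and the function `report` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Source:
    """A program in L, represented only by the function it denotes."""
    meaning: Callable

@dataclass(frozen=True)
class Code:
    """A program in M: the function it denotes and whether it runs efficiently."""
    meaning: Callable
    efficient: bool

def C_I(p: Source) -> Code:
    """Compiling function of the simple interpreter: correct but slow code."""
    return Code(p.meaning, efficient=False)

def C_B(p: Source) -> Code:
    """Compiling function of the Böhm compiler: correct and efficient code."""
    return Code(p.meaning, efficient=True)

# The interpreter c_I^{L,M} is given as (slow) code in M.
c_I_LM = Code(meaning=C_I, efficient=False)
# The compiler c_B^{L,L} is itself a program in L whose meaning is C_B.
c_B_LL = Source(meaning=C_B)

# One-time precompilations; their cost is not counted.
c_B_LM = C_I(c_B_LL)          # way 2: the interpreter applied to the compiler
c_B_prime_LM = C_B(c_B_LL)    # way 3: the compiler applied to itself

def report(compiler: Code, p: Source, x):
    """Return (result, compile-time efficient?, run-time efficient?)."""
    code = compiler.meaning(p)        # compile-time: running the compiler's code
    return code.meaning(x), compiler.efficient, code.efficient

square = Source(meaning=lambda x: x * x)
print(report(c_I_LM, square, 7))         # (49, False, False)
print(report(c_B_LM, square, 7))         # (49, False, True)
print(report(c_B_prime_LM, square, 7))   # (49, True, True)
```

In all three cases the result is the same; only the flags for the two phases differ, in line with the three cases distinguished above.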

[FIGURE: reduction diagram of the three jobs $c_{I}^{L,M}p^{L}x$ , $c_{I}^{L,M}c_{B}^{L,L}p^{L}x$ and $c_{I}^{L,M}c_{B}^{L,L}c_{B}^{L,L}p^{L}x$ , all ending in ${[\![{p^{L}}]\!]}_{L}(x)$ . The steps from $c_{I}^{L,M}p^{L}x$ and from the precompiled $\underline{C_{I}^{L}(c_{B}^{L,L})}p^{L}x$ are labeled ‘slow compiling’, the step from the precompiled $\underline{C_{B}^{L}(c_{B}^{L,L})}p^{L}x$ is labeled ‘efficient compiling’; the final step from $C_{I}^{L}(p^{L})x$ is labeled ‘slow running’ and the one from $C_{B}^{L}(p^{L})x$ ‘efficient running’.]

Figure 1. Bootstrapping: precompiled $c_{B}^{L,M}:=C_{I}^{L}(c_{B}^{L,L})$ , $c_{B^{\prime}}^{L,M}:=C_{B}^{L}(c_{B}^{L,L})$ provide efficient run time alone, or both run time and compile time, respectively.

In the language of combinatory logic, so much admired by Corrado Böhm, one writes $p\cdot x$ , or simply $px$ , for $\{p\}_{M}(x)$ , and $cpx$ for $(cp)x$ , etcetera (association to the left). Then the three ways of compiling and computing a job ${[\![{p^{L}}]\!]}_{L}(x)$ can be rendered as in Figure 1. The underlined expressions denote the codes of the Böhm compiler $c_{B}^{L,L}$ that are obtained by precompilation, respectively using the interpreter and using itself. So in the steps above these do not require time. This bootstrapping process wasn’t discussed in Böhm’s PhD thesis, but it was made possible by his invention and implementation of self-compilation. In Figure 1.4 the bootstrap process is presented in a slightly different way.

After having obtained his PhD in Zürich, Böhm did succeed in registering a patent on compilers. But, unexpectedly, a few years later (1955) IBM came with its FORTRAN compiler. It turned out that Böhm’s patent was valid only in Switzerland!

1.5. Compiler configurations

In this section we treat compilers in greater generality, translating a language $L_{1}$ into $L_{2}$ . Only one machine $M$ is used for the translation, but this can easily be generalized. We settle the question whether self-compilation is necessary in order to make compile-time and run-time both efficient.

Definition.

(1)

We define the language ${\mathcal{C}}$ of compiler configurations by the following context free grammar.

[TABLE]

Actually $L$ is a symbol $\underline{L}$ for a language $L$ , but we identify the two.

(2)

Let ${\bf C}{\in}{\mathcal{C}}$ . The language of ${\bf C}$ , in notation $|{\bf C}|$ , is defined as follows.

[TABLE]

(3)

Correctness of ${\bf C}{\in}{\mathcal{C}}$ is defined as follows.

$L$ is correct;

$(L,{\bf C}_{1},c,{\bf C}_{2})$ is correct if $c$ is a program in programming language $|{\bf C}_{2}|$ ,

${\bf C}_{1},\,{\bf C}_{2}$ are correct and

${[\![{c}]\!]}_{|{\bf C}_{2}|}\colon L\rightarrow|{\bf C}_{1}|$ is a compiling function.
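The shape of this definition can be sketched as a small data structure in Python. Several points are assumptions, since the displayed grammar and the definition of $|{\bf C}|$ are not reproduced here: we take a configuration to be either a language or a tuple $(L,{\bf C}_{1},c,{\bf C}_{2})$ , we take $|(L,{\bf C}_{1},c,{\bf C}_{2})|=L$ , and we approximate the semantic condition ‘is a compiling function’ by comparing language tags. The names `Compiler`, `Node`, `lang` and `correct`, and the three sample configurations, are our own reading of the situations of Subsection 1.4.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Compiler:
    name: str
    source: str       # language being translated
    target: str       # language of the produced code
    written_in: str   # language in which the compiler itself is written

@dataclass(frozen=True)
class Node:
    """A compound configuration (L, C1, c, C2)."""
    L: str
    C1: "Config"
    c: Compiler
    C2: "Config"

Config = Union[str, Node]     # a bare string stands for a language

def lang(C: Config) -> str:
    """|C|; the clause for compound configurations is an assumption."""
    return C if isinstance(C, str) else C.L

def correct(C: Config) -> bool:
    """Tag-level approximation of the correctness clauses."""
    if isinstance(C, str):
        return True
    return (correct(C.C1) and correct(C.C2)
            and C.c.written_in == lang(C.C2)     # c is a program in |C2|
            and C.c.source == C.L                # [[c]] translates L ...
            and C.c.target == lang(C.C1))        # ... into |C1|

c_0 = Compiler("c_0", source="L", target="M", written_in="M")   # interpreter
c_B = Compiler("c_B", source="L", target="M", written_in="L")   # Böhm compiler

C1 = Node("L", "M", c_0, "M")
C2 = Node("L", "M", c_B, C1)
C3 = Node("L", "M", c_B, C2)
bad = Node("L", "M", c_B, "M")    # c_B is not written in M

print([correct(C) for C in (C1, C2, C3, bad)])   # [True, True, True, False]
```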

Example.

The three situations in Subsection 1.4 can be described as compiler configurations. We use $c_{0}$ and $c_{B}$ instead of $c_{I}^{L,M}$ and $c_{B}^{L,L}$ , respectively.

[TABLE]

Definition.

A compiler configuration ${\bf C}$ can be drawn as a labeled tree $T_{\bf C}$ .

[TABLE]

Compiler configurations and their trees are more convenient to use than the more rigid T-diagrams introduced in [MHW70], since there is more flexibility to draw languages that still need to be translated. For example, ${\bf C}_{3}$ is the compiler configuration employed by Böhm and its tree explains well the magic trick.

Definition.

A compiler configuration ${\bf C}$ is inductively defined to be executable as follows.

[TABLE]

Example.

(1)

The three compiler configurations ${\bf C}_{1},{\bf C}_{2},{\bf C}_{3}$ considered before are executable.

[TABLE]

(2)

The following compiler configurations, drawn as trees, are not executable:

[TABLE]

because an evaluation function for neither $L_{1}$ nor $L_{2}$ is given.

Definition.

To each ${\bf C}{\in}{\mathcal{C}}$ we assign a function that maps a program $p$ and value $x$ to a value $\Phi_{\bf C}(p)(x)$ , also written $\Phi_{{\bf C}}px$ .

[TABLE]

Exercise 1.5.

For all correct and executable ${\bf C}{\in}{\mathcal{C}}$ , $p{\in}|{\bf C}|$ , $x{\in}D$ one has

[TABLE]

Example.

In the following evaluations we leave out parentheses, as in lambda calculus and combinatory logic.

[TABLE]

Do we absolutely need self-compilation in order to obtain efficient compilation? The answer is negative. Suppose one has the following:

(1)

a compiler $c_{1}^{L,L_{1}}:L\mathrel{\rightarrow}M$ , producing fast code, written in $L_{1}$ ;

(2)

a compiler $c_{2}^{L_{1},L_{2}}:L_{1}\mathrel{\rightarrow}M$ , producing fast code, written in $L_{2}$ ;

(3)

a simple interpreter $c_{I}^{L_{2},M}:L_{2}\mathrel{\rightarrow}M$ , written in $M$ .

Then one can form the following correct and executable compiler configuration:

[TABLE]

with tree

[TABLE]

Again one obtains a compiler with fast compile-time that produces efficient code

[TABLE]

In the magic trick of Böhm, compiler (3) in Subsection 1.4 above, he took $L=L_{1}=L_{2}$ and $c_{1}^{L,L_{1}}=c_{2}^{L_{1},L_{2}}=c_{B}^{L,L}$ . This saves work: only one language and one compiler need to be developed.

2. Structured programming

In a Turing machine transition a state can be followed by any other state. Therefore many programming languages naturally contain the ‘goto’ statement. When these are used in a mindless way, the meaning of a program is not obvious, hence its correctness is much more difficult to warrant. The first half of [BJ66] is dedicated to eliminating goto statements, as a first step towards structured programs. That part of the paper is stated to be written by Jacopini, but I think we may suppose that Böhm, the supervisor of Jacopini, has contributed to it.

2.1. Imperative programming

The Universal Turing Machine, or an improved version, immediately gives rise to a language with goto statements: the machine, being in state $s_{1}$ changes (under the right conditions) into state $s_{2}$ . This is expressed by a statement very much like a 5-tuple of a Turing Machine $\langle{\tt 1,s_{1},0,s_{2},N}\rangle$ , that in the presence of named registers looks like

[TABLE]

Here the meaning is as follows: the machine checks whether the content of register $x$ equals 1 and then it overwrites the 1 by a 0 as the content of register x, after which it jumps to state ${\tt s_{2}}$ . In the presence of addressable registers like ${\tt x}$ , there is no longer a need to use the small step local movements indicated by $\{{\tt L,R,N}\}$ . A more extended example is the following.

[TABLE]

Apart from branching, leading naturally to a flow-chart as a representation of such a program, we also see the statement ${\tt y:=y+1}$ , typical of imperative programming, meaning that the content of register $y$ is overwritten by the old content augmented by one. Many such components can form nice-looking but hard-to-understand diagrams. One can imagine that the idea arose to create more understandable diagrams and, as a first step, to eliminate the goto statements.

2.2. Eliminating the ‘goto’

In this subsection it is shown that the elimination of the goto statement can be seen in the light of Kleene’s analysis of computability, as was pointed out in [Coo67] and also in [Har80].

Theorem 5 (Kleene Normal Form Theorem).

There are functions $U,T$ that are primitive computable such that every computable function $f$ has a code number $e$ such that for all $\vec{x}{\in}\mathbb{N}$ one has

[TABLE]

If $P$ is a predicate on $\mathbb{N}$ , then $\mu z.P(z)$ denotes the least number $z{\in}\mathbb{N}$ such that $P(z)$ , if this $z$ exists, otherwise the expression is undefined. In (NFT) it is assumed that for all $\vec{x}$ there exists a $z$ such that $T(e,\vec{x},z)=0$ holds⁹.

⁹ The formula (NFT) also holds for partial functions $f$ , in which case $f(\vec{x})\uparrow$ iff $\forall z.T(e,\vec{x},z)\not=0$ .
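The $\mu$ -operator is an unbounded search, which a minimal Python sketch makes explicit. The functions `T` and `U` below are hypothetical stand-ins chosen for one fixed function (the integer square root); they are not Kleene's universal $T$ and $U$ , which take a code number $e$ as an extra argument.

```python
def mu(P):
    """Least natural number z with P(z); does not terminate if there is none."""
    z = 0
    while not P(z):
        z += 1
    return z

def T(x, z):
    """Stand-in characteristic function: 0 when true, 1 when false."""
    return 0 if (z + 1) * (z + 1) > x else 1

def U(z):
    """Stand-in output extraction; here the witness itself is the output."""
    return z

def f(x):
    """f(x) = U(mu z. T(x, z) = 0), the shape of the normal form."""
    return U(mu(lambda z: T(x, z) == 0))

print([f(x) for x in (0, 10, 16)])   # [0, 3, 4]
```

The only unbounded loop is the one inside `mu`; `T` and `U` are computed without any search.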

Proof 2.1 (Sketch).

The value of the function $f(\vec{x})=y$ can be computed by the Universal Turing Machine ${\mathcal{U}}$ using, say, $e$ as program. Then there is a computation

[TABLE]

where ${\tt input}=(e,\vec{x})$ , ‘ ${\tt input,s_{0},p_{0}}$ ’ is the first configuration, ‘ ${\tt output,s_{0},p_{0}}$ ’ is the last one that is terminating, and ${\tt output}=y$ . Furthermore, ${T}$ is the characteristic function ( $=0$ when true, $=1$ when false) of the primitive computable predicate $P(e,\vec{x},z)$ , that holds if $z$ is (the code of) the computation (comp). After a search (by $\mu$ ) for this (coded sequence) $z$ , the $y={\tt output}$ is easily obtainable from it, which is done by the primitive computable function $U$ .

Theorem 2.1 ([BJ66]).

A program built up from statements of the form

$\left.\begin{tabular}[]{l}x:=x+1\\ x:=x-1\\ if B, then$ S_{1} $else$ S_{2} $\\ goto q\end{tabular}\right\}L_{1}$

can be replaced by an equivalent one built up from statements of the form

$\rule{0.0pt}{18.99995pt}\left.\begin{tabular}[]{l}x:=x+1\\ x:=x-1\\ if B, then$ S_{1} $else$ S_{2} $\\ for k:=0 to n do A(k)\\ while x>0 do A(x)\end{tabular}\right\}L_{2}$

Proof 2.2 (Sketch).

A function $f$ with program from $L_{1}$ will be computable by the universal Turing Machine by program, say, $e$ . Therefore by Theorem 5 one has $f(\vec{x})=U(\mu z.T(e,\vec{x},z)=0)$ . The functions $U,T$ are primitive computable, hence expressible by the ‘ ${\tt for}$ ’ statements. Only for the $\mu$ a ${\tt while}$ statement is needed. (Actually this happens only a single time.)

Corollary 6 (Folk Theorem).

Every program in $L_{1}$ can be replaced by an equivalent one in $L_{2}$ using the while construct only a single time.

Proof 2.3.

By the parenthetical remark in the proof of Theorem 2.1.
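The statement of the corollary can be made concrete with the well-known program-counter construction, sketched here in Python. Note that this is a different and more direct technique than the argument via the Normal Form Theorem used above, and it is not claimed to be the construction of [BJ66]. The goto program in the comment and the function name `add_single_while` are hypothetical examples.

```python
# A goto program adding y to x (labels on the left):
#   0: if y = 0 then goto 4 else goto 1
#   1: y := y - 1
#   2: x := x + 1
#   3: goto 0
#   4: halt

def add_single_while(x, y):
    """The same program, with the labels encoded in an extra variable pc."""
    pc = 0
    while pc != 4:                 # the only while loop in the program
        if pc == 0:
            pc = 4 if y == 0 else 1
        elif pc == 1:
            y -= 1
            pc = 2
        elif pc == 2:
            x += 1
            pc = 3
        else:                      # pc == 3
            pc = 0
    return x

print(add_single_while(3, 4))      # 7
```

The price is the extra variable `pc`, and the loop body no longer reflects the shape of the original flow of control.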

2.3. Evaluation

After the goto was shown to be eliminable, Dijkstra’s note [Dij68], ‘Go to statement considered harmful’, started a polemic. In the book [DDH72] structured programming was turned into an art. In [Knu74] it is argued that eliminating the goto as in the above proof of Theorem 2.1 may produce unstructured programs, unrelated to the original program. The original proof in [BJ66] does preserve the structure of the program in a better way. See [Mil72] for a discussion. An even better way to eliminate the goto statements, while preserving the structure of a program, is described in [AM72]. An example of a program in which a goto statement does improve its structure is also given in [Knu74].

In [Har80] the paper [BJ66] was taken as an example of how a ‘Folk Theorem’ appears. The result attributed to these authors often is Corollary 6, rather than Theorem 2.1 itself.

As remarked in [BJ66] it seems necessary to use an extra variable to obtain a program without a goto, but the authors couldn’t find a proof of this conjecture. A proof was given in [KF71], also in [AM72] and in [KT08].

Although the Böhm-Jacopini result started a discussion towards structured programming, a new idea was needed to obtain even better structured programs. As we will see in the next section, actually it was an old idea: functional programming based on lambda calculus.

3. Functional programming and the CUCH machine

It was Wolf Gross, a colleague of Corrado Böhm, who introduced the latter to functional programming based on type-free lambda calculus, in which unlimited self-application is possible. As can be imagined, knowing the construction of a self-applicative compiler, it had a deep impact on the rest of Böhm’s professional life. We restrict ourselves to giving some historical and conceptual background.

3.1. Functional programming

Alonzo Church introduced lambda calculus as a way to mathematically characterize the intuitive notion of computability. I seem to remember that he told me the following story. Church’s thesis supervisor, Oswald Veblen, gave him the problem to compute the Betti numbers of an algebraic surface given by a polynomial equation. Church did not succeed in this task and was stuck developing his PhD thesis. He then did what other mathematicians do in similar circumstances: solve a different but related problem. Church wondered what the notion ‘computable’ actually means. Perhaps determining the Betti number of a surface from its description is not a computable task.

Church then introduced a formal system for mathematical deduction and computation [Chu32, Chu33]. In [KR35] his students Kleene and Rosser found an inconsistency¹⁰ in Church’s original system. In [Chu36] the system was stripped of its deductive part, obtaining the (pure) lambda calculus, which turned out to be provably consistent [CR36]. See [Bar84] for an extensive exposition of the lambda calculus.

¹⁰ The proof of a contradiction in Church’s system was beautifully simplified in [Cur42].

To formally define the notion of computability, Church introduced numerals ${{{\mathbf{c}}}_{n}}$ representing natural numbers $n$ as lambda terms. Rosser found ways to add, multiply and exponentiate: that is, he found terms $A_{+},A_{\times},A_{\sf exp}$ such that $A_{+}{{{\mathbf{c}}}_{n}}{{{\mathbf{c}}}_{m}}\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}{{{\mathbf{c}}}_{n+m}}$ , and similarly for multiplication and exponentiation. This way these three functions were seen to be lambda definable. Here ‘ $\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}$ ’ denotes many-step rewriting, the transitive reflexive closure of one-step rewriting ‘ $\mathrel{\rightarrow}$ ’ introduced below. At first neither Church nor his students could find a way to lambda define the predecessor function. At the dentist’s office Kleene did see how to simulate recursion by iteration and could in that way construct a term lambda defining the predecessor function, [Cro75]. (I believe Kleene told me it was under the influence of laughing gas, ${\rm N_{2}O}$ , used as anesthetic.) When Church saw that result he stated “Then all intuitively computable functions must be lambda definable.” That was the first formulation of Church’s thesis and the functional model of computation was born. At the same time Church gave an example of a function that was non-computable in this model.
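Church numerals and the arithmetic on them can be tried out directly with Python's lambdas. The concrete terms below are one common textbook choice and are not claimed to be the terms originally found by Rosser or Kleene; the predecessor uses iteration on pairs, in the spirit of simulating recursion by iteration. The helper names `church` and `to_int` are our own.

```python
def church(n):
    """The numeral c_n = λf x. f^n(x)."""
    return lambda f: lambda x: x if n == 0 else f(church(n - 1)(f)(x))

def to_int(c):
    """Read back a numeral by applying it to the successor on int and to 0."""
    return c(lambda k: k + 1)(0)

A_plus = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x))
A_times = lambda n: lambda m: lambda f: n(m(f))
A_exp = lambda n: lambda m: m(n)                  # n to the power m

# Predecessor: iterate (a, b) -> (b, b + 1) starting from (0, 0).
pair = lambda a: lambda b: lambda z: z(a)(b)
fst = lambda p: p(lambda a: lambda b: a)
snd = lambda p: p(lambda a: lambda b: b)
succ = lambda n: lambda f: lambda x: f(n(f)(x))
step = lambda p: pair(snd(p))(succ(snd(p)))
pred = lambda n: fst(n(step)(pair(church(0))(church(0))))

two, three = church(2), church(3)
print(to_int(A_plus(two)(three)),     # 5
      to_int(A_times(two)(three)),    # 6
      to_int(A_exp(two)(three)),      # 8
      to_int(pred(three)))            # 2
```

After $n$ iterations of `step` the pair is $(n-1,n)$ for $n>0$ , so its first component is the predecessor; for $n=0$ the pair is still $(0,0)$ .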

In [Tur37a] it was proved that the imperative and functional models of computation have the same power: they can compute exactly the same partial functions, on say the natural numbers. The way these computations are performed, however, differs considerably. In both cases computations traverse a sequence of configurations, starting essentially from the input leading to the output. But here the common ground ends.

3.2. Comparing imperative and functional programming

In functional programming the argument(s) $A$ (or $\vec{A}\,$ ) for a computation in the form of a function $F$ that has to be applied to them form one single expression $FA$ (respectively $F\vec{A}\,$ ). Such expressions are subject to rewriting. If the expression cannot be rewritten any further, then the so called normal form has been reached and this is the intended output. The intermediate results all have the same meaning as the original expression and as the output. A basic example of this is

[TABLE]

where $(\lambda x.x^{2}+1)$ is the function $x\mapsto x^{2}+1$ that assigns to $x$ the value $x^{2}+1$ . In more complex expressions there is a choice of how to rewrite, that is, which subexpression to choose as focus of attention for elementary steps as above. For example, not all choices will lead to a normal form. There are reduction strategies that always will find a normal form if it exists. Normal forms, if they are reached, are unique: the result is independent of the choices made in rewriting. However, performance, both in time and space, is sensitive to the steps employed.
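That the choice of redex matters can be seen on the term ${\sf K}{\sf I}\Omega$ , with $\Omega\equiv(\lambda x.xx)(\lambda x.xx)$ . The following Python sketch is a minimal $\beta$ -reducer, assuming a representation of terms with de Bruijn indices; the representation and the names `shift`, `subst`, `step` and `normalize` are our own choices. It compares the leftmost-outermost strategy with an innermost one, using a step budget to cut off non-terminating reductions.

```python
# Terms with de Bruijn indices: ("var", n), ("lam", body), ("app", fun, arg).

def shift(t, d, cutoff=0):
    """Add d to every variable of t with index >= cutoff."""
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Replace variable j in t by s."""
    if t[0] == "var":
        return s if t[1] == j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def beta(body, arg):
    """Contract the redex (λ.body) arg."""
    return shift(subst(body, 0, shift(arg, 1)), -1)

def step(t, outermost):
    """One β-step with the chosen strategy, or None if t is in normal form."""
    if t[0] == "var":
        return None
    if t[0] == "lam":
        r = step(t[1], outermost)
        return None if r is None else ("lam", r)
    f, a = t[1], t[2]
    if outermost and f[0] == "lam":
        return beta(f[1], a)
    r = step(f, outermost)
    if r is not None:
        return ("app", r, a)
    r = step(a, outermost)
    if r is not None:
        return ("app", f, r)
    if f[0] == "lam":          # innermost: contract only when both parts are normal
        return beta(f[1], a)
    return None

def normalize(t, outermost, fuel=100):
    """Reduce for at most fuel steps; None means no normal form was reached."""
    for _ in range(fuel):
        r = step(t, outermost)
        if r is None:
            return t
        t = r
    return None

I = ("lam", ("var", 0))
K = ("lam", ("lam", ("var", 1)))
omega = ("lam", ("app", ("var", 0), ("var", 0)))
Omega = ("app", omega, omega)
t = ("app", ("app", K, I), Omega)                 # K I Ω

print(normalize(t, outermost=True) == I)          # True
print(normalize(t, outermost=False))              # None
```

The leftmost-outermost strategy discards $\Omega$ before touching it, whereas the innermost strategy keeps rewriting $\Omega$ to itself.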

In the imperative model, the configuration at each moment of a computation sequence of a Turing Machine $M$ consists of the momentary memory content on the tape, the state of $M$ , and the position of its head: ${\tt(t,s,p)}$ . Each terminating computation runs as follows:

[TABLE]

where ${\tt s_{h}}$ is a halting state (and ${\tt p_{h}}$ is irrelevant). The transitions ${\tt\rightarrow_{M}}$ depend on the set of instructions of the Turing Machine ${M}$ . In the case of non-termination the configurations never reach one with a terminal state. This description already shows that, wanting to combine Turing Machines to form one that is performing a more complex task, requires some choices of e.g. making the final state of the first machine fit with the initial one of the second machine.

In the functional model of computation the sequence of configurations is as follows:

[TABLE]

All of these configurations are $\lambda$ -terms and the transitions $\mathrel{\mathrel{\rightarrow}_{\beta}}$ are according to the single $\beta$ -rule of reduction, which is quite different. In order to make a more fair comparison between the imperative and functional computation, one could change (IP) and denote it as

[TABLE]

where $c$ is the code (program) that makes the universal machine ${\mathcal{U}}$ imitate the machine $M$ . This makes (IP′) superficially similar to (FP), which is nevertheless superior.

Advantages of functional programming

In the sequence (FP) the expressions are words in a language more complex than the simple strings in (IP) or (IP′).

(i)

The $\lambda$ -terms expressing functional programs have the possibility of making abstraction upon abstraction, arbitrarily often. This means that ‘components’ of functions can be also functions (of functions), enabling flexible procedures.

(ii)

In FP there is no mention of state and position, hence there is no need to deal with the bureaucracy of these when combining programs. Hence FP has easy compositionality.

(iii)

In the sequence (FP) the meaning of each configuration remains the same, from the first to the last expression. This can be seen clearly in the sequence (1) above.

Features (i) and (ii) of functional programs make them transparent and compact. Feature (iii) makes it easier to prove them correct: reasoning with mathematical induction, substitution and abstraction often suffices; there is no need to learn new logical formalisms that are used to analyze imperative programs. It can be expected that FP will become more and more important. The lack of side-effects makes it easier to make parallel versions of programs.

Implementations of functional programming

Functional Programming has been developed much more slowly than Imperative Programming. The reason is that imperative programs can be implemented rather directly on a Turing Machine or modern computer. This is not the case for functional programs. Attempts to construct specialized hardware for Functional Programming have not been successful. But compilers from functional languages into imperative programs for ordinary CPUs have been successfully developed.

One of the early examples is the SECD machine in [Lan64], soon followed by work on the CUCH machine, [BG66], [Böh66]. After fifty years of research on the use and implementation of functional programming the field has come of age. There exist fast compilers producing efficient code. One can focus on the mathematical definition of the functions involved and the correctness of these can be proved with relatively simple tools, like substitution, abstraction and induction. A functional program is automatically structured. There are for example no ‘goto’ statements. See [BMP13] for a short description, [Hug89] for an extensive motivation, and [PJ87] for implementing functional programming languages.

Challenges for functional programming

There are two main challenges for FP. 1. The lack of state makes writing code for input/output more complex. 2. The evaluation result, the output, doesn’t depend on the way reduction takes place, but it is not always easy to reason about space and time efficiency. These issues are beyond the scope of this paper¹¹.

¹¹ Well known functional languages are LISP (later called Lisp), [MAE+62] (with many modern versions starting with Scheme [Sch]), and ML [MTHM90] (with modern version OCaml [OCa]). ML is loosely characterized as ‘Lisp with types’ coming from the simply typed lambda calculus, see [Chu40, Cur34], with a rich mathematical structure [BDS13], Part I. However, Lisp and ML are not pure functional programming languages, in that they have assignment statements that can be used for input and output, making it also possible to write unstructured programs. In the pure functional languages, Haskell [Has] and Clean [Cle], at present the most developed ones, the I/O problem is solved by respectively monads and uniqueness typing. But using these features, in both cases it is still possible to write incomprehensible code when dealing with I/O.

4. Separability in $\lambda$ -calculus

A mathematician is interested in numbers, not because these may represent the amount of money in one’s bank account (almost offensive to mention), but for their properties definable from the basic arithmetical operations $+$ and $\times$ , such as primality. Such a love for numbers is not shared by most people. In the same way Corrado Böhm became interested in $\lambda$ -terms, not because they represent programs that one can sell, but for their properties definable from the basic lambda calculus operations: application and abstraction. This is somewhat different from another form of fascination, that of Donald Knuth for imperative programs that is obvious from his volumes [Knu18], driven by the challenge to write clear, elegant, and efficient algorithms that perform relevant computational tasks. We assume elementary knowledge of lambda calculus and recall the following notations.

Notation.

(1)

The set of all lambda terms is denoted by $\Lambda$ . The set of free variables of $M{\in}\Lambda$ is denoted by $\mathrm{FV}(M)$ . The set $\Lambda^{o}=\{M{\in}\Lambda\mid\mathrm{FV}(M)=\emptyset\}$ consists of the closed lambda terms, i.e. those without free variables, like $\lambda x.x,\lambda xy.x$ , but not $\lambda xy.z$ .

(2)

‘ $\equiv$ ’ denotes equality up to renaming bound variables, e.g. $\lambda x.x\equiv\lambda y.y$ .

(3)

‘=’ denotes $\beta$ -convertibility on $\lambda$ -terms, often denoted by ‘ $=_{\beta}$ ’ to be explicit.

(4)

$=_{\beta}$ is generated by $\beta$ -reduction $\mathrel{\mathrel{\rightarrow}_{\beta}}$ , as in $(\lambda x.M)N\mathrel{\mathrel{\rightarrow}_{\beta}}M[x:=N]$ .

(5)

$=_{\eta}$ is generated by $\eta$ -reduction $\mathrel{\mathrel{\rightarrow}_{\eta}}$ , as in $\lambda x.Mx\mathrel{\mathrel{\rightarrow}_{\eta}}M$ , where $x\notin\mathrm{FV}(M)$ .

(6)

$M{\in}\Lambda$ is in $\beta(\eta)$ normal form ( $\beta(\eta)$ -nf) if no $\mathrel{\mathrel{\rightarrow}_{\beta}}$ (nor $\mathrel{\mathrel{\rightarrow}_{\eta}}$ ) step is possible.

(7)

For $M_{1},\ldots,M_{n}{\in}\Lambda$ write $\langle M_{1},\ldots,M_{n}\rangle\mathbin{{\triangleq}}\lambda z.zM_{1}\cdots M_{n}$ , with $z$ a fresh variable, i.e. $z\notin\mathrm{FV}(M_{1}\cdots M_{n})$ .

(8)

Write ${\sf U}^{n}_{k}\mathbin{{\triangleq}}\lambda x_{1}\cdots x_{n}.x_{k}$ . Note that $\langle M_{1},\ldots,M_{n}\rangle{\sf U}^{n}_{k}=_{\beta}M_{k}$ , for $1\leq k\leq n$ .

(9)

Write

[TABLE]
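The tupling $\langle M_{1},\ldots,M_{n}\rangle$ and the selectors ${\sf U}^{n}_{k}$ can be mimicked with curried Python functions. This is only an illustration: the names `tuple_` and `U` are our own, and Python strings stand in for the terms $M_{i}$ , which is harmless here because they are only passed around and never applied.

```python
def tuple_(*Ms):
    """<M1,...,Mn> = λz. z M1 ... Mn."""
    def t(z):
        for M in Ms:
            z = z(M)          # apply z to the components one by one
        return z
    return t

def U(n, k):
    """U^n_k = λx1 ... xn. x_k as a curried function, for 1 <= k <= n."""
    def collect(args):
        if len(args) == n:
            return args[k - 1]
        return lambda x: collect(args + (x,))
    return collect(())

triple = tuple_("M1", "M2", "M3")
print([triple(U(3, k)) for k in (1, 2, 3)])   # ['M1', 'M2', 'M3']
```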

Separability of two normal forms

Definition.

Terms $M_{0},M_{1}{\in}\Lambda^{o}$ are called separable if for all $P_{0},P_{1}{\in}\Lambda^{o}$ there exists an $F{\in}\Lambda^{o}$ such that

[TABLE]

This is equivalent to requiring that there is a lambda definable bijection

[TABLE]

with lambda definable inverse, in which case we write $\{M_{0},M_{1}\}=_{1}\{{{{\mathbf{c}}}_{0}},{{{\mathbf{c}}}_{1}}\}$ .

The principal step towards Corollary 7 was proved in [Böh68] with the following result.

Theorem ([Böh68]).

Let $M_{0},M_{1}{\in}\Lambda^{o}$ be two different $\lambda$ -terms in ${\beta\eta}$ -nf. Then for all $P_{0},P_{1}{\in}\Lambda^{o}$ there exist $\vec{N}{\in}\Lambda^{o}$ such that

[TABLE]

Proof 4.1 (Sketch).

A full proof (in English) is in [Bar84, Theorem 10.4.2] and an intuitive proof with applications in [GPDC09]. Idea: give the $M_{0},M_{1}$ arguments separating the two. As we do not know in advance which arguments will work, we use variables as unknowns and substitute for them later. It suffices to reach two distinct variables, as they can be replaced by $P_{0},P_{1}$ . We present some examples.

Example 1. $M_{0}\equiv{\sf I},\,M_{1}\equiv{\sf K}$ .

$\begin{array}[]{l|l|l|l|l}&\mbox{apply to }xy&x{:=}{\sf K}{\sf K}&\mbox{apply to }zvw&z{:=}{\sf I}\\ \hline {\sf I}&{\sf I}xy=xy&{\sf K}{\sf K}y={\sf K}&{\sf K}zvw=zw&w\\ {\sf K}&{\sf K}xy=x&{\sf K}{\sf K}&{\sf K}{\sf K}zvw={\sf K}vw=v&v\end{array}$

Hence, taking $y{:=}{\sf I}$ (any closed term would do), $v{:=}P_{1}$ and $w{:=}P_{0}$ , $\begin{array}[t]{rcl}{\sf I}({\sf K}{\sf K}){\sf I}{\sf I}P_{1}P_{0}&=&P_{0};\\ {\sf K}({\sf K}{\sf K}){\sf I}{\sf I}P_{1}P_{0}&=&P_{1}.\end{array}$

Example 2. $M_{0}\equiv{\sf I},\,M_{1}\equiv{\omega}$ .

$\begin{array}[]{l|l|l|l|l}&\mbox{apply to }x&x{:=}{\sf K}_{*}&\mbox{apply to }xyz&x{:=}{\sf K}_{*},\,y{:=}{\sf K}u\\ \hline {\sf I}&{\sf I}x=x&{\sf K}_{*}&{\sf K}_{*}xyz=yz&u\\ {\omega}&{\omega}x=xx&{\sf K}_{*}{\sf K}_{*}={\sf K}{\sf I}{\sf K}_{*}={\sf I}&{\sf I}xyz=xyz&z\end{array}$

Hence, taking $u{:=}P_{0}$ and $z{:=}P_{1}$ , $\begin{array}[t]{rcl}{\sf I}{\sf K}_{*}{\sf K}_{*}({\sf K}P_{0})P_{1}&=&P_{0};\\ {\omega}{\sf K}_{*}{\sf K}_{*}({\sf K}P_{0})P_{1}&=&P_{1}.\end{array}$
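The separating arguments of Examples 1 and 2 can be checked mechanically with Python lambdas. This is a sketch under a few assumptions: ${\sf K}_{*}$ is taken to be $\lambda xy.y$ , as suggested by the step ${\sf K}_{*}{\sf K}_{*}={\sf K}{\sf I}{\sf K}_{*}={\sf I}$ ; the variable $y$ of Example 1 is instantiated by ${\sf I}$ ; and $P_{0},P_{1}$ are represented by opaque strings, which suffices because they are only passed around. The function names `separate_1` and `separate_2` are our own.

```python
I = lambda x: x
K = lambda x: lambda y: x
K_star = lambda x: lambda y: y        # K_* = λxy.y (assumed definition)
omega = lambda x: x(x)                # ω = λx.xx

P0, P1 = "P0", "P1"                   # opaque placeholders for closed terms

def separate_1(M):
    """Example 1: arguments x := KK, y := I, z := I, v := P1, w := P0."""
    return M(K(K))(I)(I)(P1)(P0)

def separate_2(M):
    """Example 2: arguments K_*, K_*, K P0, P1."""
    return M(K_star)(K_star)(K(P0))(P1)

print(separate_1(I), separate_1(K))          # P0 P1
print(separate_2(I), separate_2(omega))      # P0 P1
```

Python evaluates arguments eagerly, which is unproblematic here since none of the subterms involved diverges.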

Example 3. $M_{0}\equiv\lambda xy.xy{\sf I},\;M_{1}\equiv\lambda xy.xy{\omega}$ . Consider these as trees:

$M_{0}$ : root $\lambda xy.x$ with subtrees $y$ and ${\sf I}$ ; $M_{1}$ : root $\lambda xy.x$ with subtrees $y$ and ${\omega}$ .

In order to separate these, we zoom in on the difference ${\sf I}$ and ${\omega}$ , via $M_{0}{\sf K}_{*}y,M_{1}{\sf K}_{*}y,$ giving ${\sf I},\;{\omega}$ respectively, and we know how to separate these by Example 2.

Example 4. $M_{0}\equiv\lambda xy.xy(x{\sf I}y),\;M_{1}\equiv\lambda xy.xy(x{\omega}y)$ . Consider their trees:

[TABLE]

Again we like to zoom in on the difference ${\sf I}$ and ${\omega}$ . Dilemma: one cannot make the $x$ choose both the left and right branch. Solution: applying the ‘Böhm transformation’ $xy$ , $x{:=}\lambda abz.zab$ postpones the choice and yields trees


[TABLE]

after which one can zoom in by application to $z,z{:=}{\sf K}_{*},z,z{:={\sf K}}$ , obtaining ${\sf I}$ and ${\omega}$ ; then we are back to Example 2. Note that the dilemma was solved essentially by replacing $x$ by $\lambda abz.zab$ , which makes it possible to postpone the choices: first ${\sf K}_{*}$ (going right), then ${\sf K}$ (going left).
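Examples 3 and 4 can be checked in the same style with Python lambdas. The sketch assumes ${\sf K}_{*}\equiv\lambda xy.y$ , instantiates the variable $y$ by ${\sf I}$ , and again uses opaque strings for $P_{0},P_{1}$ . The names `B` (for the term $\lambda abz.zab$ substituted for $x$ ), `finish`, `separate_3` and `separate_4` are hypothetical.

```python
I = lambda x: x
K = lambda x: lambda y: x
K_star = lambda x: lambda y: y                    # K_* = λxy.y (assumed definition)
omega = lambda x: x(x)                            # ω = λx.xx
B = lambda a: lambda b: lambda z: z(a)(b)         # λabz.zab, substituted for x

P0, P1 = "P0", "P1"                               # opaque placeholders

M0_3 = lambda x: lambda y: x(y)(I)                # λxy.xyI
M1_3 = lambda x: lambda y: x(y)(omega)            # λxy.xyω
M0_4 = lambda x: lambda y: x(y)(x(I)(y))          # λxy.xy(xIy)
M1_4 = lambda x: lambda y: x(y)(x(omega)(y))      # λxy.xy(xωy)

def finish(M):
    """Separate I from ω as in Example 2."""
    return M(K_star)(K_star)(K(P0))(P1)

def separate_3(M):
    """Example 3: zoom in with x := K_*, y := I, then finish."""
    return finish(M(K_star)(I))

def separate_4(M):
    """Example 4: x := λabz.zab, y := I, then choose K_* and after that K."""
    return finish(M(B)(I)(K_star)(K))

print(separate_3(M0_3), separate_3(M1_3))         # P0 P1
print(separate_4(M0_4), separate_4(M1_4))         # P0 P1
```

In `separate_4` the first choice ${\sf K}_{*}$ selects the right subtree $x{\sf I}y$ (respectively $x{\omega}y$ ), and the second choice ${\sf K}$ selects its left component.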

It is clear that one needs to require that the terms have different ${\beta\eta}$ -nfs, not just $\beta$ -nfs. The terms $\lambda x.x$ and $\lambda xy.xy$ are different $\beta$ -nfs, but cannot be separated: $F(\lambda x.x)=_{\beta}\lambda xy.x$ and $F(\lambda xy.xy)=_{\beta}\lambda xy.y$ would imply

[TABLE]

from which any equation can be derived, contradicting that the $\lambda{\beta\eta}$ -calculus is consistent.

Corollary 7.

For all $M_{0},M_{1}{\in}\Lambda^{o}$ having a $\beta$-nf the following are equivalent.

(1) For all $P_{0},P_{1}{\in}\Lambda^{o}$ there exist $\vec{N}{\in}\Lambda^{o}$ such that

$$M_{0}\vec{N}=_{\beta}P_{0},\qquad M_{1}\vec{N}=_{\beta}P_{1}.$$

(2) $M_{0},M_{1}$ are separable, i.e. for all $P_{0},P_{1}{\in}\Lambda^{o}$ there exists an $F{\in}\Lambda^{o}$ such that

$$FM_{0}=_{\beta}P_{0},\qquad FM_{1}=_{\beta}P_{1}.$$

(3) There exists an $F{\in}\Lambda^{o}$ such that

$$FM_{0}=_{\beta}\lambda xy.x,\qquad FM_{1}=_{\beta}\lambda xy.y.$$

(4) The equation $M_{0}=M_{1}$ is inconsistent with $\lambda\beta$.

(5) The equation $M_{0}=M_{1}$ is inconsistent with $\lambda{\beta\eta}$.

(6) The terms $M_{0},M_{1}$ have distinct ${\beta\eta}$-nfs.

Footnote to (5): Dropping the requirement that both $M_{0},M_{1}$ have a $\beta$-nf, the result no longer holds: the equation $\langle{\sf K},{\sf K}_{*}\rangle=\langle\Omega{\sf I},\Omega{{{\mathbf{c}}}_{1}}\rangle$ is consistent with $\lambda\beta$, but not with $\lambda{\beta\eta}$, as follows from considerations similar to those in [Bar20, Theorem 3.2.24].

Proof 4.2.

[SB05]

(1) $\Rightarrow$ (2) By (1) there are $\vec{N}$ such that $M_{i}\vec{N}=_{\beta}P_{i}$ . Take $F\mathbin{{\triangleq}}\lambda m.m\vec{N}$ .

(2) $\Rightarrow$ (3) Take $P_{i}\mathbin{{\triangleq}}\lambda x_{0}x_{1}.x_{i}$ , for $0\leq i\leq 1$ .

(3) $\Rightarrow$ (4) From the equation $M_{0}=M_{1}$ one can by (3) derive $\lambda xy.x=\lambda xy.y$ , from which one can derive any equation; all derivations using just $\lambda\beta$ .

(4) $\Rightarrow$ (5) Trivial.

(5) $\Rightarrow$ (6) By the assumption that $M_{0},M_{1}$ have $\beta$ -nfs and [Bar84], Corollary 15.1.5, it follows that $M_{0},M_{1}$ have ${\beta\eta}$ -nfs. If these were equal, then $M_{0}=_{\beta\eta}M_{1}$ and hence $M_{0}=M_{1}$ would be consistent, contradicting (5).

(6) $\Rightarrow$ (1) By Theorem 4.
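As a small worked instance of items (1) and (2), take the Church numerals ${\mathbf{c}}_{0}=\lambda fx.x$ and ${\mathbf{c}}_{1}=\lambda fx.fx$, which have distinct ${\beta\eta}$-nfs. The choice $\vec{N}=({\sf K}P_{1}),P_{0}$ below is our own illustration, not taken from the paper: it gives ${\mathbf{c}}_{0}({\sf K}P_{1})P_{0}=P_{0}$ and ${\mathbf{c}}_{1}({\sf K}P_{1})P_{0}={\sf K}P_{1}P_{0}=P_{1}$. The name `separator` is hypothetical.

```python
K = lambda x: lambda y: x
C0 = lambda f: lambda x: x        # Church numeral c_0 = λfx.x
C1 = lambda f: lambda x: f(x)     # Church numeral c_1 = λfx.fx

def separator(p0, p1):
    """F = λm. m (K p1) p0, as in (1) => (2) with N = (K p1), p0."""
    return lambda m: m(K(p1))(p0)

# Any two target values can be chosen for P0 and P1.
F = separator("left", "right")
print(F(C0), F(C1))
```

Because $P_{0}$ and $P_{1}$ are arbitrary, the same $F$ pattern also yields item (3) by choosing $P_{0}=\lambda xy.x$ and $P_{1}=\lambda xy.y$.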

Separability of finite sets of normal forms

Together with his students Böhm generalized Theorem 4 from two to $k$ terms.

Definition.

A finite set ${\mathcal{A}}\subseteq\Lambda^{o}$ is called separable if for some $k{\in}\mathbb{N}$

[TABLE]

Theorem ([BDCPR79]).

Let $M_{0},\ldots,M_{k-1}{\in}\Lambda^{o}$ be terms having different ${\beta\eta}$-nfs. Then $\{M_{0},\ldots,M_{k-1}\}$ is separable. One even has for all terms $P_{0},\ldots,P_{k-1}{\in}\Lambda^{o}$ there exist terms $\vec{N}{\in}\Lambda^{o}$ such that

[TABLE]

Proof 4.3.

For a proof see [BDCPR79] or [Bar84, proof of Corollary 10.4.14.].

Corollary 8.

Let ${\mathcal{A}}\subseteq\Lambda^{o}$ be a finite set of terms all having a $\beta$ -nf. Then

[TABLE]

Separability of finite sets of general terms

A characterization of separability for finite ${\mathcal{A}}\subseteq\Lambda^{o}$, possibly containing terms without normal form, is due to [CDCRdR78], see also [Bar84], Theorem 10.4.13. To give a flavor of that theorem we list some of its consequences collected in [SB05].

(1) The set $\left\{\begin{array}[c]{l}\lambda x.x{{{\mathbf{c}}}_{0}}{\Omega},\\ \lambda x.x{{{\mathbf{c}}}_{1}}{\Omega}\end{array}\right\}$ is separable; so is $\left\{\begin{array}[c]{l}\lambda xy.xx{\Omega},\\ \lambda xy.xy{\Omega}\end{array}\right\}$.

(2) $\left\{\begin{array}[c]{l}\lambda x.x(\lambda y.y{\Omega}),\\ \lambda x.x(\lambda y.y{{{\mathbf{c}}}_{0}})\end{array}\right\}$ is not separable; neither is $\left\{\begin{array}[c]{l}\lambda x.x,\\ \lambda xy.xy\end{array}\right\}$.

(3) $\left\{\begin{array}[c]{l}\lambda x.x(\lambda y.y{{{\mathbf{c}}}_{0}}{\Omega}(\lambda z.z{\Omega})),\\ \lambda x.x(\lambda y.y{{{\mathbf{c}}}_{1}}{\Omega}(\lambda z.z{{{\mathbf{c}}}_{1}})),\\ \lambda x.x(\lambda y.y{{{\mathbf{c}}}_{1}}{\Omega}(\lambda z.z{{{\mathbf{c}}}_{2}}))\end{array}\right\}$ is separable.

(4) $\left\{\begin{array}[c]{l}\lambda x.x{{{\mathbf{c}}}_{0}}{{{\mathbf{c}}}_{0}}{\Omega},\\ \lambda x.x{{{\mathbf{c}}}_{1}}{\Omega}{{{\mathbf{c}}}_{1}},\\ \lambda x.x{\Omega}{{{\mathbf{c}}}_{2}}{{{\mathbf{c}}}_{2}}\end{array}\right\}$ is not separable, although each proper subset is.

Separability of infinite sets of general terms

In [SB05] for infinite sets separability is defined and characterized. Here we give a slightly alternative formulation.

Notation.

Let ${\mathcal{A}}\subseteq\Lambda^{o}$ . Write for $F{\in}\Lambda^{o}$

[TABLE]

Definition.

Let ${\mathcal{A}}\subseteq\Lambda^{o}$ be an infinite set. Then

(1) ${\mathcal{A}}$ is called special if there are combinators $F,G{\in}\Lambda^{o}$ such that modulo $=_{\beta}$ one has

[TABLE]

(2) ${\mathcal{A}}$ is called separable if ${\mathcal{A}}=_{1}{\mathcal{C}}_{\mathbb{N}}$, that is, there is a lambda definable bijection $F\colon{\mathcal{A}}\mathrel{\rightarrow}{\mathcal{C}}_{\mathbb{N}}$ with lambda definable inverse.

Remark 9.

If ${\mathcal{A}}$ only has a $\lambda$-definable injection $F\colon{\mathcal{A}}\mathrel{\rightarrow}{\mathcal{C}}_{\mathbb{N}}$, then ${\mathcal{A}}$ need not be special. Indeed, let $K\subseteq\mathbb{N}$ be re but not recursive, so that its complement $\overline{K}\subseteq\mathbb{N}$ is not re. Define ${\mathcal{A}}=\{{{{\mathbf{c}}}_{n}}\mid n{\in}\overline{K}\}$. Then ${\sf I}\colon{\mathcal{A}}\mathrel{\rightarrow}{\mathcal{C}}_{\mathbb{N}}$ is an injection. For this ${\mathcal{A}}$ there is no $\lambda$-definable surjection $G\colon{\mathcal{C}}_{\mathbb{N}}\mathrel{\rightarrow}{\mathcal{A}}$, for otherwise

[TABLE]

contradicting that $\overline{K}$ is not re.

Definition.

${\mathcal{A}}$ is called an adequate numeral system if there are terms $\underline{0},\underline{S},\underline{P},\underline{Z}_{?}$ (zero, successor, predecessor, test for zero) such that, writing $\underline{n}\mathbin{{\triangleq}}\underline{S}^{n}\underline{0}$ for $n{\in}\mathbb{N}$, one has

[TABLE]

Proposition 10.

Let ${\mathcal{A}}\subseteq\Lambda^{o}$ be infinite. If ${\mathcal{A}}$ is special, then there is a lambda definable bijection $H\colon{\mathcal{C}}_{\mathbb{N}}\mathrel{\rightarrow}{\mathcal{A}}$.

Proof 4.4.

Let combinators $F,\,G$ be given as required in Definition 4. Define by primitive recursion

[TABLE]

Here ‘$\mu m$’ stands for ‘the least number $m$ such that’; such an $m$ always exists since ${\mathcal{A}}$ is infinite and $G$ is surjective. That $H$ is $\lambda$-definable follows from the existence of $F$: indeed, for $M,N{\in}{\mathcal{A}}$ one has

[TABLE]

where $Q_{=}$ is the decidable equality predicate on Church numerals, so that also

[TABLE]

is decidable.

Claim. For all $n{\in}\mathbb{N}$ one has

[TABLE]

The claim follows by induction on $n$ . Case $n=0$ . By definition $H{{{\mathbf{c}}}_{0}}=G{{{\mathbf{c}}}_{0}}$ .

Case $n+1$ . Assume $\{G{{{\mathbf{c}}}_{0}},\ldots,G{{{\mathbf{c}}}_{n}}\}\subseteq\{H{{{\mathbf{c}}}_{0}},\ldots,H{{{\mathbf{c}}}_{n}}\}$ (induction hypothesis), towards $\{G{{{\mathbf{c}}}_{0}},\ldots,G{{{\mathbf{c}}}_{n+1}}\}\subseteq\{H{{{\mathbf{c}}}_{0}},\ldots,H{{{\mathbf{c}}}_{n+1}}\}$ . If $G{{{\mathbf{c}}}_{n+1}}{\in}\{H{{{\mathbf{c}}}_{0}},\ldots,H{{{\mathbf{c}}}_{n}}\}$ , then we are done. Otherwise $G{{{\mathbf{c}}}_{n+1}}{\notin}\{H{{{\mathbf{c}}}_{0}},\ldots,H{{{\mathbf{c}}}_{n}}\}$ . For $m{<}(n{+}1)$ one has $G{{{\mathbf{c}}}_{m}}{\in}\{G{{{\mathbf{c}}}_{0}},\ldots,G{{{\mathbf{c}}}_{n}}\}$ which is a subset of $\{H{{{\mathbf{c}}}_{0}},\ldots,H{{{\mathbf{c}}}_{n}}\}$ by the induction hypothesis. Therefore by definition $H{{{\mathbf{c}}}_{n+1}}=G{{{\mathbf{c}}}_{n+1}}$ , and the conclusion holds again. This proves the claim.

By the recursive clause in the definition above, $H$ is injective. That it is also surjective follows from the claim and the surjectivity of $G$.

Corollary 11 ([SB05]).

Let ${\mathcal{A}}\subseteq\Lambda^{o}$ be infinite. Then the following are equivalent.

(1) ${\mathcal{A}}$ is special.

(2) ${\mathcal{A}}$ is separable.

(3) ${\mathcal{A}}$ is an adequate numeral system.

Proof 4.5.

(1) $\Rightarrow$ (2). If ${\mathcal{A}}$ is special, via $F\colon{\mathcal{A}}\mathrel{\rightarrow}{\mathcal{C}}_{\mathbb{N}}$ and $G\colon{\mathcal{C}}_{\mathbb{N}}\mathrel{\rightarrow}{\mathcal{A}}$, then by Proposition 10 there exists a $\lambda$-definable $H\colon{\mathcal{C}}_{\mathbb{N}}\mathrel{\rightarrow}{\mathcal{A}}$ that is a bijection. We need to show that $H$ has a $\lambda$-definable inverse. This $H^{-1}\colon{\mathcal{A}}\mathrel{\rightarrow}{\mathcal{C}}_{\mathbb{N}}$ can be defined by

[TABLE]

(2) $\Rightarrow$ (3). By $H,H^{-1}$ the set ${\mathcal{A}}$ inherits the structure of an adequate numeral system from ${\mathcal{C}}_{\rm Nature}$ .

(3) $\Rightarrow$ (1). Let $\underline{0},\underline{S},\underline{P},\underline{Z}_{?}$ give ${\mathcal{A}}$ the structure of an adequate numeral system. Then the computable functions can be $\lambda$-defined w.r.t. the $\underline{n}$. By primitive recursion on the $\underline{n}$ and ${{{\mathbf{c}}}_{n}}$ numerals, respectively, one can define $\lambda$-definable $F\colon{\mathcal{A}}\mathrel{\rightarrow}{\mathcal{C}}_{\mathbb{N}}$ and $G\colon{\mathcal{C}}_{\mathbb{N}}\mathrel{\rightarrow}{\mathcal{A}}$ satisfying $F\underline{n}={{{\mathbf{c}}}_{n}}$ and $G{{{\mathbf{c}}}_{n}}=\underline{n}$, making ${\mathcal{A}}$ special.
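The construction of $H$ in Proposition 10 and of $H^{-1}$ in the proof of (1) $\Rightarrow$ (2) can be sketched over ordinary Python values. The displayed definitions are not reproduced in this text, so the sketch assumes the reading $H(0)=G(0)$ and $H(n+1)=G(\mu m.\,G(m)\notin\{H(0),\ldots,H(n)\})$, with equality on ${\mathcal{A}}$ tested through the injection $F$, and $H^{-1}(a)=\mu n.\,F(H(n))=F(a)$. The toy set (even numbers), the particular `G` and `F`, and the names `make_H` and `make_H_inverse` are illustrative assumptions.

```python
from itertools import count

def make_H(G, F):
    """Enumerate the range of G without repetitions.

    G: surjection from the naturals onto A (may repeat elements).
    F: injection from A into the naturals, used to decide equality in A.
    """
    def H(n):
        codes = set()      # F-images of H(0), ..., H(k-1)
        value = None
        for _ in range(n + 1):
            # least m such that G(m) has not been chosen before
            m = next(m for m in count() if F(G(m)) not in codes)
            value = G(m)
            codes.add(F(value))
        return value
    return H

def make_H_inverse(H, F):
    """H^{-1}(a) = least n with F(H(n)) == F(a); terminates for a in A."""
    return lambda a: next(n for n in count() if F(H(n)) == F(a))

# Toy instance: A = even numbers, G repeats every element twice in shuffled order.
G = lambda n: 2 * ((n // 2) ^ 1)
F = lambda a: a // 2
H = make_H(G, F)
H_inv = make_H_inverse(H, F)
```

The unbounded searches mirror the use of $\mu$ in the proof: they terminate because $G$ is surjective onto an infinite set and $H$ is a bijection.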

5. Translating without parsing

Combinatory terms, built-up from ${\bf K},{\bf S}$ with just application, with reduction rules

$${\bf K}PQ\to P,\qquad{\bf S}PQR\to PR(QR)$$

suffice to represent arbitrary computations. We write all parentheses. For example ${\bf((S(KK))S)}$ is such a term. It was noticed by Böhm and Dezani that the meaning of such a term can be found by interpreting it symbol by symbol, including the two parentheses. One doesn’t need to parse the combinator to display its tree-like structure. The method also applies to combinatory terms built from different combinators, including for example ${\bf B}$ corresponding to the $\lambda$-term ${\sf B}=\lambda fgx.f(gx)=\lambda fg.f\circ g$.

Definition.

Define for $\lambda$ -terms $M,N$

[TABLE]

It is easy to see that $\circ$ and $\mathbin{\tiny\ast}$ are associative modulo $\beta$ -equality of the $\lambda$ -calculus; moreover, for $k\geq 2$ one has

[TABLE]

Definition.

Combinatory terms ${\mathcal{C}}$ are built up over alphabet $\Sigma=\{{\bf{K}},{\bf S},(,)\}$ by the following context-free grammar

[TABLE]

Definition.

Given $P{\in}{\mathcal{C}}$ its translation into closed terms of the $\lambda$ -calculus is $P_{\lambda}$ defined recursively as follows:

[TABLE]

For this translation the $P{\in}{\mathcal{C}}$ needs to be parsed. For example if $P=(QR)$ , we need to know where the string $Q$ ends and similarly where $R$ starts. The following translation avoids this need for parsing.

Definition.

(1) The symbols of $\Sigma$ are translated into $\Lambda^{o}$ as follows.

[TABLE]

(2) A word $w=a_{1}\cdots a_{n}{\in}\Sigma^{*}$ is translated into $\phi(w){\in}\Lambda^{o}$ as follows.

[TABLE]

Proposition ([BD73]).

(1) For all $P{\in}{\mathcal{C}}$ one has $\phi(P)=_{\beta}\langle P_{\lambda}\rangle$.

(2) For all $P{\in}{\mathcal{C}}$ one has $\phi(P){\sf I}=_{\beta}P_{\lambda}$.

Proof 5.1.

(1) Since $P{\in}{\mathcal{C}}$, we may use induction over terms in ${\mathcal{C}}$. If $P={\bf{K}}$ or $P={\bf S}$, the result holds by definition of $\phi$. If $P=(QR)$, then

[TABLE]

(2) By (1): $\phi(P){\sf I}=_{\beta}\langle P_{\lambda}\rangle{\sf I}=_{\beta}{\sf I}P_{\lambda}=_{\beta}P_{\lambda}.$

Proposition 5(2) shows that the meaning of $P$ can be obtained without parsing.
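A symbol-by-symbol translation of this kind can be tried out in Python. Since the tables defining $\phi$ are not reproduced in this text, the sketch below rests on assumptions chosen so that both parts of the proposition hold: ${\bf K}\mapsto\langle{\sf K}\rangle$, ${\bf S}\mapsto\langle{\sf S}\rangle$, `(` $\mapsto{\sf B}$, `)` $\mapsto{\sf I}$, with $\langle M\rangle=\lambda z.zM$ and a word read as "apply the translation of each symbol in turn, left to right". The names `angle`, `SYMBOLS`, `phi` and `parse` are ours; `parse` is the ordinary recursive translation $P_{\lambda}$, included only for comparison.

```python
I = lambda x: x
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
B = lambda f: lambda g: lambda x: f(g(x))

def angle(m):
    """<M> = λz. z M"""
    return lambda z: z(m)

# Assumed translation of the four symbols of the alphabet.
SYMBOLS = {"K": angle(K), "S": angle(S), "(": B, ")": I}

def phi(word):
    """phi(a1...an) = λz. phi(an)( ... (phi(a1) z) ... ): one pass, no parsing."""
    def run(z):
        for symbol in word:
            z = SYMBOLS[symbol](z)
        return z
    return run

def parse(word):
    """Reference translation P_lambda by recursive descent (this one does parse)."""
    def term(i):
        if word[i] == "(":
            q, i = term(i + 1)
            r, i = term(i)
            return q(r), i + 1          # skip the closing parenthesis
        return {"K": K, "S": S}[word[i]], i + 1
    result, _ = term(0)
    return result

# phi(P) I should behave like P_lambda.
skk = phi("((SK)K)")(I)
print(skk(42))
```

Under these assumptions `phi` never looks ahead or keeps a stack: the nesting structure is handled entirely by the terms ${\sf B}$ and ${\sf I}$ assigned to the parentheses.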

6. A simple self-evaluator

To $M{\in}\Lambda$ one assigns computably a Gödel-number $\#M$ .

Definition.

For $M{\in}\Lambda$ its code $\ulcorner M\urcorner$ is defined as the Church numeral corresponding to $\#M$:

$$\ulcorner M\urcorner\mathbin{{\triangleq}}{\mathbf{c}}_{\#M}.$$

Note that the code of $M$ satisfies: (1) $\ulcorner M\urcorner$ is in normal form; (2) syntactic operations on $M$ are lambda definable on $\ulcorner M\urcorner$, by the computability of $\#$. An evaluator ${\sf E}$ is constructed by Stephen Cole Kleene in [Kle35] such that for all $M{\in}\Lambda^{o}$ one has

[TABLE]

A technical problem in defining ${\sf E}$ and proving this property is caused by the fact that lambda terms are defined inductively via open terms containing free variables, whereas the decoding only holds for closed terms. The way Kleene dealt with this (basically the problem of representing the binding effect of $\lambda x$) was to translate closed $\lambda$-terms first to combinators and then to represent these as numerals. The term ${\sf E}$ was reconstructed by McCarthy for the programming language LISP under the name ‘eval’, and baptized in [Rey72] as the ‘meta-circular’ self-interpreter.

During lectures at Radboud University on Kleene’s self-evaluator ${\sf E}$ and the construction of this term via the combinators, the student Peter de Bruin came up with an improvement. He suggested using the intuition of denotational semantics of $\lambda$-calculus. First the meaning of an open term $M$ (possibly containing free variables) is given, in notation ${\sf E}_{0}\ulcorner M\urcorner v$, using a valuation $v$ assigning values $v(\ulcorner x\urcorner){\in}\Lambda$ to the code of a free variable $x$.

Theorem ([Kle35]).

There is a term ${\sf E}{\in}\Lambda^{o}$ such that

[TABLE]

Proof 6.1.

(P. de Bruin) By the effectiveness of the Gödel-numbering there exists an ${\sf E}_{0}{\in}\Lambda^{o}$ satisfying

[TABLE]

where $v[\ulcorner x\urcorner\mapsto y]=v^{\prime}$ with

[TABLE]

Then one can prove that for $M{\in}\Lambda$ with $\mathrm{FV}(M)\subseteq\{x_{1},\ldots,x_{n}\}$ one has

[TABLE]

Therefore

[TABLE]

and one can take ${\sf E}\mathbin{{\triangleq}}\lambda m.{\sf E}_{0}m{\sf I}$ .
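The valuation-based evaluator has the shape of a familiar environment-passing interpreter. The following Python sketch is an analogy under stated assumptions: tagged tuples stand in for the numeral codes $\ulcorner M\urcorner$, and the three clauses used are the standard denotational ones (a variable looks up its value in $v$, an application applies the value of the operator to the value of the operand, an abstraction yields a function that updates $v$ at the bound variable). The names `e0` and `evaluate` and the tuple format are hypothetical.

```python
# Codes of terms (stand-ins for Goedel numerals):
#   ("var", name) | ("app", code_p, code_q) | ("abs", name, code_body)

def e0(code, v):
    """Meaning of a coded, possibly open, term under the valuation v."""
    tag = code[0]
    if tag == "var":
        return v(code[1])
    if tag == "app":
        return e0(code[1], v)(e0(code[2], v))
    if tag == "abs":
        _, x, body = code
        # v[x -> y]: behaves like v except at the bound variable x
        return lambda y: e0(body, lambda name: y if name == x else v(name))
    raise ValueError(f"not a term code: {code!r}")

def evaluate(code):
    """E = λm. E0 m I: start from the identity valuation."""
    return e0(code, lambda name: name)

K_CODE = ("abs", "x", ("abs", "y", ("var", "x")))
S_CODE = ("abs", "x", ("abs", "y", ("abs", "z",
          ("app", ("app", ("var", "x"), ("var", "z")),
                  ("app", ("var", "y"), ("var", "z"))))))
SKK_CODE = ("app", ("app", S_CODE, K_CODE), K_CODE)
```

For closed codes the initial valuation is never consulted, which is why any starting valuation, in particular the identity, will do.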

Corollary 12.

The term ${\sf E}$ enumerates the closed $\lambda$ -terms

[TABLE]

Remark 13.

In [Bar95] it is proved (constructively) that any enumerator of the closed terms is reducing in the following sense.

[TABLE]

The construction of Peter de Bruin inspired [Mog94] to a higher order encoding of $\lambda$ -terms, see [PE88], in which a $\lambda$ is interpreted by itself.

Definition ([Mog94]).

An open lambda term $M$ can be interpreted as an open lambda term with the same free variables as follows.

[TABLE]

This can be seen as first using three unspecified constructors ${\tt var,app,abs}{\in}\Lambda^{o}$ as follows

[TABLE]

and then taking

[TABLE]

Theorem ([Mog94]).

There is an evaluator ${\sf E}^{m}{\in}\Lambda^{o}$ such that for all $M{\in}\Lambda$

[TABLE]

Proof 6.2.

Using Turing’s fixed point combinator $\Theta$ one can construct a term ${\sf E}^{m}{\in}\Lambda^{o}$ such that

[TABLE]

where $B\mathbin{{\triangleq}}\lambda epq.ep(eq)$, and $C\mathbin{{\triangleq}}\lambda ezx.e(zx)$: take ${\sf E}^{m}\mathbin{{\triangleq}}\Theta(\lambda em.m{\sf I}(Be)(Ce))$. Then by induction on the structure of $M{\in}\Lambda$ it follows that ${\sf E}^{m}\ulcorner M\urcorner^{m}\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}M$.

[TABLE]
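The evaluator ${\sf E}^{m}$ can be mimicked in Python. The terms $B$ and $C$ and the shape ${\sf E}^{m}m=m\,{\sf I}\,(B{\sf E}^{m})(C{\sf E}^{m})$ are taken from the proof above; the three constructors are an assumption, namely the usual presentation of Mogensen's encoding in which a code receives three continuations $a,b,c$ and a coded abstraction carries a function from a variable to the code of the body. Python's own recursion stands in for $\Theta$, and the trailing underscore in `abs_` only avoids shadowing a builtin.

```python
I = lambda x: x

# Assumed Mogensen-style constructors.
var = lambda x: lambda a: lambda b: lambda c: a(x)
app = lambda p, q: lambda a: lambda b: lambda c: b(p)(q)
abs_ = lambda f: lambda a: lambda b: lambda c: c(f)   # f: variable -> code of body

B = lambda e: lambda p: lambda q: e(p)(e(q))   # B = λepq.ep(eq)
C = lambda e: lambda z: lambda x: e(z(x))      # C = λezx.e(zx)

def E(m):
    """E^m m = m I (B E^m) (C E^m); the recursive call replaces the fixed point."""
    return m(I)(B(E))(C(E))

# Codes of K = λxy.x and of the application (λx.x) w with w a free "variable".
K_CODE = abs_(lambda x: abs_(lambda y: var(x)))
ID_APPLIED = app(abs_(lambda x: var(x)), var("w"))
```

Because a coded $\lambda$ is represented by a host-language function, no explicit valuation is needed; this is the sense in which a $\lambda$ is interpreted by itself.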

Remark 14.

(1) Using Mogensen’s translation, decoding is possible for all terms $M{\in}\Lambda$, possibly containing free variables. On the other hand not all syntactic operations are possible on the coded terms. Equality test for variables is possible for $\ulcorner x\urcorner$, but not for $\ulcorner x\urcorner^{m}$.

(2) In spite of this, the lambda definability of equality discrimination for coded closed terms $\ulcorner M\urcorner^{m},\ulcorner N\urcorner^{m}{\in}\Lambda^{o}$ is proved in [Bar01].

(3) In [Mog94] it is also proved that there is a normalizer acting on coded terms. There is a term ${\sf R}^{m}$ such that for all $M{\in}\Lambda$:

if $M$ has a normal form $N$, then ${\sf R}^{m}\ulcorner M\urcorner^{m}\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}\ulcorner N\urcorner^{m}$;

if $M$ has no normal form, then ${\sf R}^{m}\ulcorner M\urcorner^{m}$ has no nf.

Berarducci and Böhm constructed a very simple self-evaluator, based on Mogensen’s construction above, but using different choices for var, app, abs. These are based on unpublished work of Böhm and Piperno, who represented algebraic data structures in such a way that primitive recursive (computable) functions are representable by terms in normal form, avoiding the fixed point operator that was used in the proof of Theorem 13.

Theorem ([BB93]).

There is a coding of $\lambda$-terms $M\mapsto\ulcorner M\urcorner^{bb}$ with a short closed normal form ${\sf E}^{bb}\mathbin{{\triangleq}}\langle\langle{\sf K},{\sf S},{\sf C}\rangle\rangle$ as evaluator.

Proof 6.3.

Define

[TABLE]

where

[TABLE]

By induction on the structure of $M$ we show that $\ulcorner M\urcorner^{bb}\langle{\sf K},{\sf S},{\sf C}\rangle\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}M$.

Case $M\equiv x$ . Then

[TABLE]

Case $M\equiv PQ$ . Then

[TABLE]

Case $M\equiv\lambda x.P$ . Then

[TABLE]

Therefore for all $M{\in}\Lambda$ one has $\ulcorner M\urcorner^{bb}\langle{\sf K},{\sf S},{\sf C}\rangle\mathrel{\rightarrow\mathrel{\mkern-14.0mu}\rightarrow}M$. It follows that ${\sf E}^{bb}\mathbin{{\triangleq}}\langle\langle{\sf K},{\sf S},{\sf C}\rangle\rangle$ is a self-evaluator: for all $M{\in}\Lambda$

[TABLE]
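A fixed-point-free evaluator of the form $\langle\langle{\sf K},{\sf S},{\sf C}\rangle\rangle$ can be tried out in Python as well. The coding itself is not reproduced in this text, so the constructors below are an assumption: each code takes an argument $e$, uses a selector ${\sf U}^{3}_{i}$ to pick the $i$-th component of $e$, and passes $e$ along, with ${\sf C}$ taken to be the usual $\lambda xyz.xzy$ and $\langle M_{1},M_{2},M_{3}\rangle=\lambda z.zM_{1}M_{2}M_{3}$. Under that reading the three cases of the induction come out as ${\sf K}x e=x$, ${\sf S}pqe=pe(qe)$ and ${\sf C}fe=\lambda z.fze$.

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)      # assumed: C = λxyz.xzy

# Selectors U^3_i = λx1x2x3.xi
U1 = lambda a: lambda b: lambda c: a
U2 = lambda a: lambda b: lambda c: b
U3 = lambda a: lambda b: lambda c: c

TRIPLE = lambda z: z(K)(S)(C)                  # <K,S,C>
E_BB = lambda m: m(TRIPLE)                     # <<K,S,C>> = λm. m <K,S,C>

# Assumed constructors of the coding.
var = lambda x: lambda e: e(U1)(x)(e)
app = lambda p, q: lambda e: e(U2)(p)(q)(e)
abs_ = lambda f: lambda e: e(U3)(f)(e)         # f: variable -> code of body

K_CODE = abs_(lambda x: abs_(lambda y: var(x)))
ID_APPLIED = app(abs_(lambda x: var(x)), var("w"))
```

Note that `E_BB` contains no recursion at all: the codes themselves carry the triple around, which is how the fixed point operator is avoided.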

It is a remarkable coincidence that the term ${\sf E}^{bb}\equiv\langle\langle{\sf K},{\sf S},{\sf C}\rangle\rangle$ abbreviates the name “Kleene, Stephen Cole”, the full name of the inventor of self-evaluation in $\lambda$ -calculus. Corrado Böhm was fond of such tricks and had for this and other reasons the nickname ‘il miracolo’.

Coda

At a symposium in honor of Corrado Böhm’s ninetieth birthday, January 2013, at Sapienza University, Rome, the jubilarian treated the audience to an open problem. Actually it is more a ‘Koan’ (not precisely stated) than a Problem (with a precisely stated space of answers). But Koans are often the more interesting problems in mathematics and computer science.

Problem/Koan 15 ((C. Böhm, 2013)).

Given $\beta$ -normal forms $F\equiv\lambda x_{1}\cdots x_{n}.P$ , and $G\equiv\lambda x_{1}\cdots x_{n}.Q{\in}\Lambda^{o}$ . By writing $F^{d}\mathbin{{\triangleq}}\lambda x.F(x{{{\mathbf{c}}}_{1}})\ldots(x{{{\mathbf{c}}}_{n}})$ and similarly for $G^{d}$ , these terms can be made unary. Trying to find closed terms ${M}$ from solutions $N$ of the equation $F^{d}N=_{\beta}G^{d}N$ ? (Define a deed to be a closed nf of the form $\lambda x.xP_{1}\cdots P_{k}$ . The $F^{d},G^{d}$ are deeds up to $=_{\beta}$ .)

Acknowledgments

The author thanks Marko van Eekelen for explaining to him, many years ago, the method of bootstrapping (Section 1), Mariangiola Dezani for comments on the paper, and Rinus Plasmeijer for discussions about Section 3. The referees provided very useful remarks, improving the paper.

To the family of Corrado Böhm I am grateful for letting me spend wonderful times with them, besides for fully enabling us to enjoy the combinators.

Bibliography


[AM72] E. Ashcroft and Z. Manna. The translation of goto programs into while programs. In C.V. Freiman, J.E. Griffith, and J.L. Rosenfeld, editors, Proceedings of IFIP Congress 71, volume 1, pages 250–255, Amsterdam, 1972. North-Holland.
[Bar75] H. P. Barendregt. Normed uniformly reflexive structures. In Proceedings of the Symposium on Lambda-Calculus and Computer Science Theory, pages 272–286, Berlin, Heidelberg, 1975. Springer-Verlag.
[Bar84] H. P. Barendregt. The Lambda Calculus: its Syntax and Semantics. North-Holland, revised edition, 1984.
[Bar95] H. P. Barendregt. Enumerators of lambda terms are reducing constructively. Annals of Pure and Applied Logic, 73:3–9, 1995.
[Bar96] H. P. Barendregt. Kreisel, lambda calculus, a windmill and a castle, pages 3–14. Peters, Wellesley, Mass., 1996.
[Bar01] H. P. Barendregt. Discriminating coded lambda terms. In A. Anderson and M. Zeleny, editors, Logic, Meaning and Computation: Essays in Memory of Alonzo Church, pages 275–285. Kluwer, 2001.
[Bar20] H. P. Barendregt. Some extensional term models for $\lambda$-calculi and combinatory logics. PhD thesis, Utrecht University, 1971/2020. Kindle Desktop Publishing. Extended re-edition 2020.
[BB93] A. Berarducci and C. Böhm. A self-interpreter of lambda calculus having a normal form. In E. Börger, G. Jäger, H. Kleine Büning, S. Martini, and M. M. Richter, editors, Computer Science Logic, pages 85–99, Berlin, Heidelberg, 1993. Springer Berlin Heidelberg.
