LiFtEr: Language to Encode Induction Heuristics for Isabelle/HOL

Yutaka Nagashima

arXiv:1906.08084·cs.LO·May 26, 2020

LiFtEr: Language to Encode Induction Heuristics for Isabelle/HOL

Yutaka Nagashima

PDF

1 Repo

TL;DR

LiFtEr is a domain-specific language that encodes induction heuristics for Isabelle/HOL, enabling automated checking and transfer of expert knowledge to broader users in theorem proving.

Contribution

The paper introduces LiFtEr, a novel language that formalizes and automates the application of induction heuristics in Isabelle/HOL.

Findings

01

LiFtEr successfully encodes various induction heuristics.

02

Automated checking of heuristics improves proof automation.

03

LiFtEr enhances knowledge transfer among Isabelle users.

Abstract

Proof assistants, such as Isabelle/HOL, offer tools to facilitate inductive theorem proving. Isabelle experts know how to use these tools effectively; however, there is a little tool support for transferring this expert knowledge to a wider user audience. To address this problem, we present our domain-specific language, LiFtEr. LiFtEr allows experienced Isabelle users to encode their induction heuristics in a style independent of any problem domain. LiFtEr's interpreter mechanically checks if a given application of induction tool matches the heuristics, thus automating the knowledge transfer loop.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

data61/PSL
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

11institutetext: Czech Technical University in Prague, Prague, Czech Republic 11email: [email protected] 22institutetext: University of Innsbruck, Innsbruck, Austria

LiFtEr: Language to Encode Induction Heuristics

for Isabelle/HOL ††thanks: We thank Ekaterina Komendantskaya, Josef Urban, Kenji Miyamoto, and anonymous reviewers for APLAS2019 for their valuable comments on an early draft of this paper. This work was supported by the European Regional Development Fund under the project AI & Reasoning (reg. no.CZ.02.1.01/0.0/0.0/15_003/0000466).

Yutaka Nagashima 1122 0000-0001-6693-5325

Abstract

Proof assistants, such as Isabelle/HOL, offer tools to facilitate inductive theorem proving. Isabelle experts know how to use these tools effectively; however, there is a little tool support for transferring this expert knowledge to a wider user audience. To address this problem, we present our domain-specific language, LiFtEr. LiFtEr allows experienced Isabelle users to encode their induction heuristics in a style independent of any problem domain. LiFtEr’s interpreter mechanically checks if a given application of induction tool matches the heuristics, thus automating the knowledge transfer loop.

Keywords:

Induction Isabelle/HOL Domain-Specific Language.

1 Introduction

Consider the following reverse functions, rev and itrev, presented in a tutorial of Isabelle/HOL [26]:

primrec rev::"’a list =>’a list" where "rev [] = []" | "rev (x # xs) = rev xs @ [x]"

fun itrev::"’a list =>’a list =>’a list" where "itrev [] ys = ys" | "itrev (x#xs) ys = itrev xs (x#ys)"

where # is the list constructor, and @ appends two lists. How do you prove the following lemma?

lemma "itrev xs ys = rev xs @ ys"

Since both rev and itrev are defined recursively, it is natural to imagine that we can handle this problem by applying induction. But how do you apply induction and why? What induction heuristics do you use? In which language do you describe those heuristics?

Modern proof assistants (PAs), such as Isabelle/HOL [26], are forming the basis of trustworthy software. Klein et al., for example, verified the correctness of the seL4 micro-kernel in Isabelle/HOL [11], whereas Leroy developed a certifying C compiler, CompCert, using Coq [15]. Despite the growing number of such complete formal verification projects, the limited progress in proof automation still keeps the cost of proof development high, thus preventing the wide spread adoption of complete formal verification.

A noteworthy approach in proof automation for proof assistants is hammer tools [1]. Sledgehammer, for example, exports proof goals in Isabelle/HOL to various external automated theorem provers (ATPs) to exploit the state-of-the-art proof automation of these backend provers; however, the discrepancies between the polymorphic higher-order logic of Isabelle/HOL and the monomorphic first-order logic of the backend provers severely impair Sledgehammer’s performance when it comes to inductive theorem proving (ITP).

This is unfortunate for two reasons. Firstly, many Isabelle users chose Isabelle/HOL precisely because its higher-order logic is expressive enough to specify mathematical objects and procedures involving recursion without introducing new axioms. Secondly, induction lies at the heart of mathematics and computer science. For instance, induction is often necessary for reasoning about natural numbers, recursive data-structures, such as lists and trees, computer programs containing recursion and iteration [3].

This is why ITP remains as a long-standing challenge in computer science, and its automation is much needed. Facing the limited automation in ITP, Gramlich surveyed the problems in ITP and presented the following prediction in 2005 [6]:

in the near future, ITP will only be successful for very specialized domains for very restricted classes of conjectures. ITP will continue to be a very challenging engineering process.

We address this conundrum with our domain-specific language, LiFtEr. LiFtEr allows experienced Isabelle users to encode their induction heuristics in a style independent of problem domains. LiFtEr’s interpreter mechanically checks if a given application of induction is compatible with the induction heuristics written by experienced users. Our research hypothesis is that:

it is possible to encode valuable induction heuristics for Isabelle/HOL in LiFtEr and these heuristics can be valid across diverse problem domains, because LiFtEr allows for meta-reasoning on applications of induction methods, without relying on concrete proof goals, their underlying proof states, nor concrete applications of induction methods.*

We developed LiFtEr as an Isabelle theory and integrated LiFtEr into Isabelle’s proof language, Isabelle/Isar, and its proof editor, Isabelle/jEdit. This allows for an easy installation process: to use LiFtEr, users only have to import the relevant theory files into their theory files, using Isabelle’s import keyword. Our working prototype is available at GitHub [20].

The important difference of LiFtEr from other tactic languages, such as Eisbach [16] and Ltac [4], is that LiFtEr itself is not a tactic language but a language to write how one should use Isabelle’s existing proof method for induction. To the best of our knowledge, LiFtEr is the first language in which one can write how to use a tactic by mechanically analyzing the structures of proof goals in a style independent of any problem domain.

2 Induction in Isabelle/HOL

To handle inductive problems, modern proof assistants offer tools to apply induction. For example, Isabelle comes with the induct proof method and the induction method 111Proof methods are the Isar syntactic layer of LCF-style tactics.. Nipkow et al. proved our ongoing example as follows [25]:

lemma model_prf:"itrev xs ys = rev xs @ ys" apply(induct xs arbitrary: ys) by auto

Namely, they applied structural induction on xs while generalizing ys before applying induction by passing the string ys to the arbitrary field. The resulting sub-goals are:

!!ys. itrev [] ys = rev [] @ ys
!!a xs ys. (!!ys. itrev xs ys = rev xs @ ys) ==> itrev (a # xs) ys = rev (a # xs) @ ys

where !! is the universal quantifier and ==> is the implication in Isabelle’s meta-logic. Due to the generalization, the ys in the induction hypothesis is quantified within the hypothesis, and it is differentiated from the ys that appears in the conclusion. Had Nipkow et al. omitted arbitrary: ys, the first sub-goal would be the same, but the second sub-goal would have been:

!!a xs. itrev xs ys = rev xs @ ys ==> itrev (a # xs) ys = rev (a # xs) @ ys

Since the same ys is shared by the induction hypothesis and the conclusion, the subsequent application of auto fails to discharge this sub-goal.

It is worth noting that in general there are multiple equivalently appropriate combinations of arguments to prove a given inductive problem. For instance, the following proof snippet shows an alternative proof script for our example:

lemma alt_prf:"itrev xs ys = rev xs @ ys" apply(induct xs ys rule:itrev.induct) by auto

Here we passed the itrev.induct rule to the rule field of the induct method and proved the lemma by recursion induction 222Recursion induction is also known as functional induction or computation induction. over itrev. This rule was derived by Isabelle automatically when we defined itrev, and it states the following:

(!!ys. P [] ys) ==> (!!x xs ys. P xs (x # ys) ==> P (x # xs) ys) ==> P a0 a1

Essentially, this rule states that to prove a property P of a0 and a1 we have to prove it for two cases where a0 is the empty list and the list with at least two elements. When the induct method takes this rule and xs and ys as induction variables, Isabelle produces the following sub-goals:

!!ys. itrev [] ys = rev [] @ ys
!!x xs ys. itrev xs (x # ys) = rev xs @ x # ys ==> itrev (x # xs) ys = rev (x # xs) @ ys

where the two sub-goals correspond to the two clauses in the definition of itrev.

There are other lesser-known techniques to handle difficult inductive problems using the induct method, and sometimes users have to develop useful auxiliary lemmas manually; however, for most cases the problem of how to apply induction boils down to the the following three questions:

•

On which terms do we apply induction?

•

Which variables do we generalize?

•

Which rule do we use for recursion induction?

Isabelle experts resort to induction heuristics to answer such questions and decide what arguments to pass to the induct method; however, such reasoning still requires human engineers to carefully investigate the inductive problem at hand. Moreover, Isabelle experts’ induction heuristics are sparsely documented across various documents, and there was no way to encode their heuristics as programs. For the wide spread adoption of complete formal verification, we need a program language to encode such heuristics and the system to check if an invocation of the induct method written by an Isabelle novice complies with such heuristics. We developed LiFtEr, taking these three groups of questions as a design space.

3 Overview and Syntax of LiFtEr

We designed LiFtEr to encode induction heuristics as assertions on invocations of the induct method in Isabelle/HOL. An assertion written in LiFtEr takes the pair of a proof goal with its underlying proof state and arguments passed to the induct method. When one applies a LiFtEr assertion to an invocation of the induct method, LiFtEr’s interpreter returns a boolean value as the result of the assertion applied to the proof goals and their underlying proof state.

The goal of a LiFtEr programmer is to write assertions that implement reliable heuristics. A heuristic encoded as a LiFtEr assertion is reliable when it satisfies the following two properties:

The LiFtEr interpreter is likely to evaluate the assertion to True when the arguments of the induct method are appropriate for the given proof goal. 2. 2.

The LiFtEr interpreter is likely to evaluate the assertion to False when the arguments are inappropriate for the goal.

Fig. 1 illustrates the workflow of LiFtEr. Firstly, Isabelle experts encode the gist of promising applications of induction based on experts’ proofs. Note that the heuristics encoded in LiFtEr become applicable to problem domains that the experts users have not even encountered at the time of writing the assertions.

When new Isabelle users are facing an inductive problem and are unsure if their application of induction is a valid approach or not, they can apply LiFtEr assertions written by experts using the assert_LiFtEr keyword to their proof goal and their candidate arguments.

LiFtEr’s interpreter checks if the pair of new users’ proof goal and candidate arguments to the induct method is compatible with the experts’ heuristics. If the interpreter evaluates the pair to True, Isabelle prints “Assertion succeeded.” in the Output panel of Isabelle/jEdit [28]. If the interpreter evaluates the pair to False, Isabelle highlights the assert_LiFtEr in red and prints “Assertion failed.” in the Output panel.

Program 1 shows the essential part of LiFtEr’s abstract syntax. LiFtEr has four types of variables: number, rule, term, and term_occurrence. A value of type number is a natural number from [math] to the maximum of the following two numbers: the number of terms appearing in the proof goals at hand, and the maximum arity of constants appearing in the proof goals. A value of type rule corresponds to a name of an auxiliary lemma passed to the induct method as an argument in the rule field.

The difference between term and term_occurrence is crucial: a value of term is a term appearing in proof goals, whereas a value of term_occurrence is an occurrence of such terms. It is important to distinguish terms and term occurrences because the induct method in Isabelle/HOL only allows its users to specify induction terms but it does not allow us to specify on which occurrences of such terms we intend to apply induction.

The connectives, $\land$ , $\lor$ , $\neg$ , and $\rightarrow$ correspond to conjunction, disjunction, negation, and implication in the classical logic, respectively; and $\rightarrow$ admits the principle of explosion.

LiFtEr has four essential quantifiers and two quantifiers as syntactic sugar. As is often the case, $\forall$ quantifies over variables universally, and $\exists$ stands for the existence of a variable it binds. Again, it is important to notice the difference between the quantifiers over term and the ones over term_occurrence. For example, $\forall{}\_$ . $\in$ term quantifies all sub-terms appearing in the proof goals, whereas $\forall{}\_$ . $\in$ term_occurrence quantifies all occurrences of such sub-terms. Quantified variables restricted to induction_term by the membership function $\in$ are quantified over all terms passed to the induct method as induction terms, while quantified variables restricted to arbitrary_term are quantified over all terms passed to the induct method as arguments in the arbitrary field.

Some atomic assertions judge properties of term occurrences, and some judge the syntactic structure of proof goals with respect to certain terms, their occurrences, or certain numbers. While most atomic assertions work on the syntactic structures of proof goals, Pattern provides a means to describe a limited amount of semantic information of proof goals since it checks how terms are defined. Section 4 explains the meaning of important atomic assertions through LiFtEr’s standard heuristics.

Attentive readers may have noticed that LiFtEr’s syntax does not cover any user-defined types or constants. This absence of specific types and constants is our intentional choice to promote induction heuristics that are valid across various problem domains: it encourages LiFtEr users to write heuristics that are not specific to particular data-types or functions. And LiFtEr’s interpreter can check if an application of the induct method is compatible with a given LiFtEr heuristic even if the proof goal involves user-defined data-types and functions even though such types and functions are unknown to the LiFtEr developer or the author of the heuristic but come into existence in the future only after developing LiFtEr and such heuristic.

4 LiFtEr’s Standard Heuristics

This section presents LiFtEr’s standard heuristics and illustrates how to use those atomic assertions and quantifiers to encode induction heuristics.

4.1 Heuristic 1: Induction terms should not be constants.

Let us revise the first example lemma about the equivalence of two reverse functions, itrev and rev. One naive induction heuristic would be “any induction term should not be a constant” 333This naive heuristic is not very reliable: there are cases where the induct method takes terms involving constants and apply induction appropriately by automatically introducing induction variables. See Concrete Semantics [25] for more details. In LiFtEr, we can encode this heuristic as the following assertion 444For better readability we omit parentheses where the binding of terms is obvious from indentation.:

$\forall t1$ : term $\in$ induction_term. $\exists to1$ : term_occurrence. ( $to1$ term_occurrence_is_of_term $t1$ ) $\land$ $\neg$ ( is_constant $to1$ )

Note the use of quantifiers over induction_terms and term_occurrences: when LiFtEr handles induction terms, LiFtEr treats them as terms, but it is often necessary to analyze the occurrences of these terms in the proof goal to decide how to apply induction. In our example lemma, xs is a variable, which appears twice: once as the first argument of itrev, and once as the first argument of rev. With this in mind, the above assertion reads as follows:

for all induction terms, named $t1$ , there exists a term occurrence, named $to1$ , such that $to1$ is an occurrence of $t1$ and $to1$ is not a constant.

Now we compare this heuristic with the model proof by Nipkow et al.

The only induction term, xs, has two occurrences in the proof goal both as variables. Therefore, if we apply this LiFtEr assertion to the model proof, LiFtEr’s interpreter acknowledges that the model proof complies with the induction heuristic defined above.

Fig. 2 shows the user interface of LiFtEr. In the second line where the cursor is staying, LiFtEr’s interpreter executes the aforementioned reasoning and concludes that the model proof by Nipkow et al. is compatible with this heuristic, printing “Assertion succeeded.” in the Output panel. On the contrary, the fourth line applies the same heuristic to another possible combination of arguments to the induct method (induct itrev arbitrary: ys) and concludes that this candidate induction is not compatible with our heuristic because itrev is a constant. LiFtEr also highlights this line in red to warn the user.

It is a common practice to analyze occurrences of specific terms when describing induction heuristics. Therefore, we introduced two pieces of syntactic sugar to avoid boilerplate code: $\exists{}\_$ : term_occurrence $\in\_$ : term and $\forall{}\_$ : term_occurrence $\in\_$ : term. Both of these quantify over term occurrences of a particular term rather than all term occurrences in the proof goal at hand. Using $\exists{}\_$ : term_occurrence $\in\_$ : term, we can shrink the above assertion from 5 lines to 3 lines as follows:

$\forall t1$ : induction_term. $\exists to1$ : term_occurrence $\in t1$ : term. $\neg$ ( is_constant $to1$ )

In English, this reads as follows:

for all induction terms, named $t1$ , there exists an occurrence of $t1$ , named $to1$ , such that $to1$ is not a constant.

4.2 Heuristic 2. Induction terms should appear at the bottom of syntax trees.

Not applying induction on a constant would sound a plausible heuristic, but such heuristic is not very useful.

In this sub-section, we encode an induction heuristic that analyzes not only the properties of the induction terms but also the location of their occurrences within the proof goal at hand. When attacking inductive problems with many variables, it is sometimes a good attempt to apply induction on variables that appear at the bottom of the syntax tree representing the proof goal. We encode such heuristic using is_at_deepest as the following LiFtEr assertion:

$\forall t1$ : induction_term. $\exists to1$ : term_occurrence $\in t1$ : term. is_atomic $to1$ $\rightarrow$ is_at_deepest $to1$

In English, this assertion reads as follows:

for all induction terms, named $t1$ , there exists an occurrence of $t1$ , named $to1$ , such that if $to1$ is an atomic term then $to1$ lies at the deepest layer in the syntax tree that represents the proof goal.

We used the infix operator, $\rightarrow$ , to add the condition that we consider only the induction terms that are atomic terms. An atomic term is either a constant, free variable, schematic variable, or variable bound by a lambda abstraction. We added this condition because it makes little sense to check if the induction term resides at the bottom of the syntax tree when an induction term is a compound term: such compound terms have sub-terms at lower layers.

LiFtEr’s interpreter acknowledges that the model solution provided by Nipkow et al. complies with this heuristic when applied to this lemma: there is only one induction term, xs, and xs appears as an argument of rev on the right-hand side of the equation in the lemma at the lowest layer of this syntax tree.

4.3 Heuristic 3. All induction terms should be arguments of the same occurrence of a recursively defined function.

Probably, it is more meaningful to analyze where induction terms reside in the proof goal with respect to other terms in the goal. More specifically, one heuristic for promising application of induction would be “apply induction on terms that appear as arguments of the same occurrence of a recursively defined function”. We encode this heuristic using LiFtEr’s atomic assertions, is_recursive_constant and is_an_argument_of, as follows:

$\exists t1$ : term. $\exists to1$ : term_occurrence $\in t1$ : term. $\forall t2$ : term $\in$ induction_term. $\exists to2$ : term_occurrence $\in t2$ : term. is_recursive_constant $to1$ $\land$ $to2$ is_an_argument_of $to1$

where is_recursive_constant checks if a constant is defined recursively or not, and is_an_argument_of takes two term occurrences and checks if the first one is an argument of the second one.

Note that using is_recursive_constant this assertion checks not only the syntactic information of the proof goal at hand, but it also extracts an essential part of the semantic information of constants appearing in the goal, by investigating how these constants are defined in the underlying proof context. As a whole, this assertion reads as follows:

there exists a term, named $t1$ , such that there exists an occurrence of $t1$ , named $to1$ , such that for all induction terms, named $t2$ , there exists an occurrence of $t2$ , named $to2$ , such that $to1$ is defined recursively and $to2$ appears as an argument of $to1$ .

Attentive readers may have noticed that we quantified over induction terms within the quantification over $to1$ , so that this induction heuristic checks if all induction terms occur as arguments of the same constant.

The LiFtEr interpreter confirms that the model proof is compatible with this heuristic as well: the constant, itrev, is defined recursively and has an occurrence that takes the only induction variable xs as the first argument.

4.4 Heuristic 4.

One should apply induction on the nth argument of a function where the nth parameter in the definition of the function always involves a data-constructor.

The previous heuristic checks if all induction terms are arguments of the same occurrence of a recursively defined function. Sometimes we can even estimate on which arguments of such function we should apply induction by inspecting the definition of the function more carefully.

We introduce two constructs to support this style of reasoning: is_nth_argument_of and pattern_is. is_nth_argument_of takes a term occurrence, a number, and another term occurrence, and it checks if the first term occurrence is the $n$ th argument of the second term occurrence where counting starts at [math]. pattern_is takes a number, a term occurrence, one of three patterns: all_only_var, all_constructor, and mixed. Each of such patterns describes how the term is defined.

For example, pattern_is ( $n$ , $to$ , all_only_var) denotes that the $n$ th parameter is always a variable on the left-hand side of the definition of the term that has the term occurrence, $to$ . Likewise, all_constructor denotes the case where the corresponding parameter of the definition of a particular constant always involves a data-constructor, whereas mixed denotes that the corresponding parameter is a variable in some clauses but involves a data-constructor in other clauses. With these atomic assertions in mind, we write the following LiFtEr assertion:

$\neg$ ( $\exists r1$ : rule. True) $\rightarrow$ $\exists t1$ : term. $\exists to1$ : term_occurrence $\in{}t1$ : term. is_recursive_constant $to1$ $\land$ $\forall t2$ : term $\in$ induction_term. $\exists to2$ : term_occurrence $\in{}t2$ : term. $\exists n$ : number. pattern_is ( $n$ , $to1$ , all_constructor) $\land$ is_nth_argument_of ( $to2$ , $n$ , $to1$ )

This roughly translates to the following English sentence:

if there is no argument in the rule field in the induct method, then there exists a recursively defined constant, $t1$ , with an occurrence, $to1$ , such that for all induction terms $t2$ , there exists an occurrence, $to2$ , of $t2$ , such that there exists a number, $n$ , such that the $n$ th parameter involves a data-constructor in all the clauses of the definition of $t1$ , and $to2$ appears as the $n$ th argument of $to1$ in the proof goal.

Note that we added $\neg$ ( $\exists r1$ : rule. True) to focus on the case where the induct method does not take any auxiliary lemma in the rule field since this heuristic is known to be less reliable if there is an auxiliary lemma passed to the induct method.

LiFtEr’s interpreter confirms that Nipkow’s proof about itrev and rev conforms to this heuristic: there exists an occurrence of itrev, such that itrev is recursively defined and for the only induction term, xs, there is an occurrence of xs on the left-hand side of the proof goal, such that itrev’s first parameter involves data-constructor in all clauses of its definition, and this occurrence of xs appears as the first argument of the occurrence of itrev in the goal 555Note that in reality the counting starts at [math] internally. Therefore, “the first argument” in this English sentence is processed as the [math]th argument within LiFtEr..

4.5 Heuristic 5. Induction terms should appear as arguments of a function

that has a related .induct rule in the rule field.

When the induct method takes an auxiliary lemma in the rule field that Isabelle automatically derives from the definition of a constant, it is often true that we should apply induction on terms that appear as arguments of an occurrence of such constant.

See, for example, our alternative proof, alt_prf, for our ongoing example theorem. When Nipkow et al. defined the itrev function with the fun keyword, Isabelle automatically derived the auxiliary lemma itrev.induct, and the occurrence of itrev on the left-hand side of the equation takes xs and ys as its arguments. Furthermore, the alternative proof passes xs and ys to the rule field in the same order they appear as the arguments of the occurrence of itrev in the proof goal.

We introduce is_rule_of to relate a term occurrence with an auxiliary lemma passed to the rule field. is_rule_of takes a term occurrence and an auxiliary lemma in the rule field of the induct method, and it checks if the rule was derived by Isabelle at the time of defining the term. Moreover, we introduce is_nth_induction_term, which allows us to specify the order of induction terms passed to the induct method: is_nth_induction_term takes a term and a number, and it checks if the term is passed to the induct method as the $n$ th induction term. Using these constructs, we can encode the aforementioned heuristic as follows:

$\exists r1$ : rule. True $\rightarrow$ $\exists r1$ : rule. $\exists t1$ : term. $\exists to1$ : term_occurrence $\in t1$ : term. $r1$ is_rule_of $to1$ $\land$ $\forall t2$ : term $\in$ induction_term. $\exists to2$ : term_occurrence $\in t2$ : term. $\exists n$ : number. is_nth_argument_of ( $to2$ , $n$ , $to1$ ) $\land$ $t2$ is_nth_induction_term $n$

As a whole this LiFtEr assertion checks if the following holds:

if there exists a rule, $r1$ , in the rule field of the induct method, then there exists a term $t1$ with an occurrence $to1$ , such that $r1$ is derived by Isabelle when defining $t1$ , and for all induction terms $t2$ , there exists an occurrence $to2$ of $t2$ such that, there exists a number $n$ , such that $to2$ is the $n$ th argument of $to1$ and that $t2$ is the $n$ th induction terms passed to the induct method.

Our alternative proof is compatible with this heuristic: there is an argument, itrev.induct, in the rule field, and the occurrence of its related term, itrev, in the proof goal takes all the induction terms, xs and ys, as its arguments in the same order.

4.6 Heuristic 6. Generalize variables in induction terms.

Isabelle’s induct method offers the arbitrary field, so that users can specify which terms to be generalized in induction steps; however, it is known to be a hard problem to decide which terms to generalize.

Of course LiFtEr cannot provide you with a decision procedure to determine which terms to generalize, but it allows us to describe heuristics to identify variables that are likely to be generalized by experienced Isabelle users. For example, experienced users know that it is usually a bad idea to pass induction terms themselves to the arbitrary field. We also know that it is often a good idea to generalize variables appearing within induction terms if induction terms are compound terms.

We can encode the former heuristic using are_same_term, which checks if two terms are the same term or not. For instance, we can write the following:

$\forall t1$ : term $\in$ arbitrary_term. $\neg$ ( $\exists t2$ : term $\in$ induction_term. are_same_term ( $t1$ , $t2$ ))

By now, it should be easy to see that this assertion checks if the following holds:

for all terms in the arbitrary field, there is no induction term of the same term in the induct method.

The latter heuristic involves the description of the term structure constituting the proof goal. For this purpose we use is_in_term_occurrence to check if a term occurrence resides within another term occurrence. With this construct, we can encode the latter heuristic as follows:

$\exists t1$ : term $\in$ induction_term. $\exists to1$ : term_occurrence $\in t1$ : term. $\forall t2$ : term. $\exists to2$ : term_occurrence $\in t2$ : term. ( $to2$ is_in_term_occurrence $to1$ $\land$ is_free_variable $to2$ ) $\rightarrow$ $\exists t3$ : term $\in$ arbitrary_term. are_same_term ( $t2$ , $t3$ )

Again, we used the implication (_ $\rightarrow$ _) to avoid applying this generalization heuristics to the cases without compound induction terms.

4.7 Apply all assertions

using the test_all_LiFtErs command.

In this section we have written eight assertions (two assertions from each of Section 4.1 and Section 4.6). To exploit all the available LiFtEr assertions, we developed the test_all_LiFtErs command. The test_all_LiFtErs command first takes a combination of induction arguments to the induct method. Then, it applies all the available LiFtEr assertions to the pair of the combination of arguments and the proof goal at hand. Finally, it counts how many assertions return True. For example, the second line in Fig. 3 executed the eight available assertions to the combination of arguments ([on["xs"], arb["ys"], rule[]]) and the proof goal. The output panel shows the result: Out of 8 assertions, 8 assertions succeeded. This indicates that the model proof by Nipkow is indeed a good solution in terms of all the heuristics we discussed in this section.

5 Induction Heuristics Across Problem Domains

In Section 4 we wrote eight assertions in LiFtEr. When writing these eight assertions, we emphasized that none of them is specific to the data structure list or the function itrev appearing in the proof goal. In this section we demonstrate that the LiFtEr assertions written in Section 4 are applicable across domains, taking an inductive problem from a completely different domain as an example. The following code is the formalization of a simple stack machine from Concrete Semantics [25]:

type_synonym vname = string type_synonym val = int type_synonym state = "vname => val" datatype instr = LOADI val | LOAD vname | ADD type_synonym stack = "val list"

fun exec1 :: "instr => state => stack => stack" where "exec1 (LOADI n) _ stk = n # stk" | "exec1 (LOAD x) s stk = s(x) # stk" | "exec1 ADD _ (j#i#stk) = (i + j) # stk"

fun exec :: "instr list => state => stack => stack" where "exec [] _ stk = stk" | "exec (i#is) s stk = exec is s (exec1 i s stk)"

exec1 defines how the stack machine in a certain state transforms a given stack into a new one by executing one instruction, whereas exec specifies how the machine executes a series of instructions one by one. Nipkow et al. proved the following lemma using structural induction.

lemma exec_append_model_prf[simp]: "exec (is1 @ is2) s stk = exec is2 s (exec is1 s stk)" apply(induct is1 arbitrary: stk) by auto

This lemma states that executing a concatenation of two lists of instructions in a state to a stack produces the same stack as executing the first list of the instructions first in the same state to the same stack and executing the second list again in the same state again but to the resulting new stack. As in the case with the equivalence of two reverse functions, there is also an alternative proof based on recursion induction:

lemma exec_append_alt_prf: "exec (is1 @ is2) s stk = exec is2 s (exec is1 s stk)" apply(induct is1 s stk rule:exec.induct) by auto

where exec.induct is automatically derived by Isabelle when defining exec. Now we check if the heuristics from Section 4 correctly recommend these proofs.

Heuristic 1.

Both exec_append_model_prf and exec_append_alt_prf comply with this heuristic. For example, is1 is the only induction term in exec_append_model_prf, and it has occurrences in the proof goal, where it occurs as a variable.

Heuristic 2.

exec_append_model_prf complies with the second heuristic: its only induction term, is1, occurs at the bottom of the syntax tree as a variable, which is an atomic term. exec_append_alt_prf also complies with this heuristic: is1, s, and stk as the arguments of the inner exec on the right-hand side of the equation are all atomic terms at the deepest layer of the syntax tree.

Heuristic 3.

Both proof scripts comply with this heuristic. For example, the inner occurrence of exec on the right-hand side of the equation takes all the induction terms of the alternative proof (namely, is1, s, and stk) as its arguments.

Heuristic 4.

This heuristic works for both proof scripts, but it explains the model answer particularly well: it has a recursively defined constant, exec, and the inner occurrence of exec on the right-hand side of the equation has an occurrence that takes the only induction term is1 as its first argument, and the first parameter of exec always involve a data-constructor in the definition of exec.

Heuristic 5.

This heuristic also works for both proof scripts, but it fits particularly well with the alternative answer: the rule exec.induct is derived by Isabelle when defining exec, while exec has an occurrence as part of the third argument of another exec on the right-hand side of the equation, and this inner occurrence of exec takes all the induction terms (is1, s, and stk) in the same order.

Heuristic 6.

None of our proofs involve induction on a compound term, making the second assertion in Section 4.6 rather irrelevant, whereas the first assertion in Section 4.6 explains the model answer well: the only generalized term, stk, does not appear as an induction term.

6 Real World Example

In Section 4 and Section 5, we introduced simple LiFtEr assertions applied to smaller problems. For example, all induction terms in the examples were variables, even though Isabelle’s induct method can induct on non-atomic terms.

Program 2 is a more challenging proof about a formalization of an imperative language, IMP2 [14], from the Archive of Formal Proofs [12]. Due to the space constraints, we refrain ourselves from presenting the complete formalization of IMP2 but focus on the essential part of the proof document.

In this project, Lammich et al. proved the equivalence between IMP2’s big-step semantics and small-step semantics. smalls_seq in Program 2 is an auxiliary lemma useful to prove the equivalence. The proof of smalls_seq appears to be somewhat similar to that of alt_prf in Section 2 and exec_append_alt_prf in Section 5: smalls_seq’s proof uses the auxiliary lemma small_steps.induct, which Isabelle derived automatically when Lammich et al. defined small_steps. Furthermore, the three induction terms, $\pi$ , (c, s), and Some (c’, s’), are the arguments of one occurrence of small_steps.

The difference from the preceding examples is the generalization of four free variables appearing in induction terms: in Program 2, c and s appear within (c, s), while c’ and s’ appear within Some (c’, s’). As we discussed in Section 4.6, when applying induction on non-atomic terms in Isabelle/HOL it is often a good idea to generalize free variables appearing within such non-atomic induction terms.

To encode such heuristic, we strengthened Example 5 in Section 4 using the is_in_term_occurrence assertion. Program 3 checks if any induction term is non-atomic and contains a free variable, all such free variables are generalized in the arbitrary field. Note that LiFtEr’s interpreter evaluates the universal quantifier over $to3$ to True when all induction terms are atomic, since $to3$ term_occurrence_is_of_term $t3$ is guarded by $\neg$ ( is_atomic $to2$ ), making this assertion valid even for the cases where induction terms are atomic variables.

7 Conclusions, Related and Future Work

ITP has been considered as a very challenging task. To address this issue, we presented LiFtEr. LiFtEr is a domain-specific language in the sense that we developed LiFtEr to encode induction heuristics; however, heuristics written in LiFtEr are often not specific to any problem domains. To the best of our knowledge, LiFtEr is the first programming language developed to capture induction heuristics across problem domains, and its interpreter is the first system that executes meta-reasoning on interactive inductive theorem proving.

The recent development in proof automation for higher-order logic takes the meta-tool approach. Gauthier et al., for example, developed an automated tactic prover, TacticToe, on top of HOL4 [5]. TacticToe leanrs how human engineers used tactics and applies the knowledge to execute a tactic based Monte Carlo tree search. To automate proofs in Coq [27], Komendantskaya et al. developed ML4PG [13]. ML4PG uses recurrent clustering to mine a proof database and attempts to find a tactic-based proof for a given proof goal. Both of them try to identify useful lemmas or hypotheses as arguments of a tactic; however, they do not identify promising terms as arguments of a tactic even though identifying such terms is crucial to apply induction effectively.

The most well-known approach for ITP is called the Boyer-Moore waterfall model [17]. This approach was invented for a first-order logic on Common Lisp. Most waterfall provers attempt to apply six proof techniques (simplification, destructor elimination, cross-fertilization, generalization, elimination of irrelevance, and induction) in a fixed order, store the resulting sub-goals in a pool, and keep applying these techniques until the pool becomes empty.

ACL2 [18] is the most commonly used waterfall model based prover, which has achieved industrial-scale proofs [10]. When deciding how to apply induction, ACL2 computes a score, called hitting ratio, to estimate how good each induction scheme is for the term which it accounts for and proceeds with the induction scheme with the highest hitting ratio [2, 19].

Compared to the hitting ratio used in the waterfall model, LiFtEr’s atomic assertions let us analyze the structures of proof goals directly while LiFtEr’s quantifiers let us keep LiFtEr assertions non-specific to any problem. While ACL2 produces many induction schemes and computes their hitting ratios, LiFtEr assertions do not directly produce induction schemes but analyze the given proof goal and the arguments passed to the induct method, re-using Isabelle’s existing tool to (implicitly) produce induction principles. We consider LiFtEr’s approach to be a reasonable choice, since it extends the usability of the already well-developed proof assistant, Isabelle/HOL, while avoiding to reinvent the mechanism to produce induction principle.

Furthermore, the choice of Isabelle/HOL as the host system of LiFtEr allowed us to take advantage of human interaction more aggressively both from Isabelle experts and new Isabelle users: Isabelle experts can encode their own heuristics since LiFtEr is a language, and new Isabelle users can inspect the results of LiFtEr assertions and decide how to attack their proof goals instead of following the fixed order of six proof techniques as in the waterfall model.

Heras et al. used ML4PG learning method to find patterns to generalize and transfer inductive proofs from one domain to another in ACL2 [8]. Jiang et al. followed the waterfall model and ran multiple waterfalls [9] to automate ITP in HOL light [7]. However, when deciding induction variables, they naively picked the first free variable with recursive type and left the selection of appropriate induction variables as future work.

To determine induction variables automatically, we developed a proof strategy language PSL and its default proof strategy, try_hard for Isabelle/HOL [23]. PSL tries to identify useful arguments for the induct method by conducting a depth-first search. Sometimes it is not enough to pass arguments to the induct method, but users have to specify necessary auxiliary lemmas before applying induction. To automate such labor-intensive work, PGT [24], a new extension to PSL, produces many lemmas by transforming the given proof goal while trying to identify a useful one in a goal-oriented manner.

The drawback of PSL and PGT is that they cannot produce recommendations if they fail to complete a proof search: when the search space becomes enormous, neither PSL and PGT gives any advice to Isabelle users.

PaMpeR [22], on the other hand, recommends which proof method is likely to be useful to a given proof goal, using a supervised learning applied to the Archive of Formal Proofs [12]. The key of PaMpeR was its feature extractor: PaMpeR first applies 108 assertions to each invocation of proof methods and converts each pair of a proof goal with its context and the name of proof method applied to that goal into an array of boolean values of length 108 because this simpler format is amenable for machine learning algorithms to analyze. The limitation of PaMpeR is, unlike PSL, it cannot recommend which arguments in the induct method to tackle a given proof goal.

Taking the same approach as PaMpeR, we attempted to build a recommendation tool, MeLoId [21], to automatically suggest promising arguments for the induct method without completing a proof: we wrote many assertions in Isabelle/ML. Unfortunately, encoding induction heuristics as assertions directly in Isabelle/ML caused an immense amount of code-clutter, and we could not encode even the human-friendly notion of depth in syntax tree since multi-arity functions are represented as curried functions in Isabelle. Therefore, we developed LiFtEr, expecting that LiFtEr serves as a language for feature extraction.

We hope that when combined into the supervised learning framework of MeLoId, assertions written in LiFtEr extract the essence of induction in Isabelle/HOL in a cross-domain style and produce a useful database for machine learning algorithms, so that new Isabelle users can have the recommendation of promising arguments for the induct method in a fully automatic way.

Bibliography28

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Blanchette, J., Kaliszyk, C., Paulson, L., Urban, J.: Hammering towards qed. Journal of Formalized Reasoning 9 (1), 101–148 (2016). https://doi.org/10.6092/issn.1972-5787/4593
2[2] Boyer, R.S., Moore, J.S.: A computational logic handbook, Perspectives in computing, vol. 23. Academic Press (1979)
3[3] Bundy, A.: The automation of proof by mathematical induction. In: Robinson, J.A., Voronkov, A. (eds.) Handbook of Automated Reasoning (in 2 volumes), pp. 845–911. Elsevier and MIT Press (2001)
4[4] Delahaye, D.: A tactic language for the system Coq. In: Logic for Programming and Automated Reasoning, 7th International Conference, LPAR 2000, Reunion Island, France, November 11-12, 2000, Proceedings. pp. 85–95 (2000), https://doi.org/10.1007/3-540-44404-1_7 · doi ↗
5[5] Gauthier, T., Kaliszyk, C., Urban, J.: Tactic Toe: Learning to reason with HOL 4 tactics. In: Eiter, T., Sands, D. (eds.) LPAR-21, 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, Maun, Botswana, May 7-12, 2017. E Pi C Series in Computing, vol. 46, pp. 125–143. Easy Chair (2017), http://www.easychair.org/publications/paper/340355
6[6] Gramlich, B.: Strategic issues, problems and challenges in inductive theorem proving. Electr. Notes Theor. Comput. Sci. 125 (2), 5–43 (2005), https://doi.org/10.1016/j.entcs.2005.01.006 · doi ↗
7[7] Harrison, J.: HOL light: A tutorial introduction. In: Formal Methods in Computer-Aided Design, First International Conference, FMCAD ’96, Palo Alto, California, USA, November 6-8, 1996, Proceedings. pp. 265–269 (1996), https://doi.org/10.1007/B Fb 0031814 · doi ↗
8[8] Heras, J., Komendantskaya, E., Johansson, M., Maclean, E.: Proof-pattern recognition and lemma discovery in ACL 2. In: Logic for Programming, Artificial Intelligence, and Reasoning - 19th International Conference, LPAR-19, Stellenbosch, South Africa, December 14-19, 2013. Proceedings. pp. 389–406 (2013)