Re: [isabelle] Isabelle Foundation & Certification

Here is now my virtual review of the ITP paper behind this thread. It came out a bit longer than anticipated, although I included only half of the thoughts that came to me, while reading it 3 times.


Title: A Consistent Foundation for Isabelle/HOL
Authors: OndÅej KunÄar, Andrei Popescu

Overall evaluation: accept
Reviewer's confidence: high


The paper revisits the question how overloaded definitions (of constants and
type constructores) may be understood within the HOL logic. This work
consists of three parts:

  (1) Technical treatment of the HOL logic, with a notion of "definitional
  theory" and its well-formedness, interpretation of this extended HOL logic
  in a substantially reformed version of the original HOL88 interpretation
  due to A. Pitts.

  (2) A proposed change to the Isabelle2014 code base, to upgrade its
  'typedef' mechanism to perform the additional well-formedness checks
  described in the paper. The text refers to which provides some
  changesets and one page of explanations how this fits into the setup of
  the paper.

  (3) General motivation and discussion of perceived problems of
  Isabelle/HOL, which the paper sets out to resolve. This part is apt to
  cause confusion in various ways, both to users and non-users of Isabelle.
  The style of presentation considerably subtracts from this otherwise fine

Part (1)

These are sections 3, 4, 5 in the paper. Section 5 is the main contribution.
The classic HOL logic of HOL88 (due to M. Gordon and semantics by A. Pitts)
is taken as starting point. There is a nice explanation in 4.3, why
overloaded definitions of constants or types does not work out in the
interpretation of A. Pitts (HOL88 book from 1993). Thus a new interpretation
is introduced as follows:

  * Types are not immediately interpreted away into sets, but preserved as
  syntactic entities to keep sufficient structure that allows well-founded
  recursion. The paper calls this "a natural syntactic-semantic blend, which
  avoids stunt performance in set theory".

  * Type interpretation starts out on ground types. Polymorphism is
  interpreted only in the final stage, treating it as strictly schematic
  over all ground interpretations.

  * Models for ground interpretation are constructed in stages, following
  the well-founded dependency relation of the given collection of
  definitions. Here types and constants are interpreted simultaneously for
  each stage, while in the HOL88 semantics by A. Pitts all types are
  interpreted before all constants.

This model construction is fit together with other recent work by the first
author [19], opening the perspective of fully checked definitional theories
in Isabelle/HOL. More about this in part (2).

Taking the little space of a paper in LNCS proceedings into account, the
technical treatment of the problem is worked out very carefully. Numerous
fine points and additional side-conditions are included in the

Nonetheless, there is a potential for confusion due to slight deviation of
terminology and concepts, compared to papers on the subject e.g. by
Wenzel/Haftmann and the Isabelle documentation (isar-ref manual).

In particular: Section 3.4 defines a "definitional theory" as a set of
overloaded definitions (for constants or types) such that certain *local*
syntactic conditions hold. Section 4.2 defines a "well-formed definitional
theory" as a definitional theory with well-founded dependency relation as
*global* condition. Section 4.5 establishes the main theorem that a
well-formed definitional theory is consistent -- by demonstrating that it
has a model in the above sense, and concluding that False cannot be derived
in the inference system -- due to soundness of the interpretation.

In contrast, the specifications of a "definitional theory" of the paper
would be called in the Isabelle documentation "unchecked definitions",
"definitional axioms", or "axiomatizations", to emphasize the missing aspect
of global well-formedness that this paper is mainly about. A "well-formed
definitional theory" of the paper would be just called "definitional theory"
in Isabelle terminology, and much stronger properties intended than just
consistency. Instead of merely ensuring the existence of a model, the
requirement is to preserve derivability of old propositions precisely, and
allow to reduce new derivations of new propositions into old derivations by
"expanding" definitions in some way. If and how this works depends on
fine-points of the underlying definitional mechanisms: for overloaded
constant definitions in Isabelle/Pure this works [13, 35], but for type
definitions the situation is a-priori quite different, e.g. see [13] section
4.3. The deeper reason for this are the distinctive categories of terms and
types in HOL, and the lack of equational reasoning on types.

A suitable semantic treatment is required to re-unify the view on terms and
types, as is done in the present paper. But this leads to weaker results.
The paper does briefly mention "a suitable notion of conservativeness" as
future work, but due to lack of further explanations, many readers will
probably miss key point behind this.

There is another aspect of "consistency" versus "conservativity" that is
easily missed in the paper. Section 3.5 states "Isabelle/HOL theory
development proceeds by: 1 ... 2 ... 3 ...". The three kinds of operations
ensure that every stage of theory development is "definitional" in the sense
of the paper -- and "well-formedness" can be established later. But what
happens when the user starts adding non-definitional axioms in between?
Strictly speaking, the main result does not apply and the whole theory
looses its good definitional properties.

In contrast, the traditional explanation of (constant) definitions in
Isabelle is more modular in that respect: given an arbitrary base theory ---
one that somehow makes sense to the user in a particular application, even
after making unchecked axiomatizations --- definitions merely introduce
names for existing entities and thus preserve the key properties of the base
theory. This agnostic approach of "definitional theory extensions"
(occording to Isabelle terminolgy) is particularly important from the
perspective of the Isabelle/Pure framework, where the interpretation of
object-logics is not yet known.

For HOL applications one might argue that genuine axiomatic extensions are
not done in practice these days. Nonetheless some exotic applications like
HOLZF (by Obua) do exist. Without stronger results under which conditions
typedefs "make sense" or "are OK" (using words from the paper), such
ambitious users would have to revisit the whole model theory of HOL again.

Part (2)

The patch to Isabelle2014 is rather minimal: the existing infrastructure to
check overloaded constant definitions (going back to Isabelle2007, and
slightly improved by O. Kuncar [19] is generalized to cover type definitions
as well. So Isabelle/Pure is upgraded to provide a general service for
"definitional specification items" that are identified in the name space for
constants or types, but without looking at actual content.

While the explanations for the patch states "the situation is from a
technical (implementation) point view a little bit more complicated", the
outcome is actually simpler than in the paper. Being forced to strip away
accidental aspects of the HOL object-logic does occasionally have

For example, the delicate notion of "build-in types" vs. "non-built-in
types" is absent in the implementation. The notes on the patch provide some
explanations, why it works out nonetheless. It would be nice to see this
refinement applied to the main work of part (1), to trim it further down to
the very essence of symbolic specifications with schematic polymorphism and

The proposed change to Isabelle2014 is called "correction patch", as if
something would be broken that is fixed by the change. This misunderstanding
leads directly into the general discussion of this work below.

Part (3)

These are sections 1 "Introduction", section 2 "Related Work", section 6
"Conclusion", i.e. the important pieces that put the technical contribution
into proper perspective. This is what most people read, and what attendants
of a conference presentation who are busy with their e-mails or smart-phone
usually take home. Unfortunately, the story being told here do not quite fit
to Isabelle.

The misunderstanding already starts in the first paragraph, where
Isabelle/HOL is included into the "umbrella term" of "HOL-based provers".
HOL4, HOL-Light, ProofPower, HOL Zero are fine systems, but Isabelle/HOL is
not as closely related to them as the "HOL" name might suggest. In many
respects Isabelle is actually closer to Coq.

Technically, the key misunderstanding is the role of "the kernel" (in the
words of the paper). HOL88 as the predecessor of all the other HOL systems
pioneered an add-on to the original LCF kernel design to have checked
definitions as primitive rules. In contrast, Isabelle is closer to LCF in
separating logical inferences and a-priori unchecked (axiomatic) theory
specifications: there is no notion of definitions in the inference kernel of
Isabelle. Since this is a bit impractical for big applications, more and
more checks on theory content have been added over the years (e.g. 2004,
2005, 2007, 2010, 2014), but that process was never formally closed. And
this is what the paper is actually about: complete emulation of HOL88-style
definitions in Isabelle/HOL (by providing additional services in the
Isabelle/Pure framework).

We can now revisit the critical examples in the light of the Isabelle
architecture and tradition, instead of the one of HOL88.

Example 2 is presented as the main failure point to motivate this work, but
it is merely an illustration what the isar-ref manual, section 11.5 "Typedef
axiomatization" covers abstractly: 'typedef' is a certain axiom scheme of
Isabelle/HOL with some local sanity checks. There is no claim of it being
"definitional" (in Isabelle terminology, "well-formed definitional" in the
paper). So the proof of False is not a problem of the Isabelle inference
kernel, but the consequence of producing a malformed axiomatic theory.

A legitimate discussion would be about the chances of users making such
mistakes in practice. The results of the present paper certainly improve the
situation, but after applying the "correction patch" (i.e. the additional
feature of fully-checked typedefs) no known applications of Isabelle + AFP
are rejected. This is good, but it also indicates relatively low practical

Other interesting information that is missing: How often do type definitions
depend on open parameters due to overloading, notably type class parameters?
The original concept of overloading in Isabelle did not expect types to
depend on unresolved overloading at all: such closed type definitions could
be interpreted within the original setting of HOL88, after expansion of all
constant definitions. In recent years such applications have shown up
nonetheless, e.g. a type of polynomials that refers to a generic constant
"zero" (see src/HOL/Library/Polynominal.thy e.g. in Isabelle2014 or
Isabelle2015). This indicates that the extra expressive power of typedefs
over type class parameters occasionally matters.

It is good that the new explanation of overloading treats constants and
types uniformly, but there is also an investment into a new model theory
that deviates from the other HOL systems. I actually like the new
interpretation better than the one by A. Pitts, but without the mainstream
HOLs following this upgrade, a translation of Isabelle/HOL theories into
e.g. HOL Zero or OpenTheory for separate checking would mean that overloaded
typedefs are axiomatic specifications without any well-formedness checks at

The older Example 3 is presented somewhat dramatically "when Obua took over
the, he found that the overloaded were almost completely unchecked". Here is
a quote from the relevant documentation (Isabelle2004 or Isabelle2005, "ref"
manual, section 6.1.1 "Definitions"):

  Definitions are intended to express abbreviations. The simplest form of a
  definition is f == t, where f is a constant. [...]

  Isabelle makes the following checks on definitions: [...]

  These checks are intended to catch the sort of errors that might be made
  accidentally. Misspellings, for instance, might result in additional
  variables appearing on the right-hand side. More elaborate checks could be
  made, but the cost might be overly strict rules on declaration order, etc.

Why do users and especially power users tend to ignore documentation? There
are many more fine points about Isabelle that could be misunderstood in
similar ways; and not all of them are as explicitly documented.

A potential explanation why users of Isabelle might consider the system to
be "absolutely right" and "unbreakable" is that in everyday life serious
incidents are never encountered, even after several years of intensive use.
Thus the sense for the degree of reliability is lost. We desparately need
for fine-grained assesment of the overall quality of ITP systems than the
"Consistency Club" versus "Inconsistency Club" in the paper.

My impression is that the "Consistency Club" is presently empty: none of the
major ITP systems are absolutely well-defined and correctly implemented in
every respect. Note that HOL Light is often quoted is being really small and
formally well-understood, but it resides in the unsafe environment of the
OCaml toplevel: users need to refrain from playing dirty tricks in ML.

Formalizations of complete systems like CakeML + classic HOL look very
promising, but the community is not there yet. For big environments like
Isabelle or Coq full formalization appears futile to me. A better strategy
is to connect to some smaller and fully-proven proof checker. For the
present paper, this means the authors either need to formalize the extended
model theory of HOL with full overloading themselves, or convince the
colleagues [18] to do such an upgrade of their formalization.

Then we only need OpenTheory to follow suite, and a proper
Isabelle/OpenTheory implementation as constructive proof that Isabelle/HOL
actually fits under the same "umbrella" of HOL-based systems. Then we could
proceed to the next order of magnitude in overally reliability.

This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.