Re: [isabelle] Isabelle Foundation & Certification



To get beyond the black-and-white of the Inconsistency Club vs. Consistency
Club, here are some more shades and dimensions.

The subsequent classification of reliability and robustness of proof
assistants follows the tradition of the "LCF-approach", which has two main
ingredients:

  (1) a programming language with strong type-safety properties (the
  "meta-language" ML)

  (2) an implementation of the logical core as abstract datatype thm
  within ML

The key ML modules should be reasonably small, to make them obviously
correct (whatever that means precisely). Arbitrary derived tools may be put
on top of that without breaking things: all inferences going through the
thm module are "correct-by-construction".

Both ingredients of the LCF approach are equally important. E.g. a
fully-formalized and proven implementation of the thm module that is exposed
to an unsafe ML environment still leads to an unreliable system, if the user
is able to poke around arbitrarily.
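
To make ingredient (2) concrete, here is a minimal sketch in OCaml of the
abstract-datatype idea, for a toy logic with implication only. The module
name and rules are invented for illustration; this is not the code of any
real kernel:

  module Kernel : sig
    type term = Var of string | Imp of term * term
    type thm                            (* abstract: clients cannot forge a thm *)
    val assume : term -> thm            (* A |- A *)
    val disch : term -> thm -> thm      (* from G |- B derive G - {A} |- A --> B *)
    val mp : thm -> thm -> thm          (* from A --> B and A derive B *)
    val dest : thm -> term list * term  (* inspect hypotheses and conclusion *)
  end = struct
    type term = Var of string | Imp of term * term
    type thm = term list * term
    let assume a = ([a], a)
    let disch a (hyps, b) = (List.filter (fun h -> h <> a) hyps, Imp (a, b))
    let mp (h1, ab) (h2, a) =
      match ab with
      | Imp (a', b) when a' = a -> (h1 @ h2, b)
      | _ -> failwith "mp: not an implication"
    let dest th = th
  end

Every thm value necessarily went through assume/disch/mp, so it is correct
by construction -- but only as long as the meta-language enforces the
signature. With e.g. OCaml's Obj.magic, or a signal corrupting internal
state, a client could bypass the boundary entirely, which is why ingredient
(1) carries half of the weight.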


To better capture the "True LCF-nature" of proof assistants 35 years after
LCF, let's consider qualities in the following categories:

  [small] overall size of critical code modules.

  [ML] The implementation/extension language of the prover.

  [thm] The actual inference kernel, to derive theorems within a given
  theory.

  [defs] additional guarantees for "definitional theories", whatever that
  means precisely.

  [context] treatment of formal context (theory or proof context with formal
  declarations, additional tool data etc.).

  [logic] well-understood (by experts) and easy to understand (by users).

  [formalized] key aspects are formally specified or even proven (the logical
  calculus, models of the prover, the actual prover implementation etc.).

The rating uses small natural numbers in unary notation: 0, +, ++, +++. More
pluses mean better, in some sense. The scale is open-ended, but I will not
go beyond +++ right now. 0 means neither good nor bad, just zero.

The results below are based on my own understanding of these systems -- I
have studied the sources of all of them. The architects and engineers of the
mentioned systems are invited to improve on these estimates.


** LCF **

I only know the Cambridge LCF code base, by L. Paulson from around 1985.
The original LCF is hard to come by -- M. Gordon once showed a stack of
printed LISP + ML listings of it.

  [small]      +++
  [ML]         +++
  [thm]        +++
  [defs]       0
  [context]    0
  [logic]      ++
  [formalized] 0

[small] + [ML] + [thm] define the game of "LCF-style proof assistant".
Original ML is embedded into LISP, and there is no system access, no
non-sense like mutable strings or funny holes in the type-system.

[defs] No definitions; the LCF logic is used axiomatically.

[context] management is implicit in ML; there is no way to switch contexts
or to go back to older contexts.

[logic] LCF as a logic is theoretically nice and clean, but not so easy to
understand for regular users.


** HOL88 **

HOL88 is like LCF, with the addition of checked definitional principles as
part of the kernel. Thus it goes beyond the original "LCF-approach" in a
significant way. Everything else is similar to LCF.
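
As a hedged sketch of what a checked definitional principle amounts to,
consider a kernel function that accepts only equations "c = t" where c is a
fresh constant and t is closed, and returns the definitional axiom as a
theorem. This is illustrative OCaml, not HOL88's actual code; a real HOL
kernel additionally checks that all type variables of t occur in the type
of c (compare the "hidden polymorphism" issue below):

  type term = Var of string | Const of string | Eq of term * term

  (* free variables of a term *)
  let rec free_vars = function
    | Var x -> [x]
    | Const _ -> []
    | Eq (a, b) -> free_vars a @ free_vars b

  (* table of constants that are already defined *)
  let defined : (string, term) Hashtbl.t = Hashtbl.create 16

  (* the returned pair plays the role of the theorem |- c = t *)
  let new_definition c t =
    if Hashtbl.mem defined c then failwith "constant already defined"
    else if free_vars t <> [] then failwith "definition body must be closed"
    else (Hashtbl.add defined c t; ([], Eq (Const c, t)))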

  [small]      +++
  [ML]         +++
  [thm]        +++
  [defs]       +++
  [context]    0
  [logic]      ++
  [formalized] 0

[logic] Note that I don't give +++ for the logic here. The original "Simple
Type Theory" is really nice and simple, especially in the exposition by W.
Farmer, "The Seven Virtues of Simple Type Theory"
http://www.sciencedirect.com/science/article/pii/S157086830700081X

In contrast, Gordon-HOL adds the important concepts of parametric
polymorphism and typedefs, together with considerable complexity. A proof of
the degraded simplicity is the historic misunderstanding about extra type
variables in the body of a definition that do not occur in the type of the
defined term. (In Isabelle jargon this is called "hidden polymorphism", and
it causes horror and shudder among many veterans.) The standard
illustration: defining c :: bool as "there exist distinct x y :: 'a" makes
the definitional axiom prove c = False when 'a is instantiated to a
one-element type, and c = True at bool -- hence True = False.


** HOL4 **

HOL4 is the current incarnation of the HOL-line after HOL88, starting with
HOL90. It uses a self-hosting SML platform, instead of ML inside LISP.

  [small]      ++
  [ML]         ++
  [thm]        +++
  [defs]       +++
  [context]    0
  [logic]      +++
  [formalized] +

[small] less small than the pioneering implementations, but still within
reason.

[ML] I am giving only ++, since HOL4 resides in the raw toplevel of a
standalone SML implementation. This allows various kinds of system-level
nonsense, e.g. using interrupt signals (CTRL-C on a TTY) or threads to
disrupt the internal state of some modules (which expect strictly
sequential execution).
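
A hedged sketch of the kind of disruption meant here (OCaml for uniformity
with the other sketches; the same hazard exists with the Interrupt
exception of a raw SML toplevel):

  (* hidden mutable state with the invariant  !count = Hashtbl.length table *)
  let table : (string, int) Hashtbl.t = Hashtbl.create 16
  let count = ref 0

  let register name value =
    Hashtbl.add table name value;
    (* if CTRL-C raises Sys.Break right here, the invariant is broken for
       good: the module silently assumed uninterrupted sequential execution *)
    incr count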

[thm] + [defs] at their full height, especially after the formal treatment
in http://www.cl.cam.ac.uk/~rk436/itp14.pdf

[context] is implicit in the ML environment (hidden mutable references).
The implementation is not thread-safe; there is a global assumption of
sequential execution. Otherwise as in LCF.

[logic] I am giving +++ instead of the ++ for HOL88, since the formal
treatment has led to various clarifications of obscure details in the
original chapters by A. Pitts in the HOL88 book.

[formalized] See again http://www.cl.cam.ac.uk/~rk436/itp14.pdf and more
recent ongoing work. I am looking forward to seeing these results applied
to the actual HOL4 -- or rather HOL5?



** HOL-Light **

This is the famous fork from the HOL family by John Harrison.

  [small]      +++
  [ML]         +
  [thm]        +++
  [defs]       +++
  [context]    0
  [logic]      ++
  [formalized] +

[small] very small size of key modules.

[ML] HOL-Light uses OCaml instead of SML. OCaml is much less safe than SML:
e.g. strings are mutable, ints silently overflow at some unspecified word
size without any exception, and various other oddities and deviations from
original ML make it hard to understand mathematically (e.g. the possibility
of circular datatypes). Signals allow arbitrary ML exceptions to be
injected into user code, not just interrupts. (I am not taking Obj.magic
nonsense into account here, since its absence can easily be detected by
looking at the source.)

There is still one + instead of 0, since OCaml is safer than C/C++.
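
Some of these oddities can be demonstrated directly at the OCaml toplevel.
(In OCaml versions later than the ones discussed here, string mutation
moved to the separate Bytes type; at the time, String.set mutated strings
in place.)

  let _ = max_int + 1 = min_int    (* true: silent wrap-around, no exception *)

  let rec ones = 1 :: ones         (* a circular list value, accepted as-is *)

  let b = Bytes.of_string "abc"
  let () = Bytes.set b 0 'X'       (* in-place mutation of string-like data *)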

[thm], [defs], [context], [logic], [formalized] similar to HOL4.


** Isabelle/HOL **

Here "Isabelle/HOL" refers to the main product that end-users experience
when downloading Isabelle. Its sophisticated architecture based on Isabelle
as a framework for building logic-based tools needs to be kept in mind, when
trying understand it thoroughly.

  [small]      +
  [ML]         +++
  [thm]        ++
  [defs]       +
  [context]    +++
  [logic]      +
  [formalized] 0

[small] The assembly of Isabelle/ML modules is quite substantial just for
Isabelle/Pure, and more is added in the Isabelle/HOL library. Nonetheless,
there are certain principles of building up in stages to manage complexity.

[ML] extra-safe SML, since we do our own management of the ML compiler and
ML environments, including proper treatment of signals and threads in the
run-time system. Isabelle/ML could in principle do more to isolate user
code from raw system access and other potential nonsense, but that would
mean ++++.

[thm] roughly like in the HOL family, but extra features are hardwired into
the inference kernel, such as higher-order unification and type-class
inferences. It is also not immediately clear which modules contribute to
the thm implementation and which do not.

[defs] constant definitions are fully checked (in Isabelle2014 and
Isabelle2015). HOL typedefs are not fully checked and thus "axiomatic" in
the terminology of Isabelle: users are responsible for refraining from
nonsense with overloading involving type constructors.

[context] The types theory and Proof.context serve as immutable containers
for logical reasoning, independently of implicit assumptions about time and
space. This provides a structural integrity that is not present in other
LCF-style proof assistants. E.g. it is possible to reason in parallel in
different contexts, undisturbed. Going back ("undo") to older states is
trivial: in fact, the user is working simultaneously in many different
contexts when editing a theory document.
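
The design difference can be sketched abstractly as follows (hypothetical
OCaml, nothing like Isabelle's actual API): contexts are plain immutable
values, so old states remain valid and parallel use needs no locking:

  module Context : sig
    type t
    val empty : t
    val declare : string -> t -> t   (* returns a new context, old one intact *)
    val declared : t -> string list
  end = struct
    type t = string list
    let empty = []
    let declare d ctxt = d :: ctxt
    let declared ctxt = ctxt
  end

  let ctxt0 = Context.empty
  let ctxt1 = Context.declare "nat" ctxt0
  let ctxt2 = Context.declare "list" ctxt1
  (* "undo" is simply using ctxt1 again; two threads may extend ctxt1
     independently without disturbing each other *)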

[logic] only one +, since type-classes add considerable complexity. The
traditional idea is that they are just a front-end to plain HOL, with a
hypothetical dictionary construction, or something else still to be
discussed.

[formalized] Nothing substantial yet. I don't expect a verified Isabelle
implementation anytime soon, due to the sheer size and complexity of
critical modules. It would be nice to transfer results from other HOL
groups, e.g. by exporting to an independent checker in CakeML, OpenTheory
etc.


** Coq **

Coq is strictly speaking neither an LCF-style nor a HOL-style system, but
G. Huet and L. Paulson were working together on Cambridge LCF around 1985,
and there are undeniable family traces.

  [small]      +
  [ML]         +++
  [thm]        ++
  [defs]       ++
  [context]    +
  [logic]      +
  [formalized] +

[small] critical Coq modules are about as big as in Isabelle -- I usually
estimate them to be slightly bigger, but this is open to discussion.

[ML] OCaml is used, but in a more sophisticated manner than in HOL-Light, to
make up for the inherent weaknesses of the language. E.g. there are proper
integers (not machine words) and proper strings (not mutable byte arrays).
Coq also manages its own byte-code interpreter, and more.

[thm] Not an LCF-style abstract datatype, but separate checking, potentially
by an external program. At the level of abstraction of these notes, it is
difficult to relate this to the other systems.
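
A rough sketch of the contrast (again hypothetical OCaml, nothing like
Coq's actual kernel): instead of an abstract datatype whose values are
correct by construction, proofs are explicit data that a separate checker
validates after the fact -- and that checker could equally well be an
external program:

  type term = Var of string | Imp of term * term

  (* proof objects are ordinary data: anyone can build them, valid or not *)
  type proof =
    | Assume of term
    | Disch of term * proof
    | Mp of proof * proof

  (* recompute what each step proves, failing on bogus proofs *)
  let rec check : proof -> term list * term = function
    | Assume a -> ([a], a)
    | Disch (a, p) ->
        let (hyps, b) = check p in
        (List.filter (fun h -> h <> a) hyps, Imp (a, b))
    | Mp (p, q) ->
        let (h1, ab) = check p and (h2, a) = check q in
        (match ab with
         | Imp (a', b) when a' = a -> (h1 @ h2, b)
         | _ -> failwith "Mp: not an implication")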

[defs] Serious checking of definitions, with serious complexity built-in.

[context] Just one context for theories and proofs. More explicit than in
the classic HOL family, but less versatile than in Isabelle. Tool context
is swapped in and out on a single global instance. Not thread-safe.

[logic] Truly advanced and powerful, but hard to grasp fully, for experts
and users alike.

[formalized] There are various formalizations, and a general culture of
building tools within the formal system itself.


Overall, I see just one big club of major ITP systems, moving gradually
forward in the quest for approximative soundness of computer-implemented
formal reasoning. The bigger and more popular systems tend to be less
puristic. The minimalistic and more pure systems tend to be less popular.

Note that I consider myself a minimalist and purist, but we have
side-conditions imposed by big applications: to provide a robust and
scalable environment, not just a formally correct core.


	Makarius



