# Re: [isabelle] rigorous axiomatic geometry proof in Isabelle

Makarius, I don't care for the way I wrote last time, and it took me
this long to come up with a nicer way to say it, which is:

I'm trying to use my math skills to get into the proof assistant game.
I know all of you are much better than I am at CS, and I would think
that you would all be better than I at mathematical logic as well.
And maybe you are, but I think I understand something that nobody ever
writes, so I wonder if folks do understand this.

Mathematical logic studies what you can and can't prove axiomatically
in first order logic (FOL). The Goedel Completeness Theorem
http://mathworld.wolfram.com/GoedelsCompletenessTheorem.html says
truth in every model implies an axiomatic proof, and the Goedel
Incompleteness Theorem says you can't write a computer program that
will print out all theorems of a model of the natural numbers:
http://mathworld.wolfram.com/GoedelsFirstIncompletenessTheorem.html
There's a simpler related result I don't know the name of, which says
that given any nice set of FOL axioms, you can write a computer
program to print out all the theorems that follow from them,
i.e. theorems which have axiomatic FOL proofs using these axioms.
Mathematicians basically restrict themselves to the FOL ZFC set theory
axioms, and so we can write a computer program that will print out (if
it runs indefinitely) every theorem that mathematicians well ever
prove (unless they adopt new axioms).  Turing's proof of the Halting
problem (you can't write a infinite-loop checker) is just a
restatement of the Goedel incompleteness theorem.  The Lambda Calculus
comes out of math logic, as it gives a definition of a recursive
function that's equivalent to Turing-machine definable.

The whole point of proof assistants is to make this math logic work!
Nobody's gonna run the simple program that prints out all the theorems
that follow from the FOL ZFC axioms and wait for a good theorem to pop
up.  But you folks have written lots of smart proof assistants which
do great work on today's incredibly fast machines.

So what do proof assistants (Coq, Isabelle, HOL Light) do?  I would
assume they all start with some FOL axioms and then deduce axiomatic
FOL proofs as one discusses in math logic.  I contend that the proof
assistants must do that, because (by math logic) they can't do
anything else!  And the mathematicians can't do anything else either!

But I'm puzzled because almost nobody talks this way in the proof
assistant world.  Mizar (and Freek's miz3 port to HOL Light) seems
reasonably close to FOL axiomatic proofs.  Although I don't know
exactly how close.  I have this vague idea that Mizar does axiomatic
FOL proofs with a lot of the tedium automated away, e.g. manually
substituting variables.  But I don't know.  The things that folks
actually do---HOL, Isar, Pure---I don't understand.  I would need to
have them explained in terms of Goedel's field of math logic,
i.e. axiomatic FOL proofs. So I propose that

if someone knows the answer, they tell me, and if nobody knows the
answer, let's run a thread on math logic which will eventually explain
how HOL, Isar, Pure etc relate to Goedel's FOL.  I have very little to
contribute to such a thread beyond the note on the LC Y combinator and
Goedel incompleteness below.

--
Best,
Bill

We can understand Odifreddi's [p 80--1] startling remark that the Y
combinator "embodies the argument used in Russell's paradox" from
Boolos & Jeffries [Ch 14-15].

That is, I'll show how the Lambda Calculus Y combinator comes from
Goedel Incompleteness, and how Goedel I sort of comes from Russell's
paradox.  In particular, B&J's treatment of the Goedel fixed point
theorem is much clearer than Barendregt's [Thm (Goedel), sec 6.7].

******** Goedel's fixed point thm => Y combinator ********

For any expression B(y) of number theory, B&J [Lem 2, p 173] show
there exists an expression F(x) such that for any expression M(x),

Q |-   F( #M ) <--> B( #M(#M) )

where Q is the minimal axiom set for number theory [B&J Ch 14], s.t. a
partial function N ---> N is representable in Q iff it's
recursive.  I'm denoting Goedel numbers by #.

Then letting G = F(#F) and plugging in, we have Goedel's result

Q |-  G <--> B( #G )

Well, that gives us the Y combinator, just take out the #s and the
Qs.  We want

(F M) = (B (M M))

for all M, so we let

F = \m. B (m m)
G = F F
then
G = B G

giving us the fixed point for B, which we encode as the Y combinator

Y = \b. (\f. f f) (\m. b (m m))

That's the only reasonable motivation I've ever seen for the Y
combinator.  And maybe that explains that LC_v requires LC: as you
say, LC_v is for programming, LC is for Math logic.  But we need Y in
order to be able to derive the harder Y_v.

******** Goedel fixed point thm => Goedel I ********

Let E be the expression of number theory, E_0 the closed expressions,
or statements.  Let Th(N) be the statements that are true in the
standard model of N.  We'll write, for statements f in E_0,

|= f

if f in Th(N).

Goedel proved that Th(N) is not decidable, meaning there is no
expression B(x) in E such that for any statement f in E_0,

|=  f   if   Q |-  B(#f)
|= -f   if   Q |- -B(#f)

But this statement implies the more tractable sounding

|= f   iff   |= B(#f)

since provable in Q implies true in N.  So we show this statement is
false, which shows the previous statement is false too.

Let G = F(#F) be the "fixed point" for -B (which means not B'), so

Q |-  G = F( #F ) <--> -B( #F(#F) ) = -B( #G )

so pushing from Q to Th(N) and combining with the above, we have

|= G	iff    |=  -B( #G )  iff |= -G

a contradiction.  Hence Th(N) is undecidable.

********  Russell's paradox => Goedel I  ********

We also have that for any expression M(x),

|= F( #M )  iff  |= -B( #M(#M) )  iff |= -M(#M)

The statement F( #M ) means F is satisfied by the Goedel number of M.
Let's translate F( #M ) into set theory as M \in F.   Then we have

M \in F   iff   M \notin M

That is, F is the "set" F = { M : M \notin M }.  That's the Russell

F \in F   iff   F \notin F
or
G   iff   not G

And we can even see where F & G came from.  Aping Russell, we'd like
to define a formula F(x) such that for any M(x),

F is satisfied by #M iff #M does not satisfy M.

Well, we can do that if Th(N) is decided by B(y), we "define"

F(#M) = -B( #M(#M) )

Of course it's not clear that such an expression F(x) exists.  Let's
recall B&J's argument, since I don't quite understand it.

Recall \exists is the Tex symbol for "there exists", the backwards E.

Then B&J define

diagonalization: E ---> E

by diagonalization(M) = \exists_x ( x = #M & M )

So if N = N(x), i.e. x is the only free variable in N, then

Q |-  diagonalization(N) <--> N(#N)

Note the privileged position of the variable x here.  Then B&J show
[Lem. 1, p 172] that diagonalization is representable in Q (don't
understand the proof yet), meaning that there's an expression A(x,y)
in E such that for any expression M in E,

Q |- \exists_y ( A(#M, y) <--> y = #diagonalization(M) )

Then we define the expression F(x) by

F = \exists_y ( A(x,y) & B(y) )

so that for any M,

Q |-   F(#M) <--> B( #diagonalization(M) )

Then letting G = diagonalization(F), we have

Q |-  G  <--> F(#F) <--> B( #diagonalization(F) ) = B( #G )

Why not set-theorize Goedel's fixed point theorem?  For any B(y),

Q |-   F( #M ) <--> B( #M(#M) )

I guess that's

M \in F   iff (M \in M) \in B

so F is the "set"

F = { M : (M \in M) \in B }

Hmm, looks pretty fishy...  I can't figure out how to make sense out
of that.  Or, how to translate into number theory Cantor's argument

Given any set X and any function h: X ---> 2^X, h is not onto.

Proof: Let R = { x \in X : x \notin h(x) }.  Then we claim R \in 2^X
is not in the image of h.  If it were, say R = h(y), then we have a

y \in R    iff    y \in h(y)    iff    y \notin R

\qed

Sandy thought that thinking of 2^X as truth functions on X rather than
subsets of X might get us somewhere, but I don't see anything yet...

For another hazy analogy, Odifreddi says on p 81

Set Theory		  Lambda Calculus
element 		  argument
set			  function
membership		  application
set formation		  lambda abstraction
set equality		  term equality

Then he says that \x. N (x x) is the LC analogue of the Russell set
{ x : x \notin x }
for any term N "that is never the identity".  Pretty shaky...

So maybe we should give Goedel some credit for making the translation
of Russell's paradox to number theory :-D

********************************************************

Sandy Zabell pointed me to Goedel's original argument in Nagel &
Newman's book "Goedel's Proof".  I was so proud to have discovered
Russell's paradox in Boolos & Jeffries's proof, but N&N make the
connection quite clear.  They say that Goedel pointed out that he was
following Richard's parodox, which was concocted in 1905 by a guy
Jules Richard I never heard of, but it goes like this:

Enumerate the formulas M(x) of LA as F_n(x), and ask whether F_n(n) is
true, which is like the question X \in X. Then let R(n) be the
statement -F_n(n), which is like the Russell set
R = { X : X \notin X}.
Assume (here's the contradiction) that R(n) is given by one of our
formulas, so that R(x) = F_m(x), for some m, and then

F_m(m) = R(m) = -F_m(m)

which is Richard's paradox.  Here's a less cryptic account (to me)
than N&N's of how Goedel might have modified Richard's paradox to
prove that any consistent axiomatizable theory T containing Q is
incomplete.  I actually end up with different conclusions than N&N and
Goedel, so I'd be interested in you reading this.

LA is the 1st order language of Arithmetic.  A formula M of LA is
written M(x) if FV(M) = {x}.  Given a number n \in N, the standard
model of LA, they write M(n) for what in the LC we'd call M[x<-n],
which doesn't require x to be free in M, and this is generalized in LA
as
M(n) := \exists_x. x = n & M

After inventing Goedel numbers, Goedel could ask whether M(#M) is
false, for any formula M(x), and that sounds like we want

R(#M) = -M(#M)

but there is no formula R(x) of LA which gives this.  However, he
replaced this with the more subtle statement

R(#M) is true in N   iff    M(#M) is not provable in T

which is the same as

|= R(#M) 	     iff    #M(#M) \notin #T

and then produced R(x) by showing that there's a total recursive
function d: N ---> N such that for any M(x),

d(#M) = #M(#M)

and that recursive means representable in T (or even better, Q), and
that #T is recursively enumerable.   Let's go through this:

B&J [Lem. 1, p 172] explain the function d quite well, why it's
represented in Q by a formula D(x,y) of LA, that

Q |- \exists_y ( D(#M, y) <--> y = #diagonalization(M) )

where M(#M) is generalized to formulas M not of the form M(x) by

diagonalization(M) = \exists_x ( x = #M & M )

Now we need a formula C(y) of LA s.t. for any closed formula M,

|= C(#M)    iff    M \notin T

This follows from r.e. theory, but I haven't seen it written down
anywhere.  N&N's argument here is cryptic to me.

As B&J [pf. of Thm. 5, p 177] show, T being axiomatizable implies that
#T is is recursively enumerable, meaning there's a a total recursive
function f: N ---> N with Image(f) = #T.  Then as B&J prove [Ch 14],
there's a formula A(x,y) s.t.

f(n) = k  iff   Q |- \forall_y . A(n,y) <-> y = k

Then we claim that B(y) = \exists_x A(x,y) defines Image(f) subset N
in the sense that

|= B(k)    iff    k \in  image(f)

Proof:

|= B(k)  =>  there exists n \in N s.t.  |= A(n,k)

Since f is a total function, f(n) = l for some l \in N, but then k = l
is provable in Q.  The other direction is easy.
\qed

Now let f and A(x,y) represent #T, and let C(y) = -B(y), so we have

|= C(k)    iff    k \notin  #T

so for any closed formula M, we have

|= C(#M)    iff    M \notin T

Now we can define R(x), since we've shown the 2 pieces of R are
representable in Q.  Our desired relation is now

|= R(#M)    iff    |= C( #M(#M) )

and we achieve this by

R(x) = \exists_y . D(x, y) & C(y)

Then we get our Russell/Richard/Goedel contradictory sentence by

G = diagonalization(R)

and we have

Q |-  G  <--> C( #G )

as follows:

Q |-   G <--> R(#R)

Q |-   R(#M) <--> C( #diagonalization(M) ),   for any formula M of LA,

Q |-  G  <--> C( #diagonalization(R) ) = C( #G )

As before, I claim that we should think of this as the path to the
Lambda Calculus Y combinator.  We note that the argument works for any
C, and it looks like

R(#M)  = C( #M(#M) )
G = R(#R)
G = C( #G )

and in LC this makes perfect sense and gives a fixed point for C by
stripping the pound signs:

let  R = (lambda m. C (m m))
let  G = (R R)
then G = (C G)

That's the only good motivation for the LC Y combinator I know of

Y = (lambda c. (lambda r. (r r)) (lambda m. c (m m)))

TLS makes a good try to motivate Y_v, but I don't quite buy it.

Well, on to the contradiction.  Since T contains Q we have

T |-  G  <--> C( #G )

T |- G iff T |- C( #G ) implies |= C( #G ) iff G \notin T

T |- G    implies    G \notin T

Therefore, since T is assumed to be consistent, T cannot prove G,
i.e. T |- G must be false.  On the other hand, G is true in N, since

|= G  iff |= C( #G ) iff G \notin T

and we just showed that G \notin T.  So  T is strictly smaller than
Th(N).

Now N&N claim to prove that

T |- G    iff  T |- -G

and I don't how they can prove that, since T |- C( #G ) doesn't tell
us anything without passing to |= C( #G ).  I've based my treatment on
what I could glean from their account.  To me,

\forall z. - Dem(z,x)

means x isn't provable in T and that's where my C comes from, but it
has to take place in N, so I think they're gapping.

Well, N&N isn't giving Goedel's argument anyway, because they say that
Goedel actually proved

T |- G    implies    T |- -G

T |- -G   implies    LA is omega-inconsistent,

Hmm...

Bill

For completeness, here is my earlier derivation.

********  Russell's paradox => Goedel I  ********

We also have that for any expression M(x),

|= F( #M )  iff  |= -B( #M(#M) )  iff |= -M(#M)

The statement F( #M ) means F is satisfied by the Goedel number of M.
Let's translate F( #M ) into set theory as M \in F.   Then we have

M \in F   iff   M \notin M

That is, F is the "set" F = { M : M \notin M }.  That's the Russell

F \in F   iff   F \notin F
or
G   iff   not G

And we can even see where F & G came from.  Aping Russell, we'd like
to define a formula F(x) such that for any M(x),

F is satisfied by #M iff #M does not satisfy M.

Well, we can do that if Th(N) is decided by B(y), we "define"

F(#M) = -B( #M(#M) )

Of course it's not clear that such an expression F(x) exists.  Let's
recall B&J's argument.

Recall \exists is the Tex symbol for "there exists", the backwards E.

Then B&J define

diagonalization: E ---> E

by diagonalization(M) = \exists_x ( x = #M & M )

So if N = N(x), i.e. x is the only free variable in N, then

Q |-  diagonalization(N) <--> N(#N)

Note the priveleged position of the variable x here.  Then B&J show
[Lem. 1, p 172] that diagonalization is representable in Q, meaning
that there's an expression A(x,y) in E such that for any expression M
in E,

Q |- \exists_y ( A(#M, y) <--> y = #diagonalization(M) )

Then we define the expression F(x) by

F = \exists_y ( A(x,y) & B(y) )

so that for any M,

Q |-   F(#M) <--> B( #diagonalization(M) )

Then letting G = diagonalization(F), we have

Q |-  G  <--> F(#F) <--> B( #diagonalization(F) ) = B( #G )

I now think that Felleisen's argument in TLS [p 157-9] for the
nonsolvability of Halting Problem isn't actually a proof.  I still
like his argument, it's the 1st thing I ever understood about the
connection between Goedel I & Russell's paradox.  I think it's quite
suitable for TLS, and you don't claim that you've proven the theorem.
However...

He proves that there exists no Scheme function will-stop? that returns
#t or #f on a function f depending on whether (f '()) halts or not.
Then you give a nice Russell-ish argument,

(define (last-try x)
(if (will-stop? last-try)
(eternity x)))

(last-try '()) halts  <=> (last-try '()) doesn't halt

But that doesn't prove the Halting Problem is unsolvable, because we
can't perform any computation on Scheme expressions in Scheme.  I
think that's true, anyway.

Moving from LC_v to LC, Turing etc. proved that computable functions
on N are lambda-definable, but I don't think computable functions on
Lambda are definable in LC.  That's why they use the kludge of Goedel
numbering

#: Lambda ---> N

to handle computations on Lambda, and then the #-fixed point theorem

F [#X] = X

Using this, Dana Scott showed [Hankin, ch 6] that

{X in Lambda : X has a beta-nf}

is undecidable, and that's essentially the nonsolvability of Halting
Problem.  That is, there's no computable function h: N ---> {0,1} such
that h^{-1}( 1 ) is the image

{X has a beta-nf} subset Lambda -#--> N

I think it would be interesting to port these nonsolvability theorems
to LC_v, can you tell me where that was done?  And even better, is
there some way of jazzing up LC_v to handle computations on Lambda?
Maybe use Lambda as Data Constants?  I know that Scheme has functions
that would allow you to do some computations on Lambda, like symbol? &
procedure?.   But I still think no matter how rich Scheme is, that
Scheme won't capture computations on Scheme expressions.

*****************************************************

I finally understand Odifreddi's cryptic derivation

Maybe Odifreddi's argument was badly translated from Italian, or maybe
he didn't understand it himself.  I suspect Curry understood it,
because Y is sometimes called "Curry's paradoxical combinator".

I'm really happy with this argument, as it avoids the Goedel numbers
that came up in the previous derivation via Goedel I.

The proof of Russell's paradox (basically due to Cantor) can be
generalized as follows.  See below for why this is a generalization.

Let X and A be sets, and N: A ---> A be a function with no fixed
point.  Let A^X be the set of functions from X to A.  Then Cantor
showed that X is smaller A^X, and in fact his proof shows that for any
function

h: X ---> A^X

we can construct an element R \in A^X that's not in the image of h.
Thus, no function h is onto, and that's what smaller means.

Our function h is "adjoint" to a function

h': X \times X ---> A	       h'(y,x) = h(y)(x)

and then we can define the function R: X ---> A by

R(x) = N( h'(x,x) )

If we assume h is onto, we get the diagonal argument counterexample:
assume R = h(y) for some y \in X, then

R(x) = h'(y,x)  for all x \in X, but then

R(y) = N( h'(y,y) ) = h'(y,y)

and so h'(y,y) is a fixed point of N, and that's a contradiction.

Well, the contrapositive of this argument gives a technique to produce
fixed points!  I wouldn't have thought of it before reading it in
Odifreddi, but it makes sense.  Suppose we think N: A ---> A really
has a fixed point, and we have a function

k: X \times X ---> A

Let's define as above the function R: X ---> A by

R(x) = N( k(x,x) )

and let's suppose we can show that for some y \in X, we have

R(x) = k(y,x)  for all x \in X.

Then we know that N has a fixed point.

And furthermore k(y,y) is a fixed point for N, since

R(y) = N( k(y,y) ) = k(y,y)

Now for the LC Y combinator.  Let A = X be the set Lambda/= of lambda
expressions modulo alpha & beta equivalence, and application gives us
a function

k: X \times X ---> X

(U,V) |--> U V

Given N \in X, we have a function

N: X ---> X

U |--> N U

Now we look at the function R: X ---> X by

R(x) = N (x x)

and we can see that for

y = \x. N (x x)

we have

R(x) = k(y,x)  for all x \in X

So the Russell/Cantor argument above tells us that

y y = k(y,y) is a fixed point for N,

and that's the LC Y combinator,

\n. (\y. y y) (\x. n (x x))

Isn't that nice!
\qed

There's no need to modify this derivation of Y to give Y_v of LC_v,
because Y_v is the port of Y to LC_v, once we understand where the
formula for Y came from, we don't have to wonder about where Y_v came
from, in case we aren't satisfied by the derivation of Y_v in TLS :)

BTW I note that your partial-derivation of Y_v in TLS comes
immediately after your partial-proof of the nonsolvability of the
Halting Problem, so I could easily believe you knew this argument, but
presented something more suitable for the young in TLS.

Russell's paradox is really Cantor's theorem that any set X is smaller
than its power set 2^X.  In fact, given any function

h: X ---> 2^X

we can construct an element R \in 2^X that's not in the image of h:

R = { x \in X : x \notin h(x) }

Recall the contradiction: assume R = h(y) for some y \in X, then

y \in R    iff    y \notin h(y)    iff   y \notin R

But instead let's think of 2^X as the set of functions from X to
{T,F}, and we have the function

-: {T,F} ---> {T,F}

that switches T and F, then R \in 2^X is the function

x |--> - h(x)(x)

We see this translation of R because the correspondence between the
power set and function is

f: X ---> {T,F}      |-->      f^{-1}(T) subset X

So

- h(x)(x) = T    iff    x \notin h(x)^{-1}(T) subset  X

Now we have the usual diagonalization argument:

R = h(y) means

- h(x)(x) = R(x) = h(y)(x)		for all x \in X

- h(y)(y) = h(y)(y)    \in       {T,F}

To really get Russell's paradox, if X is the nonsensical set of all
sets, then 2^X is a subset of X.  After all, any subset of X is a set,
hence an element of X.  So instead of a function

h: X ---> 2^X

we might as well take the composite

X -h--> 2^X subset X

and we have such a function, the identity function.  Then

x \notin h(x)

translates to

x \notin x

and we have the usual Russell set

R = { x \in X : x \notin x }

and y = R, so the paradox is

R \in R    iff    R \notin R

And this last paradoxical sentence has a flavor that's shared by your
TLS argument

(define (last-try x)
(if (will-stop? last-try)
(eternity x)))

(last-try '()) halts  <=> (last-try '()) doesn't halt

even if last-try isn't really analogous to R.  But it's a start, and I
was really thrilled to see your argument, it was the 1st thing I'd
ever understood about the proof of Goedel I.  But now in Sandy
Zabell's class I went through the proof of the nonsolvability of the
Halting Problem as well as Goedel I and I was able to squeeze a much
stronger connection between Russell's paradox and Goedel I/Halting
from Boolos & Jeffries, which I already posted, but I only worried
about it to deduce Y, which we now see comes directly from Russell's
paradox.  Hard to say my Goedel efforts were wasted, though, it's a
nice theorem :)  But I'd sure like to get the Goedel numbers out of the
Scott's undecidability of

{X in Lambda : X has a beta-nf}

`

This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.