Re: [isabelle] Proof of concept: BNF-based records

Hi all. I'm the guy from Sydney who added stuff to the record package.
I'm just back from holiday, and I should try to clear up some

- 1 - It's OK for other packages to use the record update syntax.

I think the BNF record extension mechanism is fine. I think (but haven't
checked) that it's the record update syntax which is being used. This is
just syntax for any constant whose name ends in "_update". You can play
with this by hand, if you want, define the constant spam_update
and use the syntax "r (| spam := 1 |)". This won't create conflicts with
the record package, since the record package knows which constants
it "owns".

- 2 - I changed the record package, but I tried to avoid "features".

The history, as I recall, is that the extensible record mechanism is 
quite old.
Norbert Schirmer's Simpl package tends to require large records, so
he adjusted the implementation. He managed to support about 100 variables,
and then I adjusted it again to get up to 1000.

My adjustment made a lot of changes to how the record type is defined
internally. But I tried to avoid changing the "surface layer" which users
interact with. I think Makarius is slightly wrong to describe this as 
new features,
it was meant to just be a performance upgrade within the machinery.

I did make a mistake. I didn't understand the code generator well, and 
in fact I
still don't. I had hoped at the time that my changes were benign, and 
would result
in code being generated fairly normally. I was wrong, apparently. Other 
subsequently added quite a lot of code to produce theorems that allow 
the code
generator to simulate a simpler definition for records. This conflicts 
with my
performance goals somewhat. With all these changes together, the package is
now quite complicated.

- 3 - The L4.verified performance constraints aren't complicated.

We can't "quote" the big record definition that is done, because it's 
called in ML by
another package. But we can describe the performance constraints very 
somewhere upwards of 700 variables. I think that our most challenging 
might be up well about 1000 now.

There's one key big record for any given inclusion of a C program, i.e. 
one in any
image. It's a flat record, not an extension and never extended.

Quoting myself on this list from nearly 10 years ago, you can run test 
an n-variable
record definition quite easily:

ML {*
fun add_test_rec b n = let
     fun var i = (Binding.suffix_name (string_of_int i) b, @{typ bool}, 
   in Record.add_record {overloaded = false} ([], b)
     NONE (map var (1 upto n))

setup {* add_test_rec @{binding test_rec} 20 *}
setup {* add_test_rec @{binding test_rec_b} 200 *}

We also use a number of smaller records, and do use the extension 
feature, but
I don't think we're different to other users in any interesting way.

- 4 - Past performance problems.

We had a live discussion about record performance problems more than 5 years
ago. Around then we were having critical difficulties getting images to 
build on
pretty much any hardware. I tracked down the problem at the time. It was due
to both the code generator support for extra record features and a 
change to the
way proof terms were stored.

This problem was then solved in PolyML. When David Matthews added
shareCommonData as a feature that the GC called occasionally, the problem
went away. Now the relevant image builds are just slow. I can explain 
this in more
detail, if anyone really cares, but on our side we no longer need to do 
about it.

- 5 - Future maintenance.

The record package is now quite complicated. Various people would like 
to localise
it. Part of the problem is that it doesn't have a single author any 
more. I made a quick
attempt to do some localising in the past, but I quickly hit bits of the 
syntax translations
and simprocs that I avoided changing in the past and that I still don't 

There are probably many other agendas as well.


On 15/02/18 01:01, Makarius wrote:
> On 08/02/18 22:36, Lars Hupel wrote:
>> I'd like to demonstrate that it is quite simple to base records on BNFs.
>> The short story is: it takes about 160 lines of ML code and 50 lines of
>> syntax declarations to achieve almost (one big caveat, see below)
>> feature parity to the standard record package (sources attached).
> This is all very strange. No discussion whatsoever and now there is even
> a change on the Isabelle repository:
> The history of the official Isabelle record package is long and
> entangled. Initially its main idea was to support extensible records in
> a simple manner (see the paper by Naraschewski and Wenzel from 1998).
> Later, some guys from Sydney added many more features to make the
> package scalable for the L4.verified project -- that made the
> implementation also very complex. Thus it has become difficult to update
> and "localize" it. Moreover any connection to BNF datatypes (which are
> very resource hungry) is likely to bomb applications with many record
> fields (or record extensions).
> It is presently unclear to me, where the scalability features are used
> in practice. If L4.verified no longer requires that, we can probably
> remove 80% of the record package and then update it easily. If scalable
> records are still required, it will need more work to update.
> Can some Data61 people say more about their current applications of the
> record package?
>> Looking for your input on whether this is useful, and whether it belongs
>> into HOL-Library, the AFP, or the bin.
> Spontaneously, I would have said HOL-ex or the bin. By putting it in
> HOL-Library, you make certain claims of quality and sustained
> maintenance. I.e. the first posting I can imagine on isabelle-users:
> "I've found out about HOL-Library.Datatype_Records by accident -- there
> is no NEWS entry and no documentation whatsoever. That tool appears to
> be in conflict with the existing record package. Can anybody please fix
> this ASAP, such that I can use theories with regular records together
> with these new BNF records? Anyway, what is the roadmap? Can I expect
> new BNF records will eventually supersede old-fashioned typedef records?"
>> What is supported? Pretty much the old syntax:
>>    bnf_record ('a, 'b) foo =
>>      field_1 :: 'a
>>      field_2 :: 'b
>>    term "⦇ field_1 = 3, field_2 = () ⦈"
>>    term "foo ⦇ field_1 := y ⦈"
>> Also, record construction syntax is supported out of the box for
>> arbitrary single-free-constructor types:
>>    datatype x = X (a: int) (b: int)
>>    term "⦇ a = 1, b = 2 ⦈"
>> Update syntax for existing types can be installed like this:
>>    local_setup ‹BNF_Record.mk_update_defs @{type_name x}›
>>    term "(X 1 2) ⦇ b := 3 ⦈ = X 1 3"
> That is record notation with "make" and "update" operations on top of
> existing datatypes. It was actually the plan in 1996/1997, before the
> extensible record scheme emerged in 1998. We did not do this by fancy,
> but it was based on requirements from applications.
>> Why?
>> The longer reason: there are more problems with records, e.g. there is
>> no support for nested recursion (they do not constitute BNFs), and even
>> "copy_bnf" can't fix the absence of a suitable "size" function (which
>> makes termination proofs impossible).
> Can you explain this further? Is there an inherent problem of the
> general idea of the regular record package wrt. BNFs? Can the BNF
> experts comment on that?
>> Why not?
>> The internal construction of standard records is much faster. Half of
>> the 50 lines to declare syntax is required to un-declare record syntax.
>> Unfortunately raw "syntax" rules can't be bundle-d. Importing the theory
>> breaks all existing record syntax.
> BTW, there is an old Isabelle development principle to be monotonic wrt.
> existing tools. A fork and clone is bad enough, one that disrupts what
> is already there is worse: entropy, decay.
>> What is the motivation for this?
>> Lem supports records. CakeML uses them for some core types, and
>> specifies nested recursion through them. Whenever CakeML updates, I can
>> either change the generated sources to include a bunch of magic
>> incantations to make termination proofs work, or I can change Lem to
>> emit "bnf_record" instead of "record".* As a third option (somewhere
>> between band-aid and proper solution), Jasmin has already suggested a
>> few months to port the size plugin to records. I attempted that but
>> unfortunately it took me too much time and ultimately I didn't manage
>> to. This here was much quicker to pull off (approx. 3 h) and fixes an
>> immediate problem I had. (Jasmin did however prevent me from writing a
>> plugin that generates update functions for all datatypes.)
> This hints at various discussions in the dark. We have relatively little
> traffic on the isabelle-users and isabelle-dev mailing lists, so why not
> use them for proper discussions?
> As a reminder:
>    * isabelle-users is for anything that is relevant to users of official
> Isabelle releases (past, present, future releases).
>    * isabelle-dev is for anything concerning the ongoing development
> process of Isabelle repository versions between releases.
>    * It does not make sense to cross-post on both lists; isabelle-dev is
> the smaller list, essentially a subset of isabelle-users.
> Mailing lists are not for real-time "chats". Often the relevant people
> are not immediately reactive (e.g. busy somewhere else or even on
> vacation). This means it needs more than just a few days to draw
> conclusions from the absence of postings.
> 	Makarius

This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.