Re: [isabelle] mergesort (was: BUG in conv.ML: rewr_conv and then_conv fail due to beta-normalization issue)

On Wed, 24 Oct 2012, Christian Sternagel wrote:

On 10/23/2012 08:17 PM, Makarius wrote:
(Next time I will tell a story how an efficient and fully verified
merge-sort function included in the core sources caused several days of

Please tell the story.

I will try, but it has to be shorter than usual: I've just returned from travel and I am preparing for the next departure starting in 1.5 days.

I'm particularly interested since in the near future I was going to suggest to run some tests whether it would make sense (performance-wise) to replace the existing sorting algorithm in the Isabelle sources by the one from (sorry for the self-advertisement):

The website seems to be down at the moment, and it does not look like a publicly available thing anyway (without paying extra).

As I'm writing this email I saw that the current sorting algorithm in src/Pure/library.ML actually already is a very similar mergesort (last time I checked, it wasn't ;)).

You probably mean this version and not "the current" which is hardly defined for an ever growing set of approx. 50000 elements. Its changelog reads "replaced quicksort by mergesort, which might be a bit more efficient for key operations like Ord_List.make, Sorts.minimize_sort;". This was the most positive formulation I could find after spending too much time with the issue to gain almost nothing.

The code has become a bit longer, a tiny little bit better-defined in what it does, but we also had to unearth a much older sort function to accomodate the HO unification code with its special demands, see

When you proposed to use your verified version of mergesort (derived from the Haskell library) some month ago, I was not very excited for formal software-quality reasons. Things deep down there that have been used for a long time in certain ways follow their own requirements. Formal specification and formal proofs are fine, but not yet good software engineering. Too much "eating your own dogfood" is bad for health.

Since I am usually a bit too open to experimentation, when Stefan Berghofer came with some measurements about a hot-spot in the management of sorts in the inference kernel, we gave it a try nonetheless. One result was no measureable performance improvement, see the log entry quoted above. Another result was the discovery that the Unify module by Larry from 25 years ago does not really "sort" where it invokes a function called "sort", but something else. The "quasi order" it gives here is not transitive, resulting in funny results with the proven mergesort.

Just curious: Is there a specific reason for not using the function composition "trick" to avoid 'rev' in 'ascending'?

IIRC, the original version copied from Haskell was composing many functional closures according to the length of the list. This might be efficient in Haskell (I have no substantial experience with it), but in the conventional evaluation model of ML it is not. Plain rev is not very expensive, and there is no reason to avoid it as "premature optimization". David Matthews is quite smart to manage the heap for us, so you don't have to be afraid of some temporary allocations.


This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.