Re: [isabelle] Creating ML data is slow. No sharing/caching?

Hi Cornelius,

Your problem is not the evaluation, it's the pretty printing. Evaluations 4 and 5 result in quite large terms, which have a fairly deep nesting of functions. The Isabelle/ML process sends to Isabelle/jEdit the whole syntax tree, which includes type annotations for every subterm and layout information (suitable positions for line breaks). All this information has to be parsed and type-set in the output buffer.

If you really want to measure the evaluation time, you should either make sure that your evaluations result in small terms or use the ML interface which avoids most of the pretty-printing overhead. For example,

  value [code]
    "let x = add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim [1..1000]) in ()"

is pretty fast (note that HOL's let gets compiled to ML's let, so the let binding is always evaluated) and so is

  definition "four = add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim [1..1000])"
  ML â @{code four} â

I do not know the details of the implementation in Isabelle/Scala, so I cannot tell why 2 is much faster than 4 in terms of pretty printing. My guess is that 2 consists of a long list of values, i.e., you have a fairly flat term structure and special notation for lists (which suppresses a lot of the auxiliary information passed along with the constants).

Hope this helps,

On 28/07/15 20:13, C. Diekmann wrote:
Dear ML and Isabelle experts,

I observed that my code written in Isabelle got horribly slow after a
small change. Currently, I evaluate my code with


Here is a small Isabelle/HOL example that illustrates the issue:

datatype 'a expr = Prim 'a | And "'a expr" "'a expr"

definition add_expr :: "'a expr â 'a expr list â 'a expr list"
   where "add_expr e es = map (Îe'. And e e') es"

1) Constructing a list of expressions is quite fast:
value[code] "map Prim [1..1000]" (*0.339s elapsed time, 0.678s cpu
time, 0.000s GC time*)

2) If I add a small expression to a large list of expressions, it is also fast:
value[code] "add_expr (Prim 42) (map Prim [1..1000])" (*0.604s elapsed
time, 1.201s cpu time, 0.026s GC time*)

3) Constructing one larger expression is also fast:
value[code] "fold And (map Prim [1..10]) (Prim 0)" (*too fast to give
a timing?*)

4) Adding bigger expressions to a list of expressions is more than 10
times slower than (2):
value[code] "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim
[1..1000])" (*5.853s elapsed time, 11.495s cpu time, 0.316s GC time*)

5) Almost the same result if the expression is written down explicitly:
value[code] "add_expr (And (Prim 10) (And (Prim 9) (And (Prim 8) (And
(Prim 7) (And (Prim 6) (And (Prim 5) (And (Prim 4) (And (Prim 3) (And
(Prim 2) (And (Prim 1) (Prim 0))))))))))) (map Prim [1..1000])"
(*5.715s elapsed time, 11.148s cpu time, 0.316s GC time*)

I played around with this, repeated these small measurements and
shuffled the value statements to make sure that we are not observing
some CPU turbo boost artifact here.

By the way, the performance doesn't seem to get worse If the list of
expressions consists of large expressions:
value[code] "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map (Îi.
(fold And (map Prim [1..10]) (Prim i))) [1..1000])" (*5.291s elapsed
time, 6.138s cpu time, 0.366s GC time*)

Why is (4) so slow, compared to (2)?

How can I speed this up? Currently, this renders my theory unusable.

Doesn't ML use sharing or caching? I always assumed that, if I have
expressions e1 and e2, then constructing "And e1 e2" would reuse the
values e1 and e2 and this operation would be independent of the sizes
of e1 and e2.


This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.