[isabelle] Creating ML data is slow. No sharing/caching?



Dear ML and Isabelle experts,

I observed that my code written in Isabelle got horribly slow after a
small change. Currently, I evaluate my code with

  value[code]

Here is a small Isabelle/HOL example that illustrates the issue:

datatype 'a expr = Prim 'a | And "'a expr" "'a expr"

definition add_expr :: "'a expr â 'a expr list â 'a expr list"
  where "add_expr e es = map (Îe'. And e e') es"

1) Constructing a list of expressions is quite fast:
value[code] "map Prim [1..1000]" (*0.339s elapsed time, 0.678s cpu
time, 0.000s GC time*)

2) If I add a small expression to a large list of expressions, it is also fast:
value[code] "add_expr (Prim 42) (map Prim [1..1000])" (*0.604s elapsed
time, 1.201s cpu time, 0.026s GC time*)

3) Constructing one larger expression is also fast:
value[code] "fold And (map Prim [1..10]) (Prim 0)" (*too fast to give
a timing?*)

4) Adding bigger expressions to a list of expressions is more than 10
times slower than (2):
value[code] "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim
[1..1000])" (*5.853s elapsed time, 11.495s cpu time, 0.316s GC time*)

5) Almost the same result if the expression is written down explicitly:
value[code] "add_expr (And (Prim 10) (And (Prim 9) (And (Prim 8) (And
(Prim 7) (And (Prim 6) (And (Prim 5) (And (Prim 4) (And (Prim 3) (And
(Prim 2) (And (Prim 1) (Prim 0))))))))))) (map Prim [1..1000])"
(*5.715s elapsed time, 11.148s cpu time, 0.316s GC time*)

I played around with this, repeated these small measurements and
shuffled the value statements to make sure that we are not observing
some CPU turbo boost artifact here.

By the way, the performance doesn't seem to get worse If the list of
expressions consists of large expressions:
value[code] "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map (Îi.
(fold And (map Prim [1..10]) (Prim i))) [1..1000])" (*5.291s elapsed
time, 6.138s cpu time, 0.366s GC time*)


Why is (4) so slow, compared to (2)?

How can I speed this up? Currently, this renders my theory unusable.

Doesn't ML use sharing or caching? I always assumed that, if I have
expressions e1 and e2, then constructing "And e1 e2" would reuse the
values e1 and e2 and this operation would be independent of the sizes
of e1 and e2.

Best,
  Cornelius




This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.