Re: [isabelle] Parallelism in Isabelle 2009-1

On Fri, 9 Apr 2010, Thomas Sewell wrote:

The first problem is that the L4.verified proof repository I work on is huge (tens of thousands of lemmas and hundreds of thousands of lines of proof). This exposes a problem with parallel proof checking. CPU usage looks good to begin with but then our systems very rapidly run out of memory and start thrashing. I suspect the problem here is that for every proof saved for later checking the relevant context needs to saved as well. Our contexts get quite large, so the theory-parsing thread doesn't have to get far ahead of the proof-checking threads for memory to run out. The solution to this is to turn off parallel proof checking.

In a low memory situation, you have two options to play with:

  * Multithreading.max_threads (usedir option -M)
  * Goal.parallel_proofs (usedir option -q)

See also section 3.4 in the Isabelle System manual for some further explanations. In particulay, -q1 vs. -q2 might sometimes help.

Since your application is rather big, I would say the real solution is to install more memory. As a rule of thumb, you need to scale up memory with the number of cores. Less than 2 GB per core is unlikely to work here. Since the total heap size of Poly/ML is limited to 2-3 GB on 32bit, you might have to switch to 64bit by editing the ML settings accordingly. There is an example in Isabelle2009-1/etc/settings. Multiply the memory by 1.5 or even 2 for 64bit mode.

Furthermore, although Isabelle 2009-1 loads theories in parallel this does not seem to translate to CPU parallelism. It is clear from the log chatter that multiple theories are being loaded at once, however, when parallel proofs are disabled CPU usage remains at one core's worth. Running UNIX top and showing thread decompositions (press "H") reveals that there are multiple polyml threads running but their CPU usage always adds to exactly one core. It looks a lot like there's a lock somewhere all the threads need to run. This prevents me getting any parallel execution on my work machine.

The degree of parallelism in typical theory development graphs is surprisingly low. Nonetheless, you should get some speedup, unless the graph is all linear.

If CPU usage is really just 100% there might be some ML tools using something like setmp_CRITICAL internally. The official Isabelle distribution is essentially free from such exclusive execution, but maybe you have your own tools. Or maybe it is just some critical wrapper around the main use_thys invocation in ROOT.ML of the session (only relevant for batch mode).

Finally, on my machine at home, which seems to act up in the context of parallelism, I can't disable it. I can turn off parallel proof checking and set the maximum thread count to 1. UNIX top confirms that this reduces the number of polyml threads created from 7 to 2, which then share the CPU between them in exactly the same way. This prevents me disabling real parallelism on my home machine. I can still run a busy loop on the other core, which seems to hide the mysterious issues.

I have just tried ISABELLE_USEDIR_OPTIONS="-M 1" on my 2-core laptop and it works as expected: there are only 2 threads, the main ML thread (typically 80% CPU time) and the GC thread (20% CPU time). Total CPU usage is < 50% accounting for the 2 cores, i.e. this is really single-core mode. The ML thread and GC thread are alternating each other.


This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.