Re: [isabelle] Isabelle2013-2-RC3 ATP e_par acid test



On Wed, 4 Dec 2013, Gottfried Barrow wrote:

The short story is that I can reliably get non-terminating processes (or processes that can't be shut down, and that will run for a long time), but so far, I can only do that with the ATP e_par, and when it's started with a bunch of other ATPs.

This is the first time I learn about the existence of e_par. It seems to be an add-on script to E prover by Josef Urban. Maybe Jasmin can say something about the degree of supportedness within the Sledgehammer suite.

My general impression is that there are many experimental add-ons that are not widely used nor tested on the usual platforms.


I didn't spend a lot of time experimenting with the command I show next, which doesn't use e_par, but it uses most every ATP that's available to me, and I haven't seen any problems from prematurely terminating the Sledgehammer command. This is the command:

  ML {* Multithreading.max_threads_value () *}
  ML {* Multithreading.max_threads := 300 *}
  ML {* Multithreading.max_threads_value () *}

  theorem "False = True"
  sledgehammer [minimize=smart,preplay_timeout=10,timeout=60,verbose=true,
                isar_try0=true,isar_proofs=smart,provers="
     remote_vampire  metis  remote_satallax  z3_tptp  remote_e  smt
     remote_e_tofof  spass  remote_e_sine    e        z3        yices

remote_snark remote_iprover remote_z3 remote_waldmeister spass_poly remote_cvc3 remote_agsyhol remote_leo2 remote_iprover_eq leo2 cvc3
  "]
  oops

From here, two key points are that e_par starts up 12 eprover.exe processes regardless of what max_threads is set to, that the number and order of ATPs run with it affect things, and "max_thread := 4" is what sets up the particular sequence I list below.

300 threads are quite a lot, but I suggested to go towards and beyond the limits of what is feasible :-)

The e_par implementation ($E_HOME/runepar.pl) has its own process management and allocation of CPU resources. It does not take the number of ML threads into account. Looking a bit through the script, I see various potential weaknesses and non-portabilities. E.g. there is ulimit, which only works reliably on various brands of Unix, not the generic quasi-POSIX of Isabelle that includes Cygwin. It is also unclear if its internal signal handling is really robust (e.g. the Cleanup does not wait
for the process farm.)


A general note on Multithreading.max_threads: changing that in Isabelle/ML might lead to some surprises, because the PIDE protocol will update its value after some change of properties, using the value that is given in Isabelle/Scala, which is the same that you see in Isabelle/jEdit in the dialog "Plugin Options / Isabelle / General / Threads".

ML {* Multithreading.max_threads_value () *} is fine to check what Isabelle/ML is presently using.


I give the Isar command below, but I work it like this:

1) I start the command.
2) Lots of eprover processes will be started and terminated, several times.
3) At about 40 seconds using my CPU, the ATP "leo" will be running with 12 eprover.exe processes, during a time when processes aren't being shut down and started (shown by green and red lines using http://systemexplorer.net/ )

Does that mean you have a local installation of leo? I've never looked at it myself, and don't know where to get it. It might have once again its very own process management done in OCaml.

4) I delete the "r" in "Sledgehammer".
5) The processes keep running much long than what I'm willing to wait. (I have waited up to 15 minutes to see if they would terminate, and they didn't.) 6) I exit the PIDE. Processes are terminated except for the ATP related processes. I attach a screen shot showing the many non-terminating processes.

We've had that already: switching off the light with the hammer --- by shutting down the outermost PIDE application --- does not really help. It can leave processes behind at the bottom of the implicit hierarchy.

POSIX processes are relatively difficult to control systematically. One needs to look at particular implementations, to see that they are within reason of fork, wait, signal handling etc.


This is no problem for me. The ATP e_par actually terminates much of the time, and I haven't been using it because of the large number of processes that it starts.

Indeed. It seems to be an experiment to allocate as many CPU cores as feasible, with different strategies for E tried in parallel.


	Makarius




This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.