Re: [isabelle] Isabelle2021-1-RC1 (veriT): Abnormal termination with exit code 14



Hi Eugene,

thank you very much for this detailed report!

As Makarius wrote on 2021-11-07, the abnormal termination was noticed some time ago and the veriT team recently released version 2021.06.1-rmx with the following change.

veriT now always returns with an error code 14 in case of a timeout
and 24 in case of a CPU timeout (when available).  This is independent
of the operating system.

Consistent behaviour across operating systems was necessary because of Isabelle's multi-platform nature. This new version was integrated in Isabelle/23a97a547a9e and special handling of the error code was implemented in Isabelle/634e2323b6cf.

So from Isabelle2021-1-RC2 on, Isabelle/Sledgehammer displays the usual "Timed out" message when veriT exits with a timeout.

Now, I am not very familiar with signal handling but you seem to fear some race may happen. I am adding Hans-Jörg, one main veriT contributor, in CC so he can take a look at your concern and may contact you off-list if needs be.

Thank you for helping testing Isabelle RCs!

Regards,
Martin

On 07.11.21 13:04, Eugene W. Stark wrote:
Running Isabelle2021-1-RC1 on Ubuntu 20.04, when using Sledgehammer I now frequently see output like
the following:

"cvc4": Timed out
"verit": Prover error:
Abnormal termination with exit code 14
"z3": Timed out
"e": Timed out
"spass": Timed out
"vampire": Timed out
"zipperposition": Timed out

The "Abnormal termination" message for verit seems to be new with recent Isabelle release candidates.
The "exit code 14" looks suspiciously like it really means "signal 14" (i.e. SIGALRM).
I looked in the veriT sources and I see (in src/src/utils/veriT-signal.c):

#ifdef CAPTURE_SIGNALS
static void
veriT_signal_handle(int signum)
{
	unsigned i;
	for (i = 0; i < stack_size(veriT_sigfun_stack); i++)
		stack_get(veriT_sigfun_stack, i)();
	switch (signum) {
		case SIGINT:
			if (veriT_print_report) {
				stats_fprint_formats(stdout);
				stats_fprint(stdout);
				fflush(stdout);
			}
			break;
		case SIGILL: fprintf(stderr, "Illegal Instruction\n"); break;
		case SIGFPE: fprintf(stderr, "Floating point exception\n"); break;
		case SIGSEGV: fprintf(stderr, "Invalid memory reference\n"); break;
		case SIGPIPE: fprintf(stderr, "Broken pipe\n"); break;
		case SIGALRM:
			fprintf(stderr, "Time limit exceeded\n");
			if (veriT_print_report) {
				stats_fprint_formats(stdout);
				stats_fprint(stdout);
				fflush(stdout);
			}

			break;
		case SIGXCPU:
			fprintf(stderr, "CPU time limit exceeded\n");
			if (veriT_print_report) {
				stats_fprint_formats(stdout);
				stats_fprint(stdout);
				fflush(stdout);
			}

			break;
	}

	_exit(signum);
}
#endif

The handler exits with the signal number as the exit status, so that supports my theory.
Looking further in the same file, I see:

void
veriT_signal_init(void)
{
#ifdef CAPTURE_SIGNALS
	stack_INIT(veriT_sigfun_stack);
	memset(&new_act, 0, sizeof(new_act)); /* memset necessaire pour flags=0 */
	new_act.sa_handler = veriT_signal_handle;
	sigemptyset(&new_act.sa_mask);
	sigaddset(&new_act.sa_mask, SIGINT);
	sigaddset(&new_act.sa_mask, SIGQUIT);
	sigaddset(&new_act.sa_mask, SIGILL);
	sigaddset(&new_act.sa_mask, SIGABRT);
	sigaddset(&new_act.sa_mask, SIGFPE);
	sigaddset(&new_act.sa_mask, SIGSEGV);
	sigaddset(&new_act.sa_mask, SIGPIPE);
	sigaddset(&new_act.sa_mask, SIGTERM);
	sigaction(SIGINT, &new_act, &old);
	sigaction(SIGQUIT, &new_act, &old);
	sigaction(SIGILL, &new_act, &old);
	sigaction(SIGABRT, &new_act, &old);
	sigaction(SIGFPE, &new_act, &old);
	sigaction(SIGSEGV, &new_act, &old);
	sigaction(SIGPIPE, &new_act, &old);
	sigaction(SIGTERM, &new_act, &old);
	sigaction(SIGXCPU, &new_act, &old);
	sigaction(SIGALRM, &new_act, &old);
#endif
}

void
veriT_signal_done(void)
{
#ifdef CAPTURE_SIGNALS
	new_act.sa_handler = SIG_DFL;
	sigaction(SIGINT, &new_act, &old);
	sigaction(SIGQUIT, &new_act, &old);
	sigaction(SIGILL, &new_act, &old);
	sigaction(SIGABRT, &new_act, &old);
	sigaction(SIGFPE, &new_act, &old);
	sigaction(SIGSEGV, &new_act, &old);
	sigaction(SIGPIPE, &new_act, &old);
	sigaction(SIGTERM, &new_act, &old);
	sigaction(SIGXCPU, &new_act, &old);
	stack_free(veriT_sigfun_stack);
#endif
}

I wonder why the handler for SIGALRM is not treated the same as the others here?  It doesn't seem
right, especially given that they are freeing something that is used in the handler.  A possible
mechanism for the observed behavior is that veriT is attempting to terminate, it uninstalls the
handlers, except for the SIGALRM handler, and then receives a SIGALRM, causing the handler to be
invoked and _exit() to be called with argument SIGALRM (== 14).

I suppose it might be that the authors wanted to leave the alarm handler in place so that SIGALRM
would still cause veriT to exit even if it is the process of already trying to exit.  But then
there is at least a race here between normal termination and the SIGALRM, and it also possibly
indicates a deadlock or something else that is slowing down termination once all the handlers
have been uninstalled.





This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.