Re: [capsicum] wait() and pdfork()

On Fri, Mar 07, 2014 at 10:24:44AM +0100, Pawel Jakub Dawidek wrote:
> On Thu, Mar 06, 2014 at 09:40:30AM +0000, David Chisnall wrote:
> > On 6 Mar 2014, at 00:10, Eitan Adler <lists at> wrote:

> > > This is confusing and leads to the same issues that the other
> > > wait calls have.  IMHO it would be better to implement pdwait() and
> > > deny waitpid().  This also leads to cleaner documentation: "the wait*
> > > calls do not work on process descriptors".

> > Except that there isn't a clear distinction.  Processes created with
> > pdfork() have a pid that appears in the global PID namespace
> > (visible to non-capsicum processes) and accessible via the
> > pdgetpid() call.  Therefore, the distinction between a process
> > created with pdfork() and one created with fork() is only visible to
> > the process that spawned the child.

> > Preventing wait*() from working on pids for processes that are
> > created with pdfork() is going to cause massive pain for any
> > application (e.g. debuggers, init systems, and so on) that wants to
> > wait for processes that are not its immediate children.

> > The wait*() calls work on a global namespace.  They are calls that
> > are *only* usable by processes that are running with ambient
> > authority.  They should not be restricted based on the way in which
> > the PID was created, because this is not a globally visible property
> > and would be a serious POLA violation.  

> I fully agree with David. A process created with pdfork(2) has a PID,
> so syscalls operating on PIDs (wait*(), kill(2), etc.) should continue
> to work. Also note a special case when the PD_DAEMON flag is used for
> pdfork(2) and the process can exist even if all process descriptors
> are closed.

Hmm. The wait* calls differ from most other syscalls that accept PIDs:
they are only permitted on child processes. That is, either the PID was
returned by fork() or a similar call, or ptrace() attached to the
process successfully; in both cases no wait call has yet reported
termination and the process has not been detached via ptrace().
Alternatively, the caller is PID 1 and has inherited the process.

Therefore, the wait calls only use the global namespace in the sense
that the numerical values of the PIDs say things about other processes
being created on the system. Also, if used properly, the wait calls do
not have races with PID reuse, nor do other syscalls when passed the PID
of a child process.
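
To illustrate what "used properly" means here, a minimal sketch using
only standard POSIX calls (no pdfork()). The helper names
wait_for_child() and run_child_and_wait() are made up for this example.
The parent keeps the PID that fork() returned and waits for exactly that
PID, so the PID cannot be recycled before the status is collected:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: reap exactly one known child and return its exit
 * code, or -1 on error or abnormal termination.
 */
int wait_for_child(pid_t child)
{
    int status;
    pid_t r;

    /* 0 and negative values would select a process group or any child. */
    assert(child > 0);

    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Hypothetical helper: fork a child that exits immediately with
 * exit_code, then reap it by its specific PID.
 */
int run_child_and_wait(int exit_code)
{
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(exit_code);       /* child: terminate right away */
    return wait_for_child(child);
}
```

Until waitpid() returns for that PID, the child stays around as a zombie
at worst, so no other process can be given the same PID in between.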

Some ways the wait calls may be used improperly, causing PID races and
lost statuses, are calling waitpid(-1, ...) while another thread is in
waitpid() for a specific PID, and calling waitpid() on a PID obtained
from SIGCHLD siginfo_t.si_pid while another part of the program also
calls waitpid() on that PID. Indeed, POSIX discourages using waitpid(-1,
...). Therefore, PD_DAEMON should probably reparent the child to init
on last close of the process descriptor, so applications are not forced
to call waitpid(-1, ...) to avoid zombies.

So I think "the wait* calls do not work on process descriptors" would
only be visible to the parent process and not to any other process on
the system.

I would expect it to be possible to reap the zombie before the last
close of the process descriptor. This would free the PID and struct proc
for reuse, while the status and rusage are kept in the process
descriptor kernel object. This likely requires a working pdwait call.

By the way, another question with pdwait is how rusage statistics are
handled. The standard wait calls add in the child's rusage statistics
(visible via getrusage(RUSAGE_CHILDREN)) when they report termination
(unless WNOWAIT was passed); if a process does not wait for a child
process, the rusage is added directly to init's RUSAGE_CHILDREN. Process
descriptors and pdwait could work similarly, or the rusage could (also?)
be added on last close of the process descriptor.

Jilles Tjoelker
