[isabelle] A few thoughts about Java multithreading model in relation to type-safety



Type-safety of variables in Java is related to type constraints, but one can
always emulate variables (say, as strings), which aren't type-safe.

Variables in Java are also mutexes. Say that mutexes are safe by
composition, then one can emulate mutexes, which are not - also, one could
create livelocks.

I am not sure if livelocks can be avoided by strong types, but deadlocks can
- and deadlocks are the main problem.

In Java, taking lock is achieved by synchronized method:
synchronized (a) {
  ...
}

One can also take class lock by having it in function signature.

First problem - it's impossible to atomically take two locks like this:
synchronized (a, b) {
  ...
}

These run to deadlock if they are in different threads:
synchronized (a) { synchronized (b) { ... }}
synchronized (b) { synchronized (a) { ... }}

This is easy to see, why - order differs.

Say that we have "then synchronized" keyword.
synchronized (a) {
  then synchronized (b) {
    ...
  }
  ...
}

This should take two locks atomically, then release one of them.

Taking many locks at once, then releasing them in arbitrary order - this
wont run to deadlocks.

Then, imagine this:
synchronized (a) { synchronized (b) { ... }}
synchronized (a) { ... } then synchronized (b) { ... }

First one takes lock b after a, second one will take b atomically when
releasing a. As long as locks are taken in same order, it's impossible to
force system to deadlock. There is a difference between forcing it to
deadlock and allowing deadlock - it has different constraints to
thread-safety of types.

Say that some type is ordered in regard to locks:
class A {
  seq Integer[] b;
}
class B {}
seq { A, B }

In other file, there might be:
seq { A, C }
merging to previous sequence as { A, B, C }

Sequencing locks is one rather powerful way to avoid deadlocks. Also,
integers inside A should be a sub-sequence of A.

For lock of A, we have a few considerable possibilities:
* It's last lock of it's inner sequence, then it's possible to lock it after
any lock there.
* It's first lock of it's inner sequence.
* It's a covering lock of whole it's sequence, locking all Integers in b.

We could have this goal:
* Lock whole A, get some data
* Work on it
* Lock A again, write it

We can mark this data as being worked on or we can try to write and retry if
it's changed meanwhile. We could also have queues, which is my favorite, but
this is not interesting here.

So having A at end of sequence, it is no problem to take lock of A and b[n]
at once, then release A, then work on b[n] and take A again:
synchronized (b[n]) {
  then synchronized (A) {
    // Read data.
  }
  // Work on data.
} then synchronized (A) {
  // Write data.
}

This relies on fact that locking all at once we can not run to deadlock -
then, we release A and one lock left, go sequentially.

Method, then, should look something like that:
int m(n) locks [(A, b[n]), A] {
  synchronized (b[n]) {
    then synchronized (A) {
      // Read data.
    }
    // Work on data.
  } then synchronized (A) {
    // Write data.
  }
}

Eclipse could generate the lock signature. It contains locking path - and
challence is to make it short.

For having more complex use cases - where we want to read several items,
when changing one - it's needed to have independent handler for items in b -
this handler will take internal locks. From such handler, no calls can be
made to parent (this is, by the way, most hard-to-debug deadlock case) from
locked functions. In such case, the functions are kind of "atomic" - you
call them, they have inherently good lock structure, they exit and this
makes no difference what locks you use from your own scope.

This locking, to be usable, must be tightly integrated to the rest of type
system - otherwise it's unimaginable that you create all those connections
between variables (as variable structure itself is dynamic, you can always
add element to an array or reference something from some other class).

Summary:

Sequence structure, defined in several files, is directed and can fork/join.

Each locking sequence can start by taking many locks at once, then release
all locks except connected slice of lock sequence, then progress.

Independent sequences of subsystems work ok as long as one can make calls to
another, which locks things with allowed structure and then releases it's
own internal locks and returns. That other can not make back-calls to
methods, which use caller-scope locks in relation to it.

I am sure that this structure can be worked on to allow more cases, but this
is a simple and proven methodology, which allows a lot. I suggest also
looking at lock structure in Google Go-lang, which basically has only
channels and no locks (and is fast compiled language - someday with a
garbage collector). The Go-lang suggestion is precisely because proving at
least one structure, which allows sending messages in two directions, to be
deadlock-safe, would cover all common needs. This is - everyone tries to
send something, reader can at any moment read the array or next message in
and process it. Sending messages back from many to one could be simple
method call, then. This would allow to emulate something similar to Go's
message-sending structure, which allows all important use cases (as they
say).

Tambet




This archive was generated by a fusion of Pipermail (Mailman edition) and MHonArc.