chymyst-core

Molecules and emitters, in depth

Molecule names

The names of molecules in Chymyst have no effect on any concurrent computations. For instance, the runtime engine will not check that a molecule’s name is not empty, or that the names of different molecules are different. Molecule names are used only for debugging: they are printed when logging reactions and reaction sites.

There are two ways of assigning a name to a molecule:

Here is an example of defining emitters using explicit class constructors and molecule names:

val counter = new M[Int]("counter")
val fetch = new B[Int, Int]("fetch")

This code is completely equivalent to the shorter code written using macros:

val counter = m[Int]
val fetch = b[Int, Int]

These macros read the names "counter" and "fetch" from the surrounding code. This functionality is intended as a syntactic convenience.

Each molecule emitter has a toString method, which returns the molecule’s name. For blocking molecules, the molecule’s name is followed by "/B".

val x = new M[Int]("counter")
val y = new B[Unit, Int]("fetch")

x.toString // returns “counter”
y.toString // returns “fetch/B”

Molecules vs. molecule emitters

Molecules are emitted into the chemical soup using the syntax such as c(123). Here, c is a value we define using a construction such as

val c = m[Int]

Any molecule emitted in the soup must carry a value. For example, the molecule c() must carry an integer value.

So the value c we just defined is not a molecule in the soup. The value c is a molecule emitter, — a function that, when called, will emit molecules of chemical sort c into the soup. More precisely, an emitter call such as c(123) returns Unit and performs a side effect that consists of emitting the molecule of sort c with value 123 into the soup.

The syntax c(x) is used in two different ways:

In both cases, c(x) can be visualized as a molecule with value x that exists in the soup.

The chemistry-resembling syntax such as c(x) + d(y) is also used in two different ways:

Non-blocking molecules

If c defined as above as val c = m[Int], the emitter c is non-blocking. The emitter function call c(123) is non-blocking because it immediately returns a Unit value and does not wait for any reaction involving c(123) to actually start.

The non-blocking emitter c defined above will have type M[Int] and can be also created directly using the class constructor:

val c = new M[Int]("c")

Molecules carrying Unit values can be emitted using the syntax such as a() rather than a(()), which is also valid. The shorter syntax a() is provided as a purely syntactic convenience.

Blocking molecules

For a blocking molecule, the emitter call will block until a reaction can start that consumes that molecule.

A blocking emitter is defined like this,

val f = b[Int, String]

Now f is a blocking emitter that takes an Int value and returns a String.

Blocking emitters are values of type B[T, R], which is a subtype of T ⇒ R. The emitter f could be equivalently defined by

val f = new B[Int, String]("f")

In this case, the user needs to provide the molecule name explicitly.

Once f is defined like this, an emitter call such as

val result = f(123)

will emit a molecule of sort f with value 123 into the soup.

After emitting f(123), the process will become blocked until some reaction starts, consumes this molecule, and performs a reply action.

Since the type of f is B[Int, String], the reply action must pass a String value to the reply function:

go { case c(x) + f(y, r) 
  val replyValue = (x + y).toString 
  r(replyValue) 
}

The reply action consists of calling the reply emitter r with the reply value as its argument.

Only after the reply action, the process that emitted f(123) will become unblocked and the statement val result = f(123) will be completed. The variable result will become equal to the string value that was sent as the reply.

Remarks about the semantics of Chymyst

Chemical designations of molecules vs. molecule names vs. local variable names

Each molecule has a name — a string that is used in error messages and for debugging. The molecule’s name must be specified when creating a new molecule emitter:

val counter = new M[Int]("counter")

Most often, we would like the molecule’s name to be the same as the name of the local variable, such as counter, that holds the emitter. When using the m macro, specifying names in this way becomes automatic:

val counter = m[Int]
counter.name == "counter" // true

Each molecule also has a specific chemical designation, such as sum, counter, and so on. These chemical designations are not actually strings "sum" or "counter". The names of the local variables are chosen purely for the programmer’s convenience.

Rather, the chemical designations are the object identities of the molecule emitters. We could define an alias for a molecule emitter, for example like this:

val counter = m[Int]
val q = counter

This code will copy the molecule emitter counter into another local value q. However, this does not change the chemical designation of the molecule, because q will be a reference to the same JVM object as counter. The emitter q will emit the same molecules as counter; that is, molecules emitted with q(...) will react in the same way and in the same reactions as molecules emitted with counter(...).

(This is similar to how chemical substances are named in ordinary language. For example, NaCl, “salt” and “sodium hydrochloride” are alias names for the same chemical substance.)

The chemical designation of the molecule specifies two aspects of the concurrent program:

When a new reaction site is defined, certain molecules become bound to that site. (These molecules are the inputs of reactions defined at the new site.) If a reaction site is defined within a local scope of a function, new molecule emitters and a new reaction site will be created every time the function is called. Since molecules internally hold a reference to their reaction site, each function call will create chemically unique new molecules, in the sense that these molecules will react only with other molecules defined in the same function call.

As an example, consider a function that defines a simple reaction like this:

def makeLabeledReaction() = {
  val begin = m[Unit]
  val label = m[String]
  
  site(
    go { case begin(_) + label(labelString)  println(labelString) }
  )
  
  (begin, label)
}

// Let's use this function now.

val (begin1, label1) = makeLabeledReaction() // first call
val (begin2, label2) = makeLabeledReaction() // second call

What is the difference between the new molecule emitters begin1 and begin2? Both begin1 and begin2 have the name "begin", and they are of the same type M[Unit]. However, they are chemically different: begin1 will react only with label1, and begin2 only with label2. The molecules begin1 and label1 are bound to the reaction site created by the first call to makeLabeledReaction(), and this reaction site is different from that created by the second call to makeLabeledReaction().

For example, suppose we emit label1("abc") and begin2(). We have emitted a molecule named "label" and a molecule named "begin". However, these molecules will not start any reactions, because they are bound to different reaction sites.

label1("abc") + begin2() // no reaction started!

After running this code, the molecule label1("abc") will be waiting at the first reaction site for its reaction partner, begin1(), while the molecule begin2() will be waiting at the second reaction site for its reaction partner, label2(...). If we now emit, say, label2("abc"), the reaction with the molecule begin2() will start and print "abc".

label1("abc") + begin2() // no reaction started!
label2("abc") // reaction with begin2() starts and prints "abc"

We see that begin1 and begin2 have different chemical designations because they enter different reactions. This is so despite the fact that both begin1 and begin2 are defined by the same code in the function makeLabeledReaction(). Since the code creates a new emitter value every time, the JVM object identities of begin1 and begin2 are different.

The chemical designations are independent of molecule names and of the variable names used in the code. For instance, we could (for whatever reason) create aliases for label2 and begin2 and write code like this:

val (begin1, label1) = makeLabeledReaction() // first call
val (begin2, label2) = makeLabeledReaction() // second call
val (x, y, p, q) = (begin1, label1, begin2, label2) // make aliases

y("abc") + p() // Same as label1("abc") + begin2() — no reaction started!
q("abc") // Same as label2("abc") — reaction starts and prints "abc"

In this example, the values x and begin1 refer to the same molecule emitter, thus they have the same chemical designation. For this reason, the calls to x() and begin1() will emit copies of the same molecule. As we already discussed, the molecule emitted by x() will have a different chemical designation from that emitted by begin2().

In practice, it is of course advisable to choose meaningful local variable names.

To summarize:

The type matrix of molecule emission

Let us consider what could theoretically happen when we call an emitter.

The emitter call can be either blocking or non-blocking, and it could return a value or return no value. Let us write down all possible combinations of emitter call types as a “type matrix”.

To use a concrete example, we assume that c is a non-blocking emitter of type M[Int] and f is a blocking emitter of type B[Unit, Int].

  blocking emitter non-blocking emitter
value is returned val x: Int = f() ?
no value returned ? c(123) // side effect

So far, we have seen that blocking emitters return a value, while non-blocking emitters don’t. There are two more combinations that are not yet used:

The Chymyst library implements both of these possibilities as special features:

With these additional features, the type matrix of emission is complete:

  blocking emitter non-blocking emitter
value is returned: val x: Int = f() val x: Int = c.volatileValue
no value returned: (timeout was reached) c(123) // side effect

We will now describe these features in more detail.

Timeouts for blocking emitters

By default, a blocking emitter will emit a new molecule and block until a reply action is performed for that molecule by some reaction. If no reaction can be started that consumes the blocking molecule, its emitter will block and wait indefinitely.

Chymyst allows us to limit the waiting time to a fixed timeout value. Timeouts are implemented by the method timeout()() on the blocking emitter:

val f = b[Unit, Int]
// create a reaction site involving `f` and other molecules:
site(...)

// call the emitter `f` with 200ms timeout:
val x: Option[Int] = f.timeout()(200 millis)

The first argument of the timeout()() method is the value carried by the emitted molecule. In this example, this value is empty since f has type B[Unit, Int]. The second argument of the timeout()() method is the duration of the delay.

The timeout()() method returns a value of type Option[R], where R is the type of the blocking molecule’s reply value. In the code above, if the emitter received a reply value v before the timeout expired then the value of x will become Some(v).

If the emitter times out before a reply action is performed, the value of x will be None and the blocking molecule f() will be removed from the soup. If the timeout occurred because no reaction started with f(), which is the usual reason, the removal of f() makes sense because no further reactions should try to consume f() and reply.

A less frequent situation is when a reaction already started, consuming f() and is about to reply to f(), but the waiting process just happened to time out at that very moment. In that case, sending a reply to f() will have no effect.

Is the timeout feature required? The timeout functionality can be simulated, in principle, using the “First Reply” construction. However, this construction is cumbersome and will sometimes leave a thread blocked forever, which is undesirable from the implementation point of view. For this reason, Chymyst implements the timeout functionality as a special primitive feature available for blocking molecules.

In some cases, the reaction that sends a reply to a blocking molecule needs to know whether the waiting process has already timed out. For this purpose, Chymyst defines the reply emitters as functions R ⇒ Boolean. The return value is true if the waiting process has received the reply, and false otherwise.

Static molecules

It is often useful to ensure that exactly one copy of a certain molecule is initially present in the soup, and that no further copies can be emitted unless one is first consumed. Such molecules are called static. A static molecule s must have reactions only of the form s + a + b + ... → s + c + d + ..., — that is, reactions that consume a single copy of s and then also emit a single copy of s.

An example of a static molecule is the “asynchronous counter” molecule c() with reactions that we have seen before:

go { case c(x) + d(_)  c(x - 1) }
go { case c(x) + i(_)  c(x + 1) }
go { case c(x) + f(_, r)  c(x) + r(x) }

These reactions treat c() as a static molecule because they first consume and then emit a single copy of c().

Static molecules are a frequently used pattern in chemical machine programming. Chymyst provides special features for static molecules:

In order to declare a molecule as static, the users of Chymyst must write a reaction that has no input molecules but emits some output molecules. Such reactions are automatically recognized by Chymyst as static reactions:

site (
// This static reaction declares a, c, and q to be static molecules and emits them.
    go { case _  a(1) + c(123) + q() },
// Now define some more reactions that consume a, c, and q.
    go { case a(x) + ...  ???; a(y) } // etc.
)

The reaction go { case _ ⇒ a(1) + c(123) + q() } emits three output molecules a(), c(), and q() but has a wildcard instead of input molecules. Chymyst detects this and marks the reaction as static. The output molecules are also declared as static.

A reaction site can have several static reactions. Each non-blocking output molecule of each static reaction is automatically declared a static molecule.

The reaction sites will run their static reactions only once, at the time of the site(...) call itself, and on the same thread that calls site(...). At that time, the reaction site still has no molecules present. Static molecules will be the first ones emitted into that reaction site.

Constraints on using static molecules

Declaring a molecule as static can be a useful tool for avoiding errors in chemical machine programs because the usage of static molecules is tightly constrained:

These restrictions are intended to maintain the semantics of static molecules. Application code that violates these restrictions will cause an “early” run-time error — that is, an exception thrown by the site() call before any reactions can run at that reaction site.

Volatile readers for static molecules

When a static molecule exists only as a single copy, its value works effectively as a mutable cell: the value can be modified by reactions, but the cell cannot be destroyed. So it appears useful to have a read-only access to the value in the cell.

This is implemented as a volatile reader — a function of type ⇒ T that fetches the value carried by that static molecule when it was most recently emitted. Here is how volatile readers are used:

val c = m[Int]
site(
  go { case c(x) + incr(_)  c(x + 1) },
  go { case _  c(0) } // emit `c(0)` and declare it as static
)

val readC: Int = c.volatileValue // initially returns 0

The volatile reader is thread-safe (can be used from any reaction without blocking any threads) because it provides a read-only access to the value carried by the molecule.

The value of a static molecule, viewed as a mutable cell, can be modified only by a reaction that consumes the molecule and then emits it back with a different value. If the volatile reader is called while that reaction is being run, the reader will return the previous known value of the static molecule, which is probably going to become obsolete very shortly. We call the volatile reader “volatile” for this reason: it returns a value that could change at any time.

The functionality of a volatile reader is similar to an additional reaction with a blocking molecule f that reads the current value carried by c:

go { case c(x) + f(_, reply)  c(x) + reply(x) }

Calling f() returns the current value carried by c(), just like the volatile reader does. However, the call f() may block for an unknown time if c() has been consumed by a long-running reaction, or if the reaction site happens to be busy with other computations. Even if c() is immediately available in the soup, running a new reaction requires an extra scheduling operation. Volatile readers provide fast read-only access to values carried by static molecules.

This feature is restricted to static molecules because it appears to be useless if we read the last emitted value of a molecule that has many different copies emitted in the soup.

Exercise: concurrent divide-and-conquer with cancellation

Implement a (blocking) function that finds the smallest element in an integer array, using a concurrent divide-and-conquer method. If the smallest element turns out to be negative, the computation should be aborted as early as possible, and 0 should be returned as the final result.

Use a static molecule with a volatile reader to signal that the computation needs to be aborted early.