chymyst-core

Version history

Roadmap for the future

These features are considered for implementation in the next versions:

Version 0.1: (Released.) Perform static analysis of reactions, and warn the user about certain situations with unavoidable livelock, deadlock, or indeterminism.

Version 0.2: (Released.) Rewrite the reaction scheduler, optimizing for performance and flexibility.

In particular, do not lock the entire molecule bag - only lock some groups of molecules that have contention on certain molecule inputs (decide this using static analysis information). (Will not do.)

Version 0.3:

Version 0.4: Enterprise readiness.

Version 0.5: Application framework Chymyst

Version 0.6: Automatic distributed and fault-tolerant execution of chemical reactions (“soup pools” or another mechanism).

Version 0.7: Static optimizations: use advanced macros and code transformations to completely eliminate all blocking and all inessential pattern-matching overhead.

Version 0.8: Distributed execution and cluster deployments.

Version 1.0: Complete enterprise-ready features, adapters to other frameworks, and several real-life projects using Chymyst Core. Complete documentation, both as separate developer documentation and as a book, “Concurrency in Reactions: Declarative Scala multiprocessing with Chymyst”.

Current To-Do List

value * difficulty - description

1 * 1 - add chymyst-examples repo.

2 * 2 - add a metadata record to molecules. Documentation string for the molecule’s value, other metadata.

1 * 1 - blocking molecules cannot have reactions with only one input (?) - not sure if this is helpful.

4 * 5 - do not schedule reactions if queues are full. At the moment, RejectedExecutionException is thrown. It’s best to avoid this. Molecules should be accumulated in the bag, to be inspected at a later time (e.g. when some tasks are finished). Insert a call at the end of each reaction, to re-inspect the bag.
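A minimal sketch of the idea, using a plain `ThreadPoolExecutor` rather than Chymyst's own pools. The class `BacklogPool` and its methods are hypothetical names chosen for illustration. Rejected tasks are kept in a backlog (standing in for molecules left in the bag), and the backlog is re-inspected at the end of every task.

```scala
import java.util.ArrayDeque
import java.util.concurrent.{ArrayBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}

// Hypothetical helper, not part of Chymyst Core.
final class BacklogPool(threads: Int, queueSize: Int) {
  private val executor = new ThreadPoolExecutor(
    threads, threads, 1L, TimeUnit.SECONDS, new ArrayBlockingQueue[Runnable](queueSize))
  private val backlog = new ArrayDeque[Runnable]() // accessed only while holding `lock`
  private val lock = new Object

  // Returns false instead of letting RejectedExecutionException escape.
  private def tryExecute(r: Runnable): Boolean =
    try { executor.execute(r); true }
    catch { case _: RejectedExecutionException => false }

  // Called at the end of each task: schedule as many backlog items as the queue accepts.
  private def drain(): Unit = lock.synchronized {
    var next = backlog.peek()
    while (next != null && tryExecute(next)) {
      backlog.poll()
      next = backlog.peek()
    }
  }

  def submit(task: () => Unit): Unit = {
    val wrapped = new Runnable {
      def run(): Unit = try task() finally drain()
    }
    lock.synchronized {
      // Keep FIFO order: never overtake tasks that are already waiting in the backlog.
      if (!backlog.isEmpty || !tryExecute(wrapped)) backlog.add(wrapped)
    }
  }

  def shutdown(): Unit = executor.shutdown()
}
```

The lock makes "rejected, then stored in the backlog" atomic with respect to `drain()`, so a task cannot be stranded in the backlog after the last running task has finished. This sketch does not distinguish rejection due to a full queue from rejection due to shutdown.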

3 * 4 - Blocking molecule emitter’s Future[] API should report errors (failure to give a reply value), while a non-Future API doesn’t do this. When an exception is thrown in the reaction, the Promise will fail. The emitting thread could inspect the promise and detect the failure. In this way, we can still have some error reporting from exceptions (which was removed in 0.2.0). However, a better mechanism could be implemented.
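The mechanism can be sketched with the standard `scala.concurrent.Promise`, independently of Chymyst. Here `runReaction` is a hypothetical stand-in for a reaction body that is expected to produce a reply value, and `inspect` plays the role of the emitting thread.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object ReplyPromiseSketch {
  // Runs `body` on another thread; the promise fails if `body` throws.
  def runReaction[R](body: () => R): Future[R] = {
    val reply = Promise[R]()
    val worker = new Thread(new Runnable {
      def run(): Unit = reply.complete(Try(body()))
    })
    worker.setDaemon(true)
    worker.start()
    reply.future
  }

  // The emitting side inspects the future: Left(message) on failure, Right(value) on success.
  def inspect[R](f: Future[R]): Either[String, R] =
    Try(Await.result(f, 2.seconds)) match {
      case Success(v) => Right(v)
      case Failure(e) => Left(String.valueOf(e.getMessage))
    }
}
```

A reaction that throws before replying then shows up as a `Left` on the emitting side instead of a silent time-out.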

5 * 5 - Implement full error recovery: attach an error handler to a reaction. The error handler is another reaction that has an automatic Throwable input and otherwise has the same input molecules as the failed reaction.

    go { case a(x) + b(y) => ... } recoverWith { (e: Throwable) => go { case a(x) + b(y) => ... } }

Recovery semantics:

1 * 1 - Implement Reaction.withRetry(retry: Boolean) as another alias to the existing API. Also, Reaction.enable(enable: Boolean) ?

5 * 5 - Implement performance metrics either through a given logger or through special molecules. Need to monitor: Bag size at reaction site; arrival rate; consumption rate; reaction error rate; reaction compute time; thread utilization (busy / locked waiting / idle) both for scheduler thread and for worker threads.

4 * 4 - Move more code into the go{} macro, so that the Reaction() constructor has less work to do.

5 * 5 - Implement caching of Reaction and ReactionSite values by md5 hash, so that we can reuse some data structures if possible instead of recomputing them. (Note that the reactions close over molecule emitters, and reaction sites close over reactions, but many data structures use only molecule indices, and so could be shared rather than recomputed.)
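The caching idea could look roughly like this. It is a sketch under assumptions: the cache key is the MD5 hash of some textual description of the reaction, and the cached value is a vector of molecule indices. `ReactionInfoCache` and `inputIndices` are hypothetical names, not Chymyst API.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap

object ReactionInfoCache {
  // Hex-encoded MD5 digest of a string.
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  private val cache = TrieMap.empty[String, Vector[Int]]

  // Counts how many times the expensive computation actually ran.
  val computations = new AtomicInteger(0)

  // Index-only data can be shared between reactions that have the same description.
  def inputIndices(description: String)(compute: => Vector[Int]): Vector[Int] =
    cache.getOrElseUpdate(md5Hex(description), { computations.incrementAndGet(); compute })
}
```

Only data that depends on molecule indices alone is a candidate for this kind of sharing, since emitters and reaction bodies are closures and differ between instances.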

2 * 2 - Detect this condition at the reaction site time: a cycle of input molecules being a subset of output molecules, possibly spanning several reaction sites (a->b+…, b->c+…, c->a+…). This is a warning if there are nontrivial matchers and an error otherwise. This depends on better detection of output environments.
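One possible way to phrase this check, assuming each reaction is summarized by the sets of its input and output molecule names (the type `ReactionInfo` below is hypothetical and much simpler than the library's real reaction metadata): draw an edge from reaction r1 to reaction r2 whenever the inputs of r2 are a subset of the outputs of r1, and look for a cycle by depth-first search.

```scala
object LivelockCycleSketch {
  // Simplified summary of a reaction; names are assumed to be unique.
  final case class ReactionInfo(name: String, inputs: Set[String], outputs: Set[String])

  def hasCycle(reactions: Seq[ReactionInfo]): Boolean = {
    // r1 -> r2 if the outputs of r1 are sufficient to start r2.
    val next: Map[String, Seq[String]] = reactions.map { r1 =>
      r1.name -> reactions.filter(r2 => r2.inputs.subsetOf(r1.outputs)).map(_.name)
    }.toMap

    // Depth-first search; 1 = visit in progress, 2 = fully explored.
    val state = scala.collection.mutable.Map.empty[String, Int]

    def visit(n: String): Boolean = state.get(n) match {
      case Some(1) => true  // reached a node on the current path: cycle found
      case Some(_) => false // already explored, no cycle through this node
      case None =>
        state(n) = 1
        val found = next(n).exists(visit)
        state(n) = 2
        found
    }

    reactions.exists(r => visit(r.name))
  }
}
```

This sketch ignores pattern matchers and guards, so by the rule stated above a detected cycle would be an error only when all matchers are trivial, and a warning otherwise.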

3 * 3 - define a special “switch off” or “quiescence” molecule - per-join, with a callback parameter. Also define a “shut down” molecule which will enforce quiescence and then shut down the site pool and the reaction pool.

3 * 3 - add logging of reactions currently in progress at a given RS. (Need a custom thread class, or a registry of reactions?)

3 * 4 - use java.management to get statistics over how much time is spent running reactions, waiting while BlockingIdle(), etc. This should be a pool-based API and use an external logger; or it can use special molecules.
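As a starting point, the JDK's `java.lang.management.ThreadMXBean` already exposes per-thread CPU time and blocked / waited counts. The sketch below only takes a snapshot of these numbers. The case class `ThreadStats` is a hypothetical container, and mapping threads to pools or reactions is left out.

```scala
import java.lang.management.ManagementFactory

object ThreadStatsSketch {
  // cpuNanos is -1 when CPU time measurement is unsupported or disabled.
  final case class ThreadStats(name: String, cpuNanos: Long, blockedCount: Long, waitedCount: Long)

  def snapshot(): Seq[ThreadStats] = {
    val bean = ManagementFactory.getThreadMXBean
    val cpuAvailable = bean.isThreadCpuTimeSupported && bean.isThreadCpuTimeEnabled
    bean.getAllThreadIds.toSeq.flatMap { id =>
      // getThreadInfo returns null for threads that have already terminated.
      Option(bean.getThreadInfo(id)).map { info =>
        val cpu = if (cpuAvailable) bean.getThreadCpuTime(id) else -1L
        ThreadStats(info.getThreadName, cpu, info.getBlockedCount, info.getWaitedCount)
      }
    }
  }
}
```

A pool-based API could filter this snapshot by thread name prefix and pass the result to an external logger.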

5 * 5 - reaction sites should detect the situation when another reaction site is pumping molecules into this RS while these molecules can’t be consumed quickly enough. It should identify which reactions are emitting these molecules, and notify the other RS about it (“backpressure”).

2 * 2 - thread pools should have an API for changing the number of threads at run time.
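If the pool is backed by a `java.util.concurrent.ThreadPoolExecutor`, resizing at run time is already supported by the JDK. A minimal sketch follows; the wrapper class `ResizablePool` is hypothetical.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

final class ResizablePool(initialThreads: Int) {
  private val executor = new ThreadPoolExecutor(
    initialThreads, initialThreads, 1L, TimeUnit.SECONDS, new LinkedBlockingQueue[Runnable]())

  def threads: Int = executor.getCorePoolSize

  def resize(n: Int): Unit = synchronized {
    require(n > 0, "pool must have at least one thread")
    if (n > executor.getMaximumPoolSize) {
      // Growing: raise the maximum first, so that core <= max holds at every step.
      executor.setMaximumPoolSize(n)
      executor.setCorePoolSize(n)
    } else {
      // Shrinking: lower the core size first, for the same reason.
      executor.setCorePoolSize(n)
      executor.setMaximumPoolSize(n)
    }
  }

  def execute(task: () => Unit): Unit =
    executor.execute(new Runnable { def run(): Unit = task() })

  def shutdown(): Unit = executor.shutdown()
}
```

The order of the two setter calls matters because recent JDK versions reject a core size larger than the maximum size with an `IllegalArgumentException`.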

2 * 2 - interop with Akka actors, in a separate project with its own artifact and dependency. Similarly for interop with Akka Stream, Scalaz Task etc.

3 * 4 - implement “thread fusion” like in iOS/Android: 1) when a blocking molecule is emitted from a thread T and the corresponding reaction site runs on the same thread T, do not schedule a task but simply run the reaction site synchronously (non-blocking molecules still require a scheduled task? not sure); 2) when a reaction is scheduled from a reaction site that runs on thread T and the reaction is configured to run on the same thread T, do not schedule a task but simply run the reaction synchronously.
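The second case can be illustrated with a single-threaded executor that remembers its worker thread. The class `FusingExecutor` is a hypothetical illustration, not Chymyst code. When a task is submitted from the worker thread itself, it runs synchronously instead of being queued.

```scala
import java.util.concurrent.{Executors, ThreadFactory}

final class FusingExecutor {
  // The single worker thread, recorded when the executor creates it.
  @volatile private var worker: Thread = null

  private val executor = Executors.newSingleThreadExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true)
      worker = t
      t
    }
  })

  def runFused(task: () => Unit): Unit =
    if (Thread.currentThread eq worker) task() // already on the target thread: run inline
    else executor.execute(new Runnable { def run(): Unit = task() })

  def shutdown(): Unit = executor.shutdown()
}
```

Running inline changes the ordering of tasks: a fused task completes before the code that follows its submission, which is exactly the point of fusion but may surprise code that relied on queueing.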

3 * 5 - implement automatic thread fusion for static molecules? - not sure how that would work. Can we arrange for at least some reactions, if scheduled in quick succession, to run on the same thread?

4 * 3 - as an option, run a reaction site on the current thread (?) or on a given executor

2 * 3 - when attaching molecules to futures or futures to molecules, we can perhaps schedule the new futures on the same thread pool as the reaction site to which the molecule is bound? This requires having access to that thread pool. Maybe that access would be handy to users anyway?

5 * 5 - is it possible to implement distributed execution by sharing the site pool with another machine (but running the reaction sites only on the master node)? Use Paxos, Raft, or other consensus algorithm to ensure consistency? Using master-worker architecture, or fully symmetric p2p architecture?

5 * 5 - Distributed vs. Remote; molecule vs. reaction vs. reaction site. This yields 6 distinct possibilities for distributed / remote execution. Need to figure out their logical dependencies and implementation possibilities. Note that, compared with the Actor model, we do not need to check that the actor is alive; distributed execution model only needs to verify that (1) network is up, (2) remote application is running.

5 * 5 - implement “progress and safety” assertions so that we could prevent deadlock in more cases and be able to better reason about our declarative reactions. First, need to understand what is to be asserted. Can we assert non-contention on certain molecules? Can we assert deterministic choice of some reactions? Should we assert the number of certain molecules present (precisely N, or at most N)?

2 * 4 - allow molecule values to be parameterized types or even higher-kinded types? Need to test this.

2 * 2 - make memory profiling / benchmarking; how many molecules can we have per 1 GB of RAM? Are reactions or reaction sites ever garbage-collected?

2 * 2 - add tests for Pool such that we submit a closure that sleeps and then submit another closure. Depending on the pool configuration, we should or should not get the RejectedExecutionException.
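Such a test could be sketched against a plain `ThreadPoolExecutor` as follows. `PoolRejectionSketch` is a hypothetical name, and Chymyst's own Pool class is not used here. With one thread and a `SynchronousQueue`, the second task is rejected while the first one sleeps; with an unbounded `LinkedBlockingQueue`, it is queued.

```scala
import java.util.concurrent.{BlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}

object PoolRejectionSketch {
  // Submits a sleeping task, then a second task; reports whether the second one was rejected.
  def secondTaskRejected(queue: BlockingQueue[Runnable]): Boolean = {
    val pool = new ThreadPoolExecutor(1, 1, 1L, TimeUnit.SECONDS, queue)
    try {
      pool.execute(new Runnable {
        def run(): Unit =
          try Thread.sleep(300)
          catch { case _: InterruptedException => () } // interrupted by shutdownNow()
      })
      try {
        pool.execute(new Runnable { def run(): Unit = () })
        false
      } catch {
        case _: RejectedExecutionException => true
      }
    } finally {
      pool.shutdownNow()
    }
  }
}
```

Usage: `secondTaskRejected(new java.util.concurrent.SynchronousQueue[Runnable]())` exercises the rejecting configuration, and passing a `LinkedBlockingQueue` exercises the accepting one.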

3 * 5 - consider whether we would like to prohibit emitting molecules from non-reaction code. Maybe with a construct such as withMolecule{ ... } where the special molecule will be emitted by the system? Can we rewrite tests so that everything happens only inside reactions?

3 * 3 - implement “one-off” or “perishable” molecules that are emitted once (like static, from the reaction site itself) and may be emitted only if first consumed (but not necessarily emitted at start of the reaction site), and not repeatedly emitted

5 * 5 - How to rewrite reaction sites so that blocking molecules are transparently replaced by a pair of non-blocking molecules? Can this be done even if blocking emitters are used inside functions? (Perhaps with extra molecules emitted at the end of the function call?) Is it useful to emit an auxiliary molecule at the end of an “if” expression, to avoid code duplication? How can we continue to support real blocking emitters when used outside of macro code?

2 * 2 - Revisit Philippe’s error reporting branch, perhaps salvage some code

3 * 3 - Replace some timed tests by probabilistic tests that run multiple times and fail much less often; perhaps use Li Haoyi’s utest framework that has features for this.
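Independently of any test framework, the basic shape of such a probabilistic test is "run N times, require at least K successes". A sketch follows; `passesOften` is a hypothetical helper and does not use the utest API.

```scala
import scala.util.Try

object ProbabilisticTestSketch {
  // Runs `test` `runs` times; an exception counts as a failed run.
  def passesOften(runs: Int, minSuccesses: Int)(test: () => Boolean): Boolean = {
    val successes = (1 to runs).count(_ => Try(test()).getOrElse(false))
    successes >= minSuccesses
  }
}
```

For example, a timing-sensitive check could be wrapped as `passesOften(runs = 20, minSuccesses = 18)(() => timedCheck())`, where `timedCheck` stands for the existing timed test body.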

3 * 3 - ChymystThread should keep information about which RS and which reaction is now running. This can be used both for monitoring and for automatic assignment of thread pools for reactions defined in the scope of another reaction.

3 * 3 - Write a tutorial section about timers and time-outs: cancellable recurring jobs, cancellable subscriptions, time-outs on receiving replies from non-blocking molecules (started on it already)

3 * 3 - Nested reactions could be automatically defined at the same reaction site/thread pool as parent reactions? This seems to be required for the automatic unblocking transformation.

2 * 3 - Error handling should be flexible enough to implement retry at most N times with backoff and other error recovery logic.
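For reference, this is the kind of logic the error-handling API should be able to express, written here as an ordinary function outside of any reaction. `retry` and its parameters are hypothetical names.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Tries `body` at most `maxAttempts` times, multiplying the delay by `factor` after each failure.
  @tailrec
  def retry[A](maxAttempts: Int, delayMs: Long, factor: Double = 2.0)(body: () => A): Try[A] =
    Try(body()) match {
      case s @ Success(_) => s
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(delayMs)
        retry(maxAttempts - 1, (delayMs * factor).toLong, factor)(body)
      case f => f // attempts exhausted: report the last failure
    }
}
```

In a reaction-based version the sleep would presumably be replaced by a delayed re-emission of the input molecules, so that no thread is blocked during the backoff.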

3 * 2 - Should we be able to enable and disable reactions at run time? Should we be able to deactivate entire reaction sites?

2 * 2 - Unit tests need examples; how would I diagnose a deadlock due to an off-by-one error in the counter code? How would I use scalacheck to unit-test a reaction? (Emit input molecules with generated values, require output molecule with a value that satisfies a law?)

3 * 5 - implement ING Baker in Chymyst

3 * 3 - figure out whether the dual bipartite graph (e.g. “co-counter”) has any usefulness

3 * 5 - implement Benson Ma’s example of graph-driven reaction construction. What is a good DSL for describing a DAG or a general graph? Should we use graphs instead of inline chemistry to program Chymyst?

5 * 5 - describe chemistry with DOT graph visualizations https://en.wikipedia.org/wiki/DOT_(graph_description_language)
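A rough sketch of what such an export could produce, assuming reactions are summarized by name, input molecules, and output molecules (the type `ReactionInfo` is hypothetical): molecules and reactions become nodes of a bipartite directed graph, with reactions drawn as boxes.

```scala
object DotSketch {
  final case class ReactionInfo(name: String, inputs: Seq[String], outputs: Seq[String])

  // Renders the chemistry as a DOT digraph: molecule -> reaction -> molecule.
  def toDot(reactions: Seq[ReactionInfo]): String = {
    val lines = reactions.flatMap { r =>
      val node = s"""  "${r.name}" [shape=box];"""
      val in = r.inputs.map(m => s"""  "$m" -> "${r.name}";""")
      val out = r.outputs.map(m => s"""  "${r.name}" -> "$m";""")
      node +: (in ++ out)
    }
    (Seq("digraph chemistry {") ++ lines ++ Seq("}")).mkString("\n")
  }
}
```

The resulting text can be rendered with Graphviz, for example with `dot -Tpng`. Names containing double quotes would need escaping, which this sketch omits.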

3 * 4 - formulate the “Async monad” and automatically convert “async” monadic code to Chymyst, with automatic parallelization of applicative subgraphs

Will not do for now

3 * 2 - add per-molecule logging; log to file or to logger function (do we need this, if we already have event reporting and test hooks?)

3 * 4 - LAZY values on molecules? By default? What about pattern-matching then? Probably need to refactor SyncMol and AsyncMol into non-case classes and change some other logic. — Will not do now. Not sure that lazy values on molecules are important as a primitive. We can always simulate them using closures.

2 * 3 - investigate using a single wait/notify pair instead of 2 semaphores; does it give better performance? - Implemented in 0.2.0

5 * 5 - implement fairness with respect to molecules. - Will not do now. If reactions depend on fairness, something is probably wrong with the chemistry. Instead, pipelining should be a frequently applied optimization.

3 * 5 - create and use an RDLL (random doubly linked list) data structure for storing molecule values; benchmark. Or use Vector with tail-swapping? This should help fetch random molecules out of the soup. - Will not do now. Not sure what value it brings us if molecule values are truly randomly chosen.
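For the record, the tail-swapping variant is small. The sketch below uses a mutable `ArrayBuffer` instead of an immutable `Vector`, which is a substitution made here for simplicity, and the class name `RandomBag` is hypothetical. Removing a random element takes constant time because the last element is moved into the vacated slot.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Not thread-safe; a real molecule bag would need synchronization.
final class RandomBag[A](rng: Random = new Random()) {
  private val items = ArrayBuffer.empty[A]

  def add(a: A): Unit = items += a

  def size: Int = items.length

  // Removes and returns a uniformly chosen element, or None if the bag is empty.
  def takeRandom(): Option[A] =
    if (items.isEmpty) None
    else {
      val i = rng.nextInt(items.length)
      val chosen = items(i)
      val last = items.length - 1
      items(i) = items(last) // move the tail element into the hole
      items.remove(last)     // drop the now-duplicated tail
      Some(chosen)
    }
}
```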

4 * 5 - implement multiple emission construction a+b+c so that a+b-> and b+c-> reactions are equally likely to start. - Will not do now. Not sure what this accomplishes. The user can randomize the order of emission, if this is crucial for an application.

2 * 2 - Support timers: recurrent or one-off, cancelable molecule emission. — Will not do now. This can be done by user code, unless timers require an explicit thread pool or executor.

3 * 3 - Can we use macros to rewrite f() into f(_) inside reactions for Unit types? Otherwise it seems impossible to implement short syntax case a() + b() => in the input patterns. — No, we can’t because { case a() => } doesn’t get past the Scala typer, and so macros don’t see it at all.

3 * 5 - Can we implement Chymyst Core using Future / Promise and remove all blocking and all semaphores? — No. Automatic concurrent execution of reactions when multiple molecules are available cannot be implemented using promises / futures.

4 * 5 - allow several reactions to be scheduled truly simultaneously out of the same reaction site, when this is possible. Avoid locking the entire bag? - perhaps partition it and lock only some partitions, based on reaction site information gleaned using a macro. — This was attempted, but yields a very complex algorithm and does not give a significant performance boost.

3 * 3 - perhaps prohibit using explicit thread pools? It’s error-prone because the user can forget to stop a pool. Perhaps only expose an API such as withFixedPool(4){ implicit tp => ...}? Investigate using implicit values for pools. — doesn’t seem to be useful. Maybe remove default pools altogether? It seems that every pool needs to be stopped. — However, this would prevent sharing thread pools across scopes. Maybe that is not particularly useful? - Also, with the 0.2.0 changes, pools do not need to be stopped explicitly.
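The scoped API mentioned above would follow the usual loan pattern. A sketch with a JDK `ExecutorService` standing in for a Chymyst pool; the signature of `withFixedPool` is an assumption based only on the call shape quoted in this item.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

object PoolScopeSketch {
  // Creates a pool, lends it to `body`, and always shuts it down afterwards.
  def withFixedPool[A](threads: Int)(body: ExecutorService => A): A = {
    val pool = Executors.newFixedThreadPool(threads)
    try body(pool)
    finally {
      pool.shutdown()
      pool.awaitTermination(5, TimeUnit.SECONDS)
    }
  }
}
```

With this shape the user cannot forget to stop the pool, but, as noted above, the pool also cannot outlive the scope or be shared across scopes.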

5 * 5 - Can we perform the unblocking transformation using delimited continuations? — Not sure. Delimited continuations seem to require all code to reside inside reset().