Chapter 5. Parallel Execution

Not only are statements in a block executed in parallel; function arguments and operands are as well, as long as they are pure and do not depend on earlier computations in the block.
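In Scaly this parallelism is implicit. As an analogy only (this is Python, not Scaly, and the function names are hypothetical), the following sketch makes the same data flow explicit: the two computations are independent of each other, so they may run in parallel, while the statement that combines their results must wait for both.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def cube(x):
    return x * x * x

# The two submissions are independent: neither reads the other's
# result, so a scheduler is free to run them in parallel. In Scaly
# this scheduling decision would be made by the implementation.
with ThreadPoolExecutor() as pool:
    a = pool.submit(square, 3)       # independent computation
    b = pool.submit(cube, 2)         # independent computation
    total = a.result() + b.result()  # joining forces synchronization

print(total)  # 9 + 8 = 17
```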

A computation is called pure if it does not depend on anything other than its input parameters. (With some care, you can even declare computations pure that obtain information from external input, if needed.)
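The distinction can be illustrated with a small sketch (again Python, with hypothetical names): a pure function always yields the same result for the same arguments, while an impure one consults hidden external state.

```python
import random

# Pure: the result depends only on the input parameters, so two
# calls with the same arguments always agree.
def add(a, b):
    return a + b

# Impure: the result additionally depends on hidden external state
# (the random number generator), so repeated calls may differ.
def noisy_add(a, b):
    return a + b + random.random()

assert add(2, 3) == add(2, 3)          # always holds
# noisy_add(2, 3) == noisy_add(2, 3)   # may fail
```

Because a pure computation has no hidden dependencies, the order in which independent pure computations run cannot change their results, which is exactly what makes them safe to parallelize.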

That said, scheduling parallel computation comes at a cost: tasks have to be created and scheduled for execution by a local worker thread pool, by a GPU, or even by a cluster of remote machines. In the latter case, input data have to be serialized and sent via the network to the remote node, where they are deserialized. When the computation is done, its results have to be sent back. Last but not least, the parallel work has to be synchronized.
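The serialization cost alone can be made tangible with a small Python sketch (an illustration of the general point, not of any Scaly mechanism): even with no network involved at all, marshalling a large input takes measurable time that the remote computation must amortize.

```python
import pickle
import time

data = list(range(1_000_000))

start = time.perf_counter()
blob = pickle.dumps(data)       # serialize, as if for the network
restored = pickle.loads(blob)   # deserialize, as on the remote node
elapsed = time.perf_counter() - start

# The round trip reproduces the input exactly, but the time spent
# is pure overhead from the computation's point of view.
assert restored == data
```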

Therefore, a Scaly implementation has to justify parallel execution at least by some heuristic reasoning, or better, by profiling a set of reference computation workloads. Scheduling single floating-point additions for parallel execution, each of which might take nanoseconds or less, surely is not worth the overhead. Parsing a multitude of source files, in contrast, can be expected to speed up compiling a program, and performing the heavy number crunching needed for fluid mechanics calculations in parallel would be a safe bet.
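One simple form such a heuristic could take is a cost threshold: run tasks in parallel only when the estimated work per task is large enough to amortize the scheduling overhead. The sketch below is a hypothetical illustration in Python; the threshold constant and function names are assumptions, not part of Scaly.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tuning constant: below this estimated cost,
# scheduling overhead would dominate the actual work.
PARALLEL_THRESHOLD = 10_000

def heavy(n):
    # Stand-in for a substantial pure computation.
    return sum(i * i for i in range(n))

def maybe_parallel_map(func, inputs, cost_estimate):
    if cost_estimate < PARALLEL_THRESHOLD:
        # Cheap work: run sequentially, avoiding task overhead.
        return [func(x) for x in inputs]
    # Expensive work: fan out to a worker thread pool.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, inputs))

results = maybe_parallel_map(heavy, [1000, 2000], cost_estimate=100_000)
```

A real implementation would derive the cost estimate from profiling rather than accept it as a parameter, but the shape of the decision is the same.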

Adjusting the granularity of parallel execution, however, is beyond the scope of the Scaly language specification, which only states which computations can potentially be done in parallel; to be exact, it makes no statement about the order in which independent computations are performed.