Pipeline Interleaved Programmable DSP's: Synchronous Data Flow Programming

Edward A. Lee and David G. Messerschmitt

IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 35, No. 9, pp. 1334-1345, September, 1987.


In the companion paper [1], a programmable architecture for digital signal processing is proposed that requires the partitioning of a signal processing task into multiple programs that execute concurrently. In this paper, a synchronous data flow method is proposed for programming this architecture, and programming examples are given.

Because of its close connection with block diagrams, data flow programming is natural and convenient for describing digital signal processing (DSP) systems. Synchronous data flow is a special case of data flow (large grain or atomic) in which the number of tokens consumed or produced each time a node is invoked is specified a priori for each input and output of each node. A node (or block) is asynchronous if these numbers cannot be specified a priori. A program described as a synchronous data flow graph can be mapped onto parallel processors at compile time (statically), so the run-time overhead usually associated with data flow implementations is eliminated. Synchronous data flow is therefore an appropriate paradigm for programming high-performance real-time applications on a parallel processor such as those described in the companion paper [1]. The sample rates can all be different, which is not true of most current data-driven digital signal processing programming methodologies. Synchronous data flow is closely related to computation graphs, a special case of Petri nets.
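Because the token counts are fixed in advance, a compiler can work out how many times each node must fire per schedule period before anything runs. The following is a minimal sketch of that idea, not the authors' compiler: it solves the balance equations (firings of the source times tokens produced equals firings of the destination times tokens consumed, on every arc) for the smallest integer firing counts. The function name `repetition_vector`, the edge tuple layout, and the three-node example graph are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(nodes, edges):
    """Return the smallest integer firing counts for a connected SDF graph.

    nodes: list of node names (hypothetical representation).
    edges: list of (src, produced, dst, consumed) tuples, where `produced`
           and `consumed` are the fixed token counts per invocation.
    """
    # Fix the first node's relative rate at 1 and propagate along arcs
    # using the balance equation q[src] * produced == q[dst] * consumed.
    rate = {nodes[0]: Fraction(1)}
    pending = [nodes[0]]
    while pending:
        n = pending.pop()
        for src, produced, dst, consumed in edges:
            if src == n:
                other, r = dst, rate[n] * produced / consumed
            elif dst == n:
                other, r = src, rate[n] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = r
                pending.append(other)
            elif rate[other] != r:
                # Two paths demand different rates: no periodic schedule.
                raise ValueError("inconsistent sample rates")
    if len(rate) != len(nodes):
        raise ValueError("graph is not connected")
    # Clear denominators to obtain integer firing counts.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {n: int(r * scale) for n, r in rate.items()}


# Hypothetical multirate chain: A emits 2 tokens, B consumes 3 and emits 1,
# C consumes 2.
q = repetition_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
print(q)
```

For this example the result is 3 firings of A, 2 of B, and 1 of C per period, which balances both arcs (3 x 2 = 2 x 3 and 2 x 1 = 1 x 2). A graph whose rates cannot be balanced raises an error instead, which is the kind of check that can only be made at compile time when the token counts are known a priori.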

In this paper, we outline the programming methodology by illustrating how nodes are defined, how data passed between nodes are buffered, and how a compiler can map the nodes onto parallel processors. We give an example of a typically complicated, unstructured application: a voiceband data modem. For this example, using a natural partition of the program into functional blocks, the scheduler is able to use up to seven parallel processors with 100 percent utilization. Beyond seven processors, the utilization drops because the scheduler is limited by a recursive computation, the equalizer tap update loop. No attempt has been made to modify the algorithms or their description to make them better suited for parallel execution. This example therefore illustrates that modest amounts of concurrency can be used effectively without particular effort on the part of the programmer.
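The way a recursive loop caps utilization can be illustrated with an idealized model. This is a sketch under stated assumptions, not the paper's scheduler or its measured data: it assumes the schedule period cannot be shorter than the time around the slowest feedback loop, that work otherwise divides perfectly among processors, and that communication costs are ignored. The function name `utilization` and the numbers (7 units of work per period, a loop of 1 unit) are hypothetical, picked only so that the knee falls at seven processors as in the modem example.

```python
from fractions import Fraction


def utilization(total_work, loop_time, processors):
    """Idealized processor utilization for one schedule period.

    total_work: computation per period, in arbitrary time units (assumed).
    loop_time:  time around the slowest recursive loop, same units (assumed).
    processors: number of parallel processors.
    """
    # The period is bounded below both by the evenly divided work and by
    # the recursive loop, which cannot be spread across processors.
    period = max(Fraction(total_work, processors), Fraction(loop_time))
    return Fraction(total_work) / (processors * period)


# Hypothetical numbers: 7 units of work, feedback loop of 1 unit.
for p in (1, 4, 7, 8, 10):
    print(p, float(utilization(7, 1, p)))
```

Under these assumptions utilization stays at 1.0 through seven processors and falls to 7/8 at eight and 7/10 at ten, since extra processors cannot shorten the period below the loop time.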