Top Up Prev Next Bottom Contents Index Search

13.3 Schedulers

Given a Universe of functional blocks to be scheduled and a Target describing the topology and characteristics of the single- or multiple-processor system for which code is to be generated, it is the responsibility of the Scheduler object to perform some or all of the following functions:

If the program graph follows SDF semantics, all of the above steps are done statically (i.e. at compile time). A dataflow graph with dynamic constructs uses the minimal runtime decision making to determine the execution order of actors.

13.3.1 Single-Processor Schedulers

For targets consisting of a single processor, we provide three different scheduling techniques. The user can select the most appropriate scheduler for a given application by setting the loopingLevel target parameter.

In the first approach (loopingLevel = DEF), which is the default SDF scheduler, we conceptually construct the acyclic precedence graph (APG) corresponding to the system, and generate a schedule that is consistent with that precedence graph. Note that the precedence graph is not physically constructed. There are many possible schedules for all but the most trivial graphs; the schedule chosen takes resource costs, such as the necessity of flushing registers and the amount of buffering required, into account. The target then generates code by executing the actors in the sequence defined by this schedule. This is a quick and efficient approach when the SDF graph does not have large sample-rate changes. If there are large sample-rate changes, the size of the generated code can be huge because the codeblock for an actor might occur many times (if the number of repetitions for the actor is greater than one); in this case, it is better to use some form of loop scheduling.

The second approach we call Joe's scheduling. In this approach (loopingLevel = CLUST), actors that have the same sample rate are merged (wherever this will not cause deadlock) and loops are introduced to match the sample rates. The result is a hierarchical clustering; within each cluster, the techniques described above can be used to generate a schedule. The code then contains nested loop constructs together with sequences of code from the actors.

Since the second approach is a heuristic solution, there are cases where some looping possibilities go undetected. By setting the loopingLevel to SJS, we can choose the third approach, called SJS (Shuvra-Joe-Soonhoi) scheduling after the inventor's first names. After performing Joe's scheduling at the front end, it attacks the remaining graph with an algorithm that is guaranteed to find the maximum amount of looping available in the graph. That is, it generates a single appearance schedule whenever one exists.

A fourth approach, obtained by setting loopingLevel to ACYLOOP, we choose a scheduler that generates single appearance schedules optimized for buffer memory usage. This scheduler was developed by Praveen Murthy and Shuvra `Bhattacharyya [Mur96] [Bha96]. This scheduler only tackles acyclic SDF graphs, and if it finds that the universe is not acyclic, it automatically resets the loopingLevel target parameter to SJS. Basically, for a given SDF graph, there could be many different single appearance schedules. These are all optimally compact in terms of schedule length (or program memory in inline code generation). However, they will, in general, require differing amounts of buffering memory; the difference in the buffer memory requirement of an arbitrary single appearance schedule versus a single appearance schedule optimized for buffer memory usage can be dramatic. In code generation, it is essential that the memory consumption be minimal, especially when generating code for embedded DSP processors since these chips have very limited amounts of on-chip memory. Note that acyclic SDF graphs always have single appearance schedules; hence, this scheduler will always give single appearance schedules. If the file target parameter is set, then a summary of internal scheduling steps will be written to that file. Essentially, two different heuristics are used by the ACYLOOP scheduler, called APGAN and RPMC, and the better one of the two is selected. The generated file will contain the schedule generated by each algorithm, the resulting buffer memory requirement, and a lower bound on the buffer memory requirement (called BMLB) over all possible single appearance schedules.

If the second, third, or fourth approaches are taken, the code size is drastically reduced when there are large sample rate changes in the application. On the other hand, we sacrifice some efficient buffer management schemes. For example, suppose that star A produces 5 samples to star B which consumes 1 sample at a time. If we take the first approach, we schedule this graph as ABBBBB and assign a buffer of size 5 between star A and B. Since each invocation of star B knows the exact location in the allocated buffer from which to read its sample, each B invocation can read the sample directly from the buffer. If we choose the second or third approach, the scheduling result will be A5(B). Since the body of star B is included inside a loop of factor 5, we have to use indirect addressing for star B to read a sample from the buffer. Therefore, we need an additional buffer pointer for star B (memory overhead), and one more level of memory access (run-time overhead) for indirect addressing.

13.3.2 Multiple-Processor Schedulers

The first step in multiprocessor scheduling, or parallel scheduling, is to translate a given SDF graph to an acyclic precedence expanded graph (APEG). The APEG describes the dependency between invocations of blocks in the SDF graph during execution of one iteration. Refer to the SDF domain documentation for the meaning of one iteration. Hence, a block in a multirate SDF graph may correspond to several APEG nodes. Parallel schedulers schedule the APEG nodes onto processors. Unfortunately, the APEG may have a substantially greater (at times exponential) number of nodes compared to the original SDF graph. For this a hierarchical scheduler is being developed that only partially expands the APEG [Pin95].

We have implemented three basic scheduling techniques that map SDF graphs onto multiple-processors with various interconnection topologies: Hu's level-based list scheduling, Sih's dynamic level scheduling [Sih93a], and Sih's declustering scheduling [Sih93b]. The target architecture is described by its Target object. The Target class provides the scheduler with the necessary information on the number of processors, interprocessor communication etc., to enable both scheduling and code synthesis.

The hierarchical scheduler can use any one of the three basic parallel schedulers as the top-level scheduler. The current implementation supports user-specified clustering at galaxy boundaries. These galaxies are assumed to compose into valid SDF stars in which the SDF parameters are derived from the internal schedule of the galaxy. During APEG expansion, these compositions are opaque; thus, the entire galaxy is treated as a single SDF star. Using hierarchical scheduling techniques, we have realized multiple orders of magnitude speedup in scheduling time and multiple orders of magnitude reduction of memory usage. See [Pin95] for more details.

The previous scheduling algorithms could schedule SDF graphs, the CGDDF scheduler can also handle graphs with dynamic constructs. See section 13.5 for more details.

Whichever scheduler is used, we schedule communication nodes in the generated code. For example, if we use Hu's level-based list scheduler, we ignore communication overhead when assigning stars to processors. Hence, the generated code is likely to contain more communication code than with the other schedulers that do not ignore the IPC overhead.

There are other target parameters that direct the scheduling procedure. If the parameter manualAssignment is set to YES, then the default parallel scheduler does not perform star assignment. Instead, it checks the processor assignment of all stars (set using the procId state of CG and derived stars). By default, the procId state is set to -1, which is an illegal assignment since the child target is numbered from 0. If there is any star, except the Fork star, that has an illegal procId state, an error is generated saying that manual scheduling has failed. Otherwise, we invoke a list scheduler that determines the order of execution of blocks on each processor based on the manual assignment. We do not support the case where a block might require more than one processor. The manualAssignment target parameter automatically sets the oneStarOneProc state to YES; this is discussed next.

If there are sample rate changes, a star in the program graph may be invoked multiple times in each iteration. These invocations may be assigned to multiple processors by default. We can prevent this by setting the oneStarOneProc state to YES. Then, all invocations of a star are assigned to the same processor, regardless of whether they are parallelizable or not. The advantage of doing this is the simplicity in code generation since we do not need to splice in Spread/Collect stars, which will be discussed later. Also, it provides us another possible scheduling option, adjustSchedule; this is described below. The main disadvantage of setting oneStarOneProc to YES is the performance loss of not exploiting parallelism. It is most severe if Sih's declustering algorithm is used. Therefore, Sih's declustering algorithm is not recommended with this option.

In this paragraph, we describe a future scheduling option that this release does not support yet. Once automatic scheduling (with oneStarOneProc option set) is performed, the processor assignment of each star is determined. After examining the assignment, the user may want to override the scheduling decision manually. It can be done by setting the adjustSchedule parameter. If that parameter is set, after the automatic scheduling is performed, the procId state of each star is automatically updated with the assigned processor. The programmer can override the scheduling decision by changing the value of the procId state. The adjustSchedule parameter cannot be YES before any scheduling decision has been made previously. Again, this option is not supported in this release.

Regardless of which scheduling options are chosen, the final stage of the scheduling is to decide the execution order of stars including send/receive stars. This is done by a simple list scheduling algorithm in each child target. The final scheduling results are displayed on a Gantt chart.

The Gantt Chart Display

Demos that use targets derived from CGMultiTarget can produce an interactive Gantt chart display for viewing the parallel schedule.

The Gantt chart display involves a single window for displaying the Gantt chart, which provides scroll bars and zoom buttons for controlling how much of the Gantt chart is shown in the display canvas.

The display canvas represents each star schedule as a box drawn through the time interval over which it is scheduled. If the name of a star can fit in its box, it is printed inside. A vertical bar inside the canvas identifies stars which cannot be labeled. The names of the stars which this bar passes through are printed alongside their respective processor numbers. The bar can be moved horizontally by pressing the left mouse button while on the star to be identified. The stars which the bar passes through are identified by having their icons highlighted in the vem window.

Here is a summary of commands that can be used while the Gantt chart display is active:

To change the area of the Gantt chart inside the display canvas:

Use the scroll bars to move along the Gantt chart in the direction desired.

Click on the zoom buttons to increase or decrease the size of the Gantt chart.

To move the vertical bar to the mouse inside the display window:

Depress and drag the left mouse button inside the display window. The left and right cursor keys move the bar by one time interval; shift-left and shift-right move the bar by ten time intervals.

To exit the Gantt chart display program:

Type control-D inside the display window or click on the dismiss button.

The Gantt chart can also be run as a standalone program to display a schedule previously saved by the Gantt chart:

gantt schedule_filename
A number of limitations exists in the Gantt chart display widget. There are a fixed (hard-coded) number of colors available for coloring processors and highlighting icons. The print function does not work because the font chosen by the font manager is not guaranteed to be Postscript convertible. The save function saves the schedule in the Ptolemy 0.5 format which is different from the Ptolemy 0.6 format generated by the various domains.

Top Up Prev Next Bottom Contents Index Search

Copyright © 1990-1997, University of California. All rights reserved.