17.2 Multiprocessor Targets

There are two base multiprocessor targets: MultiTarget and CGMultiTarget. Class MultiTarget, derived from class CGTarget, serves as an abstract base multiprocessor target for the CG domain. Class CGMultiTarget, derived from MultiTarget, is the base multiprocessor target for all code generation domains. Since MultiTarget contains pure virtual methods, derived classes must redefine those methods.

Some members that are meaningful only for the CG domain are split between the MultiTarget and CGMultiTarget classes: members accessed by the parallel scheduler are placed in MultiTarget, and the others are placed in CGMultiTarget. This split is purely a matter of code organization. Refer to the CGMultiTarget class for detailed descriptions.

17.2.1 Class MultiTarget

Class MultiTarget, derived from CGTarget, has a constructor with three arguments.

MultiTarget(const char* name, const char* starclass, const char* desc);

The arguments are the name of the target, the star class it supports, and the description text. The constructor hides the loopingLevel parameter inherited from the CGTarget class, since the parallel scheduler currently does not perform looping.

IntState nprocs;

This protected variable (or state) represents the number of processors. We can set this state, and also change its initial value, through the following public method:

void setTargets(int num);

After child targets are created, the number of child targets is stored in the following protected member:

int nChildrenAlloc;

Three states, all protected, select the scheduling option.

IntState manualAssignment; 
IntState oneStarOneProc; 
IntState adjustSchedule;

If the first state is set to YES, we assign stars manually by setting the procId state of every star. If oneStarOneProc is set to YES, the parallel scheduler puts all invocations of a star on the same processor. Note that if manual scheduling is chosen, oneStarOneProc is automatically set to YES. The last state, adjustSchedule, is intended for overriding the scheduling result manually; this feature has not been implemented yet. The following public methods are related to these states:

int assignManually(); 
int getOSOPreq(); 
int overrideSchedule(); 
void setOSOPreq(int i);

The first three methods query the current value of the states. The last method sets the current value of the oneStarOneProc state to the argument value.

There are two other states that are protected:

IntState sendTime; 
IntState inheritProcessors;

The first state indicates the communication cost of sending a unit sample between nearest-neighbor processors. If inheritProcessors is set to YES, we inherit the child targets from another target using the following method.

int inheritChildTargets(Target* mtarget);

This is a public method to inherit child targets from the argument target. If the number of processors is greater than the number of child targets of mtarget, this method returns FALSE with an error message. Otherwise, it copies the pointers to the child targets of mtarget as its own child targets. If the number of processors is 1, we can use a single-processor target as the argument; in this case, the argument target becomes the child target of this target.
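The inheritance rule above can be sketched in C++ as follows. This is a minimal sketch, not the Ptolemy implementation: SimpleTarget and this free function are hypothetical stand-ins for Target and the member method, a bool replaces TRUE/FALSE, and the sketch assumes that only the first nprocs children are shared when mtarget has more children than needed.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical, simplified stand-in for a Ptolemy Target; only the list of
// child targets matters for this sketch.
struct SimpleTarget {
    std::vector<SimpleTarget*> children;  // empty for a single-processor target
};

// Sketch of the inheritance rule. Assumption: when mtarget has more children
// than needed, only the first nprocs of them are shared.
bool inheritChildTargets(SimpleTarget& self, int nprocs, SimpleTarget* mtarget) {
    assert(nprocs > 0 && mtarget != nullptr);
    self.children.clear();
    if (nprocs == 1 && mtarget->children.empty()) {
        // A single-processor target becomes this target's only child.
        self.children.push_back(mtarget);
        return true;
    }
    if (nprocs > static_cast<int>(mtarget->children.size())) {
        std::fprintf(stderr, "not enough child targets to inherit\n");
        return false;
    }
    // Copy pointers only: the child targets are shared, not duplicated.
    self.children.assign(mtarget->children.begin(),
                         mtarget->children.begin() + nprocs);
    return true;
}
```

Because only pointers are copied, both targets refer to the same child target objects, so the child targets themselves are not duplicated.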

void enforceInheritance(); 
int inherited();

The first method sets the initial value of the inheritProcessors state while the second method gets the current value of the state.

void initState();

This redefined public method initializes the states and implements the precedence relations among them.
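Only one precedence relation is stated explicitly in this section: choosing manual assignment forces oneStarOneProc to YES. A minimal sketch of that single rule, using a hypothetical SchedulingOptions struct with plain ints in place of the IntState members and a hypothetical applyPrecedence function in place of initState, might look like this:

```cpp
// Hypothetical plain-int stand-ins for the scheduling IntState members of
// MultiTarget (YES = 1, NO = 0).
struct SchedulingOptions {
    int manualAssignment = 0;
    int oneStarOneProc = 0;
    int adjustSchedule = 0;
};

// Sketch of the one precedence rule stated in the text: manual assignment
// gives each star a single procId, so all of its invocations end up on one
// processor and oneStarOneProc is forced to YES.
void applyPrecedence(SchedulingOptions& opt) {
    if (opt.manualAssignment) {
        opt.oneStarOneProc = 1;
    }
}
```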

Other MultiTarget public members

virtual DataFlowStar* createSpread() = 0; 
virtual DataFlowStar* createCollect() = 0; 
virtual DataFlowStar* createReceive(int from, int to, int num) = 0;
virtual DataFlowStar* createSend(int from, int to, int num) = 0;

These pure virtual methods create the Spread, Collect, Receive, and Send stars that are required for sub-universe generation. The last two methods take three arguments giving the source processor, the destination processor, and the sample rate.

virtual void pairSendReceive(DataFlowStar* snd, DataFlowStar* rcv);

This method pairs a Send star, snd, with a Receive star, rcv. In this base class, it does nothing.

virtual IntArray* candidateProcs(ParProcessors* procs, DataFlowStar* s);

This method returns the array of candidate processors that can schedule the star s. The first argument is the current ParProcessors object that tries to schedule the star. In this base class, the method does nothing and returns NULL.

virtual Profile* manualSchedule(int count);

This method is used when this target is inside a wormhole. This method determines the processor assignments of the Profile manually. The argument indicates the number of invocations of the wormhole.

virtual void saveCommPattern(); 
virtual void restoreCommPattern(); 
virtual void clearCommPattern();

These methods are used to manage the communication resources. This base class does nothing. The first method saves the current resource schedule, while the second method restores the saved schedule. The last method clears the resource schedule.

virtual int scheduleComm(ParNode* node, int when, int limit = 0);

This method schedules the communication node node, which becomes available at time when. If the target cannot schedule the node by limit, the method returns -1; otherwise, it returns the scheduled time. In this base class, it simply returns the second argument, when, meaning that the node is scheduled as soon as it is available; this models a fully-connected interconnection of processors.

virtual ParNode* backComm(ParNode* node);

For a given communication node, this method finds the communication node scheduled just before it on the same communication resource. In this base class, it returns NULL.

virtual void prepareSchedule(); 
virtual void prepareCodeGen();

These two methods are called just before scheduling starts, and just before code generation starts, to do necessary tasks in the target class. They do nothing in this base class.

17.2.2 Class CGMultiTarget

Class CGMultiTarget is the base multiprocessor target for all code generation domains, homogeneous or heterogeneous, and it models a fully-connected multiprocessor target. In the target list in pigi, the "FullyConnected" target refers to this target. It is defined in the $PTOLEMY/src/domains/cg/targets directory. Like its base class MultiTarget, it has a constructor with three arguments.

To specify child targets, this class has the following three states.

StringArrayState childType; 
StringArrayState resources; 
IntArrayState relTimeScales;

The above states are all protected. The first state, childType, specifies the names of the child targets as a space-separated list of strings. If the number of strings is smaller than the number of processors specified by the nprocs parameter, the last entry of childType is extended to the remaining processors. For example, if we set nprocs to 4 and childType to "default-CG56[2] default-CG96", the first two child targets become "default-CG56" and the next two become "default-CG96".
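The expansion rule for childType can be sketched as follows. The helper name expandChildTypes is hypothetical, and the sketch assumes that a "[k]" suffix repeats an entry k times and that entries beyond nprocs are dropped; the text above does not specify the latter case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expand a childType value into one child target name
// per processor. Assumes "name[k]" means name repeated k times and that
// entries beyond nprocs are ignored.
std::vector<std::string> expandChildTypes(const std::string& childType, int nprocs) {
    assert(nprocs > 0);
    std::vector<std::string> names;
    std::istringstream in(childType);
    std::string token;
    while (in >> token) {  // entries are separated by white space
        int count = 1;
        std::string::size_type open = token.find('[');
        if (open != std::string::npos && token.back() == ']') {
            count = std::atoi(token.substr(open + 1, token.size() - open - 2).c_str());
            token.erase(open);  // keep only the target name
        }
        for (int i = 0; i < count; ++i) names.push_back(token);
    }
    // The last entry is extended to the remaining processors.
    while (!names.empty() && names.size() < static_cast<std::size_t>(nprocs)) {
        std::string last = names.back();
        names.push_back(last);
    }
    if (names.size() > static_cast<std::size_t>(nprocs))
        names.resize(static_cast<std::size_t>(nprocs));
    return names;
}
```

With nprocs equal to 4, the value "default-CG56[2] default-CG96" expands to two "default-CG56" entries followed by two "default-CG96" entries, matching the example in the text.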

The second state, resources, specifies special resources for child targets. For example, the value "0 XXX ; 3 YYY" gives the first child target (index 0) the XXX resource and the fourth child target (index 3) the YYY resource; ';' is the delimiter. If a child target (here, index 0) already has a resources state, XXX is appended to the end of that state. Note that the states of child targets cannot be edited in the current pigi. If a star needs a special resource, the star designer should define a resources StringArrayState in the definition of the star. For example, if a star S is created with resources = YYY, the star will be scheduled on the fourth child. One special resource is the target index: if the resources state of a star is set to "2", the star is scheduled on the third child target (index 2).
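A possible way to parse the resources value and to test whether a child satisfies a star's resource requirement is sketched below. Both helpers, parseResources and childHasResource, are hypothetical, and the sketch assumes that each ';'-separated group starts with a child index followed by one or more resource names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a value such as "0 XXX ; 3 YYY" into per-child
// resource lists. New names are appended to resources a child already has.
void parseResources(const std::string& spec,
                    std::map<int, std::vector<std::string>>& childResources) {
    std::istringstream groups(spec);
    std::string group;
    while (std::getline(groups, group, ';')) {
        std::istringstream fields(group);
        int index;
        if (!(fields >> index)) continue;  // skip empty or malformed groups
        std::string name;
        while (fields >> name) childResources[index].push_back(name);
    }
}

// A required resource is satisfied by a child that lists it, or by the child
// whose index equals the resource when the resource is a target index.
bool childHasResource(int index, const std::string& required,
                      const std::map<int, std::vector<std::string>>& childResources) {
    assert(index >= 0);
    if (required == std::to_string(index)) return true;
    auto it = childResources.find(index);
    if (it == childResources.end()) return false;
    for (const std::string& r : it->second)
        if (r == required) return true;
    return false;
}
```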

The third state, relTimeScales, indicates the relative computing speeds of the processors. The number of entries in this state should equal the number of entries in childType. Since the execution time of a star is specified as a number of cycles on the target for which the star is defined, we have to compensate for the relative cycle times of the processors in a heterogeneous target environment.

Once we specify the child targets, we select a scheduler with appropriate options. States inherited from class MultiTarget are used to select the appropriate scheduling options. In the CGMultiTarget class, we have the following three states, all protected, to choose a scheduler unless the manual scheduling option is taken.

IntState ignoreIPC; 
IntState overlapComm; 
IntState useCluster;

The first state indicates whether or not to ignore communication overhead in scheduling. If it is YES, we select Hu's level scheduler. If it is NO, we check the next state, overlapComm. If overlapComm is YES, we use the dynamic level scheduler. If it is NO, we check the last state, useCluster: if it is YES, we use the declustering algorithm; if it is NO, we again use the dynamic level scheduler. By default, all three states are NO, so the dynamic level scheduler is used. Currently, communication is not allowed to overlap with computation. If more scheduling algorithms are implemented, more parameters may be needed to choose among them.
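The selection logic described above reduces to a short decision tree. In this sketch, chooseSchedulerName is a hypothetical helper that returns descriptive labels rather than constructing Ptolemy scheduler objects, and manual assignment is assumed to take precedence over the other three states, as the preceding paragraph implies.

```cpp
#include <string>

// Sketch of the scheduler selection; the returned strings are descriptive
// labels, not Ptolemy class names.
std::string chooseSchedulerName(bool manualAssignment, bool ignoreIPC,
                                bool overlapComm, bool useCluster) {
    if (manualAssignment) return "manual";
    if (ignoreIPC) return "Hu's level scheduler";
    if (overlapComm) return "dynamic level scheduler";
    if (useCluster) return "declustering";
    return "dynamic level scheduler";  // default: all three states are NO
}
```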

There are other states that are also protected.

StringState filePrefix;

Indicates the prefix of the file names generated for each processor. By default, it is set to "code_proc", so the code files of the child targets are named code_proc0, code_proc1, and so on.

IntState ganttChart;

If this state says YES (default), we display the Gantt chart of the scheduling result.

StringState logFile;

Specifies the log file.

IntState amortizedComm;

If this state is set to YES, we provide the facilities needed to packetize communication samples to reduce communication overhead. These facilities have been neither used nor tested yet.

Now we discuss the three basic methods: setup, run, and wrapup.

void setup();

(1) Based on the states, we create child targets and set them up: prepareChildren.

virtual void prepareChildren();

This method is protected. If the children are inherited, it does nothing. Otherwise, it clears the list of current child targets, if any. Then it creates new child targets with the createChild method and gives each a unique name consisting of filePrefix followed by the target index. This method also adjusts the resources parameter of the child targets with the resources specified in this target (resourceInfo). Finally, it initializes all child targets.

virtual Target* createChild(int index);

This protected method creates the child target with the given index; the type of the child is determined by childType.

virtual void resourceInfo();

This method parses the resources state of this class and adjusts the resources parameter of child targets. If no resources parameter exists in a child target, it creates one.

(2) Choose a scheduler based on the states: chooseScheduler.

virtual void chooseScheduler();

This is a protected method to choose a scheduler based on the states related to scheduling algorithms.

(3) If it is a heterogeneous target, we flatten the wormholes: flattenWorm. To represent a universe for heterogeneous targets, we manually partition the stars using wormholes, which specify which stars are assigned to which target.

void flattenWorm();

This method flattens wormholes recursively if the wormholes have a code generation domain inside.

(4) Set up the scheduler object. Clear myCode stream.

(5) Initialize the flattened galaxy, and perform the parallel scheduling: Target::setup.

(6) If the child targets are not inherited, display the Gantt chart if requested: writeSchedule.

void writeSchedule();

This public method displays a Gantt chart.

(7) If this target is inside a wormhole, it adjusts the sample rate of the wormhole ports (CGTarget::adjustSampleRates), generates code (generateCode), and downloads and runs code in the target (CGTarget::wormLoadCode).

void generateCode();

This is a redefined public method. If the number of processors is 1, we just call generateCode of the child target and return. Otherwise, we first set the stop time, or the number of iterations, for the child targets (beginIteration). If the target is inside a wormhole, the stop time becomes -1, indicating an infinite loop. The next step is to generate wormhole interface code (wormInputCode, wormOutputCode) if the target is inside a wormhole. Finally, we generate code for all child targets (ParScheduler::compileRun). Note that we generate wormhole interface code before generating code for the child targets, since we cannot intervene in the code generation procedure of a child target once it has started.

void beginIteration(int repetitions, int depth); 
void endIteration(int repetitions, int depth);

These are redefined protected methods. In the first method, we call setStopTime to set up the stop time of child targets. We do nothing in the second method.

void setStopTime(double val);

This method sets the stop time of the current target. If the child targets are not inherited, it also sets the stop time of the child targets.

void wormInputCode(); 
void wormOutputCode(); 
void wormInputCode(PortHole& p);
void wormOutputCode(PortHole& p);

These are all redefined public methods. The first two methods traverse the portholes of the wormholes in the original graph, find all portholes in the sub-universes that match each wormhole porthole, and generate wormhole interface code for them. The complication is that a star may be associated with more than one ParNode, and these ParNodes may be assigned to several processors. The last two methods are used when the number of processors is 1, since in that case CGTarget::wormInputCode and CGTarget::wormOutputCode are used instead of the first two methods.

int run();

If this target does not lie in a wormhole or it has only one processor, we just use CGTarget::run to generate code. Otherwise, we transfer data samples to and from the target: sendWormData and receiveWormData.

int sendWormData(); 
int receiveWormData();

These are redefined protected methods. They send data samples to the current target and receive data samples from it. We traverse the wormhole portholes to identify all the corresponding portholes in the sub-universes, and call sendWormData and receiveWormData for them.

void wrapup();

In this base class, we write code for each processor to a file.

Other CGMultiTarget protected members

ParProcessors* parProcs;

This is a pointer to the actual scheduling object associated with the current parallel scheduler.

IntArray canProcs;

This is an integer array to be used in candidateProcs to contain the list of processor indices.

virtual void resetResources();

This method clears the resources this target maintains such as communication resources.

void updataRM(int from, int to);

This method updates a reachability matrix for communication amortization. A reachability matrix is created if amortizedComm is set to YES. We can packetize communication samples only when packetizing does not introduce deadlock into the graph. To detect a deadlock condition, we conceptually cluster the nodes assigned to the same processor. If the resulting graph is acyclic, we can packetize communication samples. Instead of actually clustering the graph, we set up the reachability matrix and update it at every send node. A cycle of send nodes indicates a possible deadlock.
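One way to maintain such a matrix incrementally is sketched below, under stated assumptions: rows and columns are processor indices, every send from processor from to processor to adds an edge between the two clusters, and communication between two processors is treated as amortizable only if no cycle passes through both. The class ReachabilityMatrix and its methods are hypothetical; they are not the Ptolemy updataRM and amortize implementations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical processor-level reachability matrix: reach[i][j] is true when
// processor j is reachable from processor i along recorded send edges.
class ReachabilityMatrix {
public:
    explicit ReachabilityMatrix(int nprocs)
        : n(nprocs), reach(nprocs, std::vector<bool>(nprocs, false)) {}

    // Record a send from 'from' to 'to' and keep the matrix transitively
    // closed: every processor that reaches 'from' (or is 'from') now reaches
    // 'to' and everything reachable from 'to'.
    void update(int from, int to) {
        assert(from >= 0 && from < n && to >= 0 && to < n);
        for (int i = 0; i < n; ++i) {
            if (i != from && !reach[i][from]) continue;
            for (int j = 0; j < n; ++j) {
                if (j == to || reach[to][j]) reach[i][j] = true;
            }
        }
    }

    // Packetizing is treated as safe only if the clustered graph has no
    // cycle passing through both processors.
    bool canAmortize(int from, int to) const {
        return !(reach[from][to] && reach[to][from]);
    }

private:
    int n;
    std::vector<std::vector<bool>> reach;
};
```

Each update costs O(n²) for n processors, which is cheap compared with rebuilding the clustered graph and searching it for cycles after every send node.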

Other CGMultiTarget public members

The destructor deletes the child targets, scheduler, and reachability matrix if they exist. There is an isA method defined for type identification.

Block* makeNew() const;

Creates an object of CGMultiTarget class.

int execTime(DataFlowStar* s, CGTarget* t);

This method returns the execution time of a star s if scheduled on the given target t. If the target does not support the star, a value of -1 is returned. If it is a heterogeneous target, we consider the relative time scale of processors. If the second argument is NULL or it is a homogeneous multiprocessor target, just return the execution time of the star in its definition.
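The following sketch illustrates the cases listed above. ChildInfo and execTimeSketch are hypothetical, and the scaling rule (multiplying the cycle count by the processor's relTimeScales entry) is an assumption, since the text does not say exactly how the relative time scale is applied.

```cpp
#include <string>

// Hypothetical per-child data used by this sketch.
struct ChildInfo {
    std::string starClass;  // star class the child target supports
    int relTimeScale;       // this child's entry in relTimeScales
};

// Sketch of execTime. Assumption: a heterogeneous target multiplies the
// star's cycle count by the relative time scale of the chosen processor.
int execTimeSketch(const std::string& starClass, int starCycles,
                   const ChildInfo* t, bool heterogeneous) {
    if (t == nullptr || !heterogeneous) return starCycles;  // time from the star definition
    if (t->starClass != starClass) return -1;               // target does not support the star
    return starCycles * t->relTimeScale;
}
```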

IntArray* candidateProcs(ParProcessors* par, DataFlowStar* s);

This method returns a pointer to an integer array of processor indices. We search the processors that can schedule the argument star s by checking the star type and the resource requirements. We include at most one idle processor.
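A simplified version of this search is sketched below. ProcInfo and candidateProcsSketch are hypothetical, each star is assumed to require at most one named resource, and the special target-index resource is not handled. Keeping only one idle processor avoids evaluating several equivalent empty processors, assuming idle processors that pass the checks are interchangeable.

```cpp
#include <string>
#include <vector>

// Hypothetical per-processor summary used by this sketch.
struct ProcInfo {
    std::string starClass;               // star class the child target supports
    std::vector<std::string> resources;  // resources the child provides
    bool idle;                           // nothing scheduled on it yet
};

// Collect indices of processors that support the star class and provide the
// required resource (empty string = none), keeping at most one idle processor.
std::vector<int> candidateProcsSketch(const std::vector<ProcInfo>& procs,
                                      const std::string& starClass,
                                      const std::string& requiredResource) {
    std::vector<int> result;
    bool idleTaken = false;
    for (int i = 0; i < static_cast<int>(procs.size()); ++i) {
        const ProcInfo& p = procs[i];
        if (p.starClass != starClass) continue;
        if (!requiredResource.empty()) {
            bool found = false;
            for (const std::string& r : p.resources) found = found || r == requiredResource;
            if (!found) continue;
        }
        if (p.idle) {
            if (idleTaken) continue;  // one idle candidate is enough
            idleTaken = true;
        }
        result.push_back(i);
    }
    return result;
}
```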

int commTime(int from, int to, int nSamples, int type);

This method returns the expected communication overhead of transferring nSamples samples from processor from to processor to. If type = 2, it returns the sum of the receiving and sending overheads.

int scheduleComm(ParNode* comm, int when, int limit = 0);

Since this class models a fully-connected multiprocessor, a communication node can be scheduled at any time without resource conflicts, so this method returns the second argument, when.

ParNode* backComm(ParNode* rcv);

This method returns the corresponding send node paired with the argument receive node, rcv. If the argument node is not a receive node, return NULL.

int amortize(int from, int to);

This method returns TRUE or FALSE, depending on whether communication between the two argument processors can be amortized.

17.2.3 Class CGSharedBus

Class CGSharedBus, derived from class CGMultiTarget, is a base class for shared bus multiprocessor targets. It has the same kind of constructor as its base class.

This class uses UniProcessor objects to model the shared bus.

UniProcessor bus;
UniProcessor bestBus;

These are two protected members that hold the current bus schedule and the best bus schedule obtained so far. The following public methods copy bus and bestBus to each other:

void saveCommPattern();
void restoreCommPattern();

The first saves the current schedule (bus) into bestBus, and the second restores bus from bestBus.

void clearCommPattern();
void resetResources();

The first of these is a public method that clears the bus schedule, while the second is a protected method that clears both bus and bestBus.

This classes redefines the following two public methods.

int scheduleComm(ParNode* node, int when, int limit = 0);

This method schedules the argument node available at when on bus. If we can schedule the node before limit, we schedule the node and return the schedule time. Otherwise, we return -1. If limit = 0, there is no limit on when to schedule the node.

ParNode* backComm(ParNode* node);

For a given communication node, find another node scheduled just before the argument node on bus.
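The shared-bus behavior of scheduleComm and backComm can be sketched with a sorted list of busy intervals. This is a minimal illustration, not the CGSharedBus code: SharedBusSketch and BusSlot are hypothetical, nodes are plain integer ids, the transfer duration is passed explicitly instead of being taken from a ParNode, and "before limit" is interpreted as a start time no later than limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One busy interval on the bus, occupied by a single communication node.
struct BusSlot {
    int start, end, node;
};

class SharedBusSketch {
public:
    // Schedule a transfer of length 'duration' that becomes ready at 'when'.
    // Returns the start time, or -1 if it cannot start by 'limit'
    // (limit = 0 means no limit). Slots are kept sorted by start time.
    int scheduleComm(int node, int when, int duration, int limit = 0) {
        assert(duration >= 0);
        int t = when;
        for (const BusSlot& s : slots) {
            if (s.end <= t) continue;             // slot ends before we start
            if (s.start >= t + duration) break;   // the gap before this slot fits
            t = s.end;                            // conflict: try after this slot
        }
        if (limit > 0 && t > limit) return -1;
        BusSlot slot{t, t + duration, node};
        auto byStart = [](const BusSlot& a, const BusSlot& b) { return a.start < b.start; };
        slots.insert(std::upper_bound(slots.begin(), slots.end(), slot, byStart), slot);
        return t;
    }

    // Return the node scheduled just before 'node' on the bus, or -1.
    int backComm(int node) const {
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (slots[i].node == node) return i == 0 ? -1 : slots[i - 1].node;
        return -1;
    }

private:
    std::vector<BusSlot> slots;
};
```

The first-fit scan places each transfer in the earliest gap that can hold it, so transfers that would proceed in parallel under the fully-connected model are serialized on the single bus.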