README for src/contrib/atrade
Version: @(#)README 1.4 1/9/97

Architecture Trade Capability

Copyright (c) 1995-1997 Sanders, a Lockheed Martin Company

Acknowledgement

This work was performed by Sanders, a Lockheed Martin Company, as part of the Sanders RASSP program under contract N00014-93-C-2172 to the Naval Research Laboratory, 4555 Overlook Avenue, SW, Washington, DC 20375-5326. The Sponsoring Agency is: Advanced Research Projects Agency, Electronic System Technology Office, 3701 North Fairfax Drive, Arlington, VA 22203-1714. The Sanders RASSP team consists of Sanders, Motorola, Hughes, and ISX.

Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, modify, and distribute this software and its documentation for any purpose, provided that the above copyright notice and the above acknowledgement and following two paragraphs appear in all copies of this software.

IN NO EVENT SHALL SANDERS OR THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF SANDERS OR THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SANDERS AND THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND SANDERS AND THE UNIVERSITY OF CALIFORNIA HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

1.0 Introduction

At Sanders, we have developed a proof of concept architectural trade capability using Ptolemy's Discrete Event domain. It is meant as a first cut at a usable capability, and is provided for demonstration purposes only. There are many simplifications made in order to get the tool working for a few example cases.
It is being provided to members of the Ptolemy community interested in this type of architectural trade capability work.

A custom graphical front-end to Ptolemy has been developed that allows a user to sketch a target architecture in one window and quickly map the stars in an SDF graph in another window to the processors in the architecture. Extensions to the DE domain have been implemented to allow a performance-level model of the architecture to be simulated. These extensions create a DE domain model representing the mapping of the algorithm to the architecture and use the Ptolemy kernel to simulate the performance. The product of the simulation is a Gantt chart showing the execution of stars over time as well as a thermometer display of estimates of certain other system level metrics (weight, size, power, reliability, etc.).

This capability has been developed as a front-end architectural trade tool for the Sanders RASSP Program (see above acknowledgement).

2.0 Composition of Architecture Trade Capability

This capability (denoted atrade) consists of extensions to the DE and SDF domains in the form of new particles and stars. A custom GUI, called pigi+, provides the user interface to the capability and works with the Ptolemy kernel via the ptcl interface. Since we are using only the SDF and DE domains and we are using pipes to interface with Ptolemy, we have chosen to build a custom executable called pipeptcl.ptiny, based on ptcl.ptiny. Some minor modifications were made to the ptcl interface in order to accommodate communication between ptcl and pigi+ via standard pipe mechanisms, and these are used in building pipeptcl.ptiny.

Because pigi+ has been built using commercial libraries (RogueWave and Motif), only a binary which runs under SunOS and Solaris is included in the distribution. However, source code for the interface to pipeptcl.ptiny has been provided.
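
The general arrangement described above, one process driving an interpreter through its standard input and output pipes, can be sketched generically. The Python below is not taken from pigi+ and makes no claim about how pigi+ is implemented. A small Python child process stands in for pipeptcl.ptiny so that the sketch runs anywhere, the names STAND_IN_CHILD and send_commands are hypothetical, and the command strings are placeholders rather than real ptcl commands. The sketch also sends all commands in one batch, whereas an interactive front-end would keep the pipes open.

```python
import subprocess
import sys

# Stand-in child: echoes each line it reads from its stdin pipe.
# A real setup would launch pipeptcl.ptiny instead; that is not shown here.
STAND_IN_CHILD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print('got: ' + line.strip())",
]


def send_commands(argv, commands):
    """Write commands to the child's stdin pipe and return its stdout lines."""
    result = subprocess.run(
        argv,
        input="\n".join(commands) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Placeholder command strings, one per line, as a front-end might send them.
replies = send_commands(STAND_IN_CHILD, ["first command", "second command"])
```

Each reply line corresponds to one command line written to the pipe, which is the only property this sketch is meant to show.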

3.0 Installing atrade

-- The atrade tar file should be installed so that this README file is at
   $PTOLEMY/src/contrib/atrade/README

-- Update $PTOLEMY/src/kernel/PortHole.cc with the atrade/kernel/PortHole.cc file and rebuild the kernel. The makefile rule 'updatePtolemy' will do this for you. Run:

        cd $PTOLEMY/src/contrib/atrade
        make updatePtolemy

-- Create an obj.$PTARCH/contrib/atrade tree for your binaries by running MAKEARCH:

        $PTOLEMY/src/contrib/atrade/MAKEARCH

-- Build and install the atrade DE libraries and pipeptcl:

        cd $PTOLEMY/obj.$PTARCH/contrib/atrade
        make install

-- cd to the atrade/gui/bin directory and set the PIGI+_HOME environment variable before starting the pigi+ program:

        cd $PTOLEMY/src/contrib/atrade/gui/bin
        setenv PIGI+_HOME `pwd`

   pigi+ searches the path for the pipeptcl.ptiny binary to run.

4.0 Getting started

You then invoke pigi+ from the command line:

        pigi+

and you will get a small "Architecture Trade" GUI with three buttons. If you hold the mouse pointer over a button, after a second or so a banner identifying the button is shown. From left to right, the buttons do the following:

    Algorithm Schematic: contains the SDF graph that is to be mapped; used to draw, save, or load SDF graphs
    Architecture Schematic: contains the architecture that is to be used for the mapping; used to draw, save, or load architectures
    Map Algorithm to Architecture: brings up a dialog box which is used to specify the system specification and parts data files, and to simulate the specified mapping

Typically, you first create the algorithm. This may be best done by first drawing the graph in pigi and then redrawing it within the "Algorithm Schematic" window of pigi+. Unfortunately, there is no way of importing or exporting graphs between pigi and pigi+. Next, an architecture is created using the "Architecture Schematic".
Finally, algorithmic blocks are grouped and mapped onto the architecture, the performance simulation is created and executed, and the results are displayed via a Gantt chart and thermometer display. The next sections describe these steps in more detail.

4.1 Algorithm Schematic

This window is used to draw the SDF graphs which will later be mapped onto the target architectures. Algorithms are drawn using a menu and a set of seven buttons at the top of the Algorithm Schematic window. Hierarchical graphs are not supported by pigi+.

4.1.1 Menu

Under "File", the user may open a previously saved graph, save the current graph, or close the schematic window. Under "Execution", the user may set the run length of the SDF graph by selecting "Runtime Parameters" and entering a value in the dialog box.

4.1.2 Buttons

There are seven buttons used in creating SDF graphs. A banner identifying each button appears after the mouse pointer rests on the button for a second or so. From left to right, the buttons do the following:

    Run Graph: executes the SDF graph by creating and simulating the universe via the ptcl interface; the run length is set via the dialog box reached by selecting "Runtime Parameters" under "Execution" on the menu bar
    Star Palette: provides the palette of stars which are used to create the graphs; you must reselect the SDF domain the first time the palette is brought up for it to display the available stars; new stars will appear in the upper left portion of the drawing area and should be moved before the next star is created; the three buttons (galaxy, galaxy input, galaxy output) are not currently implemented and should not be used
    Toggle Grid Display: allows the grid display to be toggled on and off
    Toggle Connection Display: allows connections on ports to be toggled between being displayed and hidden
    Pointer: changes the mode of the mouse to pointer so that the state parameters of the stars may be displayed and edited (left button), and so that the stars may be selected and
      dragged around (middle button) or deleted (right button); once stars have been mapped, the first right button click deletes the selection box and the second right button click deletes the star
    Wire Tool: changes the mode of the mouse to draw connections (wires) between the stars
    Selection Tool: changes the mode of the mouse to group one or more stars for mapping purposes; the resultant selection box is always rectangular and is drawn from the upper left to the lower right

4.2 Architecture Schematic

This window has the same menu and buttons as the "Algorithm Schematic" window. The main difference is that DE domain stars are used in creating architectures. Thus, upon selection of the Star Palette, the DE domain should be chosen. The main stars of interest include: Processor, I860, SHARC, Raceway, VMEBus. Another difference is that "Runtime Parameters" here defines the end time of the DE simulation.

4.3 Algorithm to Architecture Mapping

In the "Algorithm Schematic" window, one or more functional blocks must be grouped using the Selection mode. The groups are numbered starting with zero, and the number increments as each new group is defined. All functional blocks should be a part of a group before the performance model is run.

In the "Architecture Schematic" window, each processor star (Processor, I860, or SHARC) must be selected individually using the Selection mode. Another set of numbers is assigned to these selections, also starting at zero.

The functional blocks in group X in the "Algorithm Schematic" are in effect mapped to the processor in group X in the "Architecture Schematic". Thus, the performance model will simulate the execution of these functional blocks on that processor using the defined cost functions (see section 4.4.3). Once the mapping is complete, the performance simulation is ready to run.
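
The group-number pairing described in section 4.3 can be sketched as follows. This is only an illustration in Python, not pigi+ code: the function name pair_groups, the dictionary layouts, and the block and processor names in the usage lines are all hypothetical.

```python
def pair_groups(algorithm_groups, processor_groups):
    """Pair algorithm group X with processor selection X.

    algorithm_groups: dict mapping group number -> list of functional block names
    processor_groups: dict mapping selection number -> processor star name
    Both layouts are assumptions made for this sketch.
    """
    unmatched = sorted(set(algorithm_groups) - set(processor_groups))
    if unmatched:
        raise ValueError(f"algorithm groups without a processor: {unmatched}")
    # Each block inherits the processor whose selection number equals its group number.
    return {
        block: processor_groups[number]
        for number, blocks in sorted(algorithm_groups.items())
        for block in blocks
    }


# Hypothetical names: two algorithm groups mapped onto two processor selections.
mapping = pair_groups(
    {0: ["Normalize0"], 1: ["Orthogonalize0", "Orthogonalize1"]},
    {0: "SHARC_a", 1: "SHARC_b"},
)
```

Raising an error for an algorithm group with no matching processor selection is a design choice of the sketch; this README does not say how pigi+ reacts to an incomplete mapping.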

4.4 Performance Simulation (Map Algorithm to Architecture)

Selecting this button from the main GUI brings up a dialog box which is used to specify the system specification and parts data files, and to simulate the specified mapping. Default files have been provided. Details on the format of these files are provided in section 4.4.4.

When the "Map" button is selected, pigi+ invokes Ptolemy via pipeptcl.ptiny (using the ptcl interface), creates the DE domain performance model of the architecture, algorithm, and mapping, and then executes the model. Results are stored in a temporary file, /usr/tmp/gantt.log, as the model executes. You may need to ensure that you have write permission for this directory. At the end of the simulation, pigi+ provides two windows: a Gantt chart and a system thermometer display.

4.4.1 Gantt chart

The Gantt chart displays the activity on each bus and processor as a function of time. The Gantt chart window can be resized using the standard window resizing, and the four direction buttons can be used to expand or contract the Gantt display within the window.

The left and middle mouse buttons can each be used to measure time; each marker "snaps" to the nearest event on the row where the mouse is focused. The times t1 and t2 are displayed in red at the bottom of the window for the markers corresponding to the left and middle buttons, and the difference between the two markers is also displayed. The right button is used to toggle a 100 us grid on and off.

For the busses, blue denotes that the bus is busy, and the number indicates the destination of the current bus traffic (processor number). For the processors, yellow is used to denote the reception or transmission of data and a light green-blue is used to indicate that processing is taking place. The name of the function being processed (suffixed by a unique numeric identifier) is displayed on each block.
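
The marker behaviour described in section 4.4.1 (snap to the nearest event on a row, then report the difference between the two markers) can be illustrated with a short sketch. It is written in Python for illustration only and is not derived from the pigi+ sources. The function name snap_to_event, the event list, the time unit (seconds), and the tie-breaking rule are assumptions.

```python
from bisect import bisect_left


def snap_to_event(event_times, t):
    """Return the time in the sorted list event_times that is closest to t."""
    if not event_times:
        raise ValueError("row has no events")
    i = bisect_left(event_times, t)
    if i == 0:
        return event_times[0]
    if i == len(event_times):
        return event_times[-1]
    before, after = event_times[i - 1], event_times[i]
    # Assumption: a tie goes to the earlier event.
    return before if t - before <= after - t else after


# Hypothetical event times (in seconds) on one processor row.
row = [0.0, 0.00012, 0.00031, 0.00075]
t1 = snap_to_event(row, 0.00010)  # marker placed with the left button
t2 = snap_to_event(row, 0.00070)  # marker placed with the middle button
delta = t2 - t1                   # the difference shown next to t1 and t2
```

With these made-up values, t1 snaps to 0.00012 and t2 snaps to 0.00075.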

4.4.2 Thermometer Display

In selecting an architecture and a mapping, performance is usually very important. However, certain other system metrics must often be considered as well. As a result, in addition to the Gantt chart provided after each simulation run, the architectural trade tool also gives some feedback on the following system metrics via a thermometer display: function, environment, interfaces, schedule, cost, processor, interconnect, software, size, weight, power, reliability, testability, maintainability, fault tolerance, scalability, and standards.

These system metrics are estimated using simple models with stored manufacturer specifications, historical data, and certain information from the mapping and performance simulation. The user provides system specifications for each of these metrics, in terms of minimum, nominal, and maximum values. In addition, the user also specifies the relative importance of the metrics to each other using a numeric weighting.

The estimated system metrics are graphed against the given system specifications using a thermometer bar graph display. The height of the individual thermometer bar denotes the relative importance assigned to each metric. All thermometers are normalized so that the center matches the nominal specified value for each metric. A thermometer bar filled in to the left of center indicates that the system metric does not meet the nominal value while a bar reaching to the right of center shows that the nominal value has been satisfied. Some of the metrics are displayed using a reverse scale so that the thermometers are consistent in showing shortfalls--a value to the left of center always indicates a shortfall, regardless of whether the actual numeric value is lower or higher than the nominal value (e.g. power versus reliability). The display has the option of showing the minimum and maximum specification values for the metrics as a range around the nominal value.
Because the reported metric values are estimates, a measure of the relative accuracy of the calculations can also be shown as a range around the reported metric. The purpose of these estimates is to provide a first level measure of system metrics to aid in selecting an architecture instead of concentrating entirely upon the performance results.

4.4.3 Cost Functions

Cost functions are used to model the computational and memory costs of executing a functional block. These functions include a constant overhead term and a variable term. The variable term in the cost function allows the computational and memory costs to be expressed in terms of the state variables of the functional star they represent. Typically, the variable term is used to scale the costs as a function of the amount of data being processed. In the cost files, the letters w, x, y, z are used as follows:

    w : constant overhead computational cost in processor cycles
    x : variable computational cost in processor cycles as a function of the state variables
    y : constant overhead memory cost in bytes
    z : variable memory cost in bytes as a function of the state variables

The cost file has one line defining each of these, followed by four lines, each with one of the letters. pigi+ calls the Unix utility bc in order to do these computations. For the SDF star SDFOrthogonalize, here is the corresponding Orthogonalize.cost file:

        w = 30
        x = 4 + (28*blockSize)
        y = 20
        z = 20
        w
        x
        y
        z

Currently, the memory cost functions are not used or implemented. Only the cost functions for computations are implemented.

4.4.4 Specification and Parts Files

The specification file is used to specify the system specifications in 18 areas. An example file is given at $PTOLEMY/src/contrib/atrade/gui/bin/std.spec.
For each entry in this file, there is the name of the specification, the type of specification (intended use, performance, or supportability), the units of the specification values, the relative system weighting, a numeric specification value, a minimum specification value, and a maximum specification value.

The parts file is used to specify the system specifications of the parts used in the system. An example file is given at $PTOLEMY/src/contrib/atrade/gui/bin/std.parts. A separate line is used for each part. The line starts with the part name (terminated by a semicolon), followed by one or more system specifications separated by colons. For each system specification, a minimum, nominal, and maximum value is specified.

5.0 Directory Description

$PTOLEMY/src/contrib/atrade
    contains this README file as well as a PostScript version of the 1996 International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 4-page paper describing this capability (arch_trade.ps) and a GIF file showing an example mapping (example.gif)

$PTOLEMY/src/contrib/atrade/gui/kernel
    contains a new version of $PTOLEMY/src/kernel/PortHole.cc. This version merely modifies the PortHole::print method by adding two lines:

        // added to visibility into PortHole objects via ptcl interface
        out << "(numTokens = " << numberTokens << ")";

$PTOLEMY/src/contrib/atrade/gui/src
    contains some key sources used to build pigi+ (our GUI), namely those that interface with ptcl

$PTOLEMY/src/contrib/atrade/gui/bin
    contains the pigi+ executable and some configuration files for pigi+; to run it, you must set the following environment variable to where it is installed, e.g.
        setenv PIGI+_HOME $PTOLEMY/src/contrib/atrade/gui/bin

$PTOLEMY/src/contrib/atrade/gui/bin/cost
    contains cost functions used by pigi+

$PTOLEMY/src/contrib/atrade/gui/bin/icons
    contains icons used by pigi+

$PTOLEMY/src/contrib/atrade/gui/bin/schematics/*
    contains architecture and SDF graphs used by pigi+

$PTOLEMY/src/contrib/atrade/de/kernel/*
    contains new particles (*.h and *.cc) for atrade (these changes can co-exist with the current DE domain kernel)

$PTOLEMY/src/contrib/atrade/de/stars
    contains new DE stars (*.pl) for atrade

$PTOLEMY/src/contrib/atrade/sdf/stars
    contains new SDF stars (*.pl) used in atrade examples

$PTOLEMY/src/contrib/atrade/pipeptcl
    contains files needed to build pipeptcl.ptiny, which is a version of ptcl.ptiny that can communicate over a pipe. This directory uses libptcl, and differs from the vanilla ptcl in that pipeptclAppInit.cc is used instead of ptclAppInit.cc.

6.0 Troubleshooting

pigi+ is a prebuilt SunOS binary with certain paths hardcoded in. It should also run under Solaris. You may need to set a few environment variables.

1) If you get a segmentation error upon startup:

        ptuser@watson 149% ./pigi+
        Warning: locale not supported by Xlib, locale set to C
        Warning: X locale modifiers not supported, using default
        Segmentation Fault

   pigi+ looks for /usr/lib/X11/nls. At UC Berkeley, we had to do:

        setenv XNLSPATH /usr/sww/sunos-X11R5/lib/X11/nls

2) After starting pigi+, nothing happens:

        ptuser@watson 161% ./pigi+

   Make sure that the pipeptcl.ptiny in your path is (or is a link to) the pipeptcl.ptiny built during installation. See the installation instructions.

7.0 Example Run

Start up pigi+ and press the left button on the main dialog to bring up the algorithm schematic window. Under "File", choose "Open" and select gramschmidt.drw. Now press the middle button on the main dialog to bring up the architecture schematic window. Under "File", choose "Open" and select sharc_vme.drw.
Now choose "Execution" then "Run Length" and set this parameter to 0.002 (this prevents the simulation from running for too many iterations).

Back in the algorithm window, select the rightmost button to activate the Selection Tool. Now draw a separate box around each Normalization block (blue box) and Orthogonalization block (green circle), for a total of five red dashed boxes numbered 0 to 4. In the architecture schematic window, activate the Selection Tool and do the same thing for each of the Processors. Each functional block (Normalization or Orthogonalization) is now mapped to a processor.

Now press the right button on the main dialog to bring up the Mapping Parameters dialog box. The default files will work fine. The algorithm mapping onto the architecture can now be simulated by pressing the Map button. Within a few seconds, a thermometer display and a Gantt chart will appear.

You may try reducing the number of processors in the architecture and mapping more than one functional block to the processors. Different mappings will yield different results.
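
When comparing mappings, it can help to work out a block's computational cost by hand from its cost file (section 4.4.3). The Python sketch below mimics the cost-file layout described there: four definition lines followed by four bare letters whose values are reported. It is not how pigi+ does the computation (pigi+ calls bc), and the function names, the blockSize value of 64, and the step that adds w and x into a single cycle count are assumptions made for illustration. Only +, - and * are handled, because the result of division in bc depends on its scale setting.

```python
import ast
import operator

# Operators supported by this sketch; division is deliberately left out.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _evaluate(node, variables):
    """Evaluate an expression tree of numbers, names, +, -, * and parentheses."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, variables)
        right = _evaluate(node.right, variables)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def evaluate_cost_file(text, state):
    """Return the values of the bare-letter lines of a cost file."""
    variables = dict(state)  # state variables of the star, e.g. blockSize
    reported = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=" in line:
            name, expression = (part.strip() for part in line.split("=", 1))
            tree = ast.parse(expression, mode="eval").body
            variables[name] = _evaluate(tree, variables)
        else:
            reported[line] = variables[line]
    return reported


ORTHOGONALIZE_COST = """\
w = 30
x = 4 + (28*blockSize)
y = 20
z = 20
w
x
y
z
"""

# blockSize = 64 is a made-up value, not taken from the gramschmidt example.
costs = evaluate_cost_file(ORTHOGONALIZE_COST, {"blockSize": 64})
# Assumption: the cycle count is the constant term plus the variable term.
cycles = costs["w"] + costs["x"]
```

For the hypothetical blockSize of 64, x evaluates to 4 + 28*64 = 1796 cycles, so the assumed total is 30 + 1796 = 1826 cycles.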