Neva is a MEDEA+ collaboration of industrial and academic partners: ACE, Bull Certess, LETI/CEA, NXP Semiconductors, Silicomp, STMicroelectronics, TIMA, and VERIMAG.
At DATE'07, at NICE from April 16-20, we will be showing our work done in CoSy as part of the NEVA research project. We do this together with other researchers from RWTH Aachen, TU Delft, University of Edinburgh, and the University of Amsterdam. Title of our presentation is "Enabling technology for heterogeneous multi-core compilation".
A demonstration will be shown. Come and meet us at stand R7.
If you are interested in doing a Bachelor Project (11 ECTS), Software Project (20 ECTS), Research Project (17 ECTS) or Master Project (42 ECTS) within the Computer Systems group, have a look at our student projects page where a list of projects is given.
``System
design using Kahn Process Networks - the Compaan/Laura
Approach''
``System
Level Design with Embedded Platforms".
This link contains
all the presentation slides in
PDF format.
``Compaan:
Deriving Process Networks from Matlab for Embedded Signal Processing
Architectures.''
Finished Master Project Publications: Publications.
Sjoerd Meijer, Bart Kienhuis, Johan Walters, and David Snuijf
``Automatic partitioning and mapping of stream-based applications onto the Intel IXP Network Processor'', Workshop on Software & Compilers for Embedded Systems (SCOPES'07), Nice, April 20 2007
When studying the IXP Network processor architecture from Intel, we
found quite some interesting aspects that make the IXP attractive for
stream-based applications. The architecture is highly optimized for
streaming data, albeit in the form of internet packets. Furthermore,
the architecture has Gigabit Ethernet connectors for handling incoming
and outgoing traffic and can process this data at real-time using
dedicated microengines. In this paper, we try to answer three
questions; 1) Can we use the IXP architecture for stream-based
applications? 2) can we map applications written as a KPN onto an IXP?
3) can we integrate the generation of KPNs using the Compaan compiler
with a tool flow that maps the KPN to an IXP, thereby make the
programming of IXP much simpler? As will be shown, all three steps can
be performed and we show that we can map automatically two
non-internet stream-based applications (QR and DWT) onto the IXP.
Sjoerd Meijer, Bart Kienhuis, Alex Turjan and Erwin de Kock
Ming-Yung Ko, Claudiu Zissulescu,
Sebastian Puthenpurayil, Shuvra Bhattacharyya, Bart Kienhuis, and
Ed Deprettere,
Alexandru Turjan, Bart
Kienhuis and Ed Deprettere,
Claudiu Zissulescu, Bart
Kienhuis and Ed Deprettere,
Claudiu Zissulescu, Bart
Kienhuis and Ed Deprettere,
Alexandru Turjan, Bart
Kienhuis and Ed Deprettere,
Andre van der Plas, Bart
Kienhuis, ``A Methodology for Efficient Reuse of Open Source
Code'',
Steven Derrien, Alexandru
Turjan, Claudiu Zissulescu, Bart Kienhuis and Ed
Deprettere, ``Deriving Efficient Control in Process Networks with
Compaan/laura'', Published in the International
Journal of Embedded Systems, volume 1, Issue 7, 2005 Embedded-Journal Inderscience
(IJES)
Alexandru Turjan, Bart
Kienhuis and Ed Deprettere,
Alexandru Turjan, Bart
Kienhuis and Ed Deprettere,
Claudiu Zissulescu, Bart
Kienhuis and Ed Deprettere,
``A Generic Process Splitting Transformation For Kahn Process Networks'', Proceedings of the DATE conference (DATE'07), Nice, April 17 - 19 2007
Kahn Process Networks (KPN) are very suitable for systematic mapping
onto multiprocessor architectures. Running applications written in this
parallel program specification on a multiprocessor architecture, however,
does not guarantee that the runtime requirements are met. Therefore, it
may be necessary to further analyze and optimize Kahn process networks.
In this paper, we will present a process splitting technique that results
in a functionally equivalent process network, but with a changed and optimized
network structure. The class of networks that can be handled is generic as
it is not restricted to static networks. The presented approach can also
handle processes with dynamic program statements. We prototyped the splitting
transformation in the GCC compiler for a JPEG decoder application. This
resulted in a 21% performance improvement.
``Parameterized Looped Schedules for
Compact Representation of Execution Sequences in DSP Hardware
and Software Implementation'', Accepted for publication (2006) in the IEEE
Transactions on Signal Processing.
Describes joined work between LIACS and Dept of Electrical Engineering, Univ. of Maryland
In this paper, we present a technique for compact
representation of execution sequences in terms of efficient
looping constructs. Here, by a looping construct, we mean a compact way of specifying a finite repetition
of a set of execution primitives. Such compaction, which can be viewed as a form of hierarchical runlength
encoding (RLE), has application in many VLSI signal processing contexts, including efficient control
generation for Kahn processes on FPGAs, and software synthesis for static dataflow models of computation.
In this paper, we significantly generalize previous models for loop-based code compaction of DSP
programs to yield a configurable code compression methodology that exhibits a broad range of achievable
trade-offs. Specifically, we formally develop and apply to DSP hardware and software synthesis a parameterizable
loop scheduling approach with compact format, dynamic reconfigurability, and low-overhead
decompression.
``Classifying Interprocess
Communication in Process Network Representation of Nested Loop
Programs'', Accepted for publication (2006) in the ACM Transactions
on Embedded Computing Systems.
``Communication
Synthesis in a multiprocessor environment'', Presented at
Field-Programmable Logic and Applications conference (FPL'05) Tampere, Finland August 24 - 26,
2005
At Leiden
University, we are developing a design methodology that allows for
fast mapping of nested-loop applications (e.g. DSP, Imaging, or
Multi-Media) written in a subset of Matlab onto reconfigurable
devices. This design methodology is implemented into a tool chain that
we call Compaan/Laura. This methodology generates a process
network in which the inter-process communication takes place in a
point-to-point fashion. Four types of point-to-point inter-processor
communication exist in the PN. Two of them use a FIFO like
communication and the other two use a cache like memory to exchange
data. In this paper, we investigate the realizations for the four
communication types and show that point-to-point communication at the
level of scalars can be realized automatically and very efficiently in
today's FPGAs.
``Expression
Synthesis in Process Networks generated by Laura'', Presented at the 16th IEEE International Conference on
Application-specific Systems, Architectures and Processors ASAP2005, July 23 -- 25,
2005, Samos, Greece The
Compaan/Laura tool chain maps nested loop applications written
in Matlab onto reconfigurable platforms, such as FPGAs. \Compaan
rewrites the original Matlab application as a Process Network in which
the control is parameterized and distributed. This control is given as
parameterized polytopes that are expressed in terms of pseudo-linear
expressions. These expressions cannot always be mapped efficiently
onto hardware as they contain multiplication and integer division
operations. This obstructs the data flow through the
processes. Therefore, we present in this paper the Expression Compiler
that efficiently maps pseudo-linear expressions onto reconfigurable
hardware in such a way that the distributed and parameterized control
never obstructs the data flow through processors. This compiler
employs techniques like number theory axioms, method of difference,
and predicated static single assignment code.
``Solving Out-of-Order Communication in Kahn Process
Networks'',
in the Journal of VLSI Signal Processing, Issue:
Volume 40, Number 1 Date: May 2005 Pages: 7 - 18
The Compaan compiler framework automates the transformation
of
DSP applications written in Matlab into Kahn Process Networks
(KPNs). These KPNs express the signal processing applications in a
parallel distributed way making them more suitable for mapping onto
parallel architectures. A simple instance of a generated KPN by
Compaan is a Producer process that communicates with a Consumer
process via a FIFO buffer, with the Consumer reading data from the
FIFO using a blocking read. When the sequence of producing data is
different from the sequence of consuming data, a simple FIFO is not
sufficient to implement the communication. For such case, extra
storage and control are needed at the consumer side. This paper
presents a compile approach that determines whether a FIFO buffer is
sufficient for every Producer/Consumer pair of a Compaan-generated
KPN. When additional memory is required, we provide an address
generation mechanism to perform the reordering and furthermore give a
lower bound on the amount of memory needed to perform the
reordering. The presented approach is based on the Ehrhart theory.
presented the First Conference for the dutch Software
Engineering Community (JACQUARD2005)
Research has been carried out to explore if, and how, it is possible to
increase the efficiency in which open source code can be used. The
research has shown that structuring such code can improve this
efficiency. It has lead to a methodology and a set of techniques, which
can be used to (re) structure the code.
At Leiden Embedded Research Center (LERC), we are building a tool
chain called Compaan/Laura that allows us to map rapidly and
efficiently signal processing applications written in Matlab onto
reconfigurable platforms. In this chain, first the Matlab code is
converted automatically to executable Kahn Process Network (KPN)
specification, then a tool called Laura transforms the PN
specification into design implementations described as synthesizable
VHDL.
The applications targeted by Compaan are usually data-flow intensive,
requiring large computational power. Therefore, an important issue in
Laura is the derivation of efficient and scalable hardware control
structures. This control is based on an abstract representation given
as parametrized polytopes. Although this representation can be
directly translated into nested guarded for-loops (very suitable for
software implementation) its mapping to hardware is much more
difficult. In this paper we investigate the opportunities of deriving
different hardware realizations for this control, and explore the
trade off between speed and resources usage.
``A
Hierarchical Classification Scheme to Derive Interprocess
Communication in Process Networks'', in the proceedings of the
15th IEEE International Conference on Application-specific Systems,
Architectures and Processors ASAP2004, Sept 27 -- 29, 2004,
Galveston, Texas The
Compaan compiler automatically derives a Process Network (PN)
description from an application written in Matlab. The basic element
of a PN is a Producer/Consumer (P/C) pair. Four different
communication patterns for a P/C pair have been identified and the
complexity of communication structure differs depending on the
communication pattern involved. Therefore, in order to obtain
cost-efficient process networks our compiler automatically identifies
the communication pattern of each P/C pair. This problem is
equivalent to integer linear programming and thus in general can not
be solved efficiently. In this paper we present simpler techniques
that allow to classify the interprocess communication in a
PN. However, in some cases those techniques do not allow to find an
answer and therefore, an ILP test has still to be applied. Thus, we
introduce a hierarchical classification scheme that correctly
classifies the interprocess communication, but uses dramatically less
integer linear programming. In only 5% of the cases to classify, we
still relay on integer linear programming while in the remaining 95%,
the techniques presented in this paper are able to classify a case
correctly.
``Translating affine nested-loop programs to Process Networks'',
in proceedings of at the international conference on compilers,
architecture, and synthesis for Embedded Systems (CASES'04), Sept
23 -- 25, 2004, Washington D.C. New
heterogeneous multiprocessor platforms are emerging that are typically
composed of loosely coupled components that exchange data using
programmable interconnections. The components can be CPUs or DSPs,
specialized IP cores, reconfigurable units, or memories. To program
such platform, we use the Process Network (PN) model of
computation. The localized control and distributed memory are the two
key ingredients of a PN allowing us to program the platforms. The
localized control matches the loosely coupled components and the
distributed memory matches the style of interaction between the
components. To obtain applications in a PN format, we have built the
Compaan compiler that translates affine nested-loop programs into
functionally equivalent PNs. In this paper, we describe a novel
analytical translation procedure we use in our compiler that is based
on integer linear programming. The translation procedure consists of
four main steps and we will present each step by describing the main
idea involved, followed by a representative example.
``Increasing pipelined IP core utilization in Process Networks using
Exploration'', In proceedings of Field-Programmable Logic and
Applications (FPL'04), Springer LNCS
3203, pages 690 -- 699
At Leiden Embedded Research Center, we are building a tool chain
called Compaan/Laura that allows us to do fast mapping of
applications written in Matlab onto reconfigurable platforms, such as
FPGAs, using IP cores to implement the data-path of the
applications. A particular characteristic of the derived networks is
the existence of selfloops. These selfloops have a large impact on the
utilization of IP cores in the final hardware implementation of a PN,
especially if the IP cores are deeply pipelined. In this paper, we
present an exploration methodology that uses feedback provided by the
Laura tool to increase the utilization of IP cores embedded in our PN
network. Using this exploration, we go from 60MFlops to 1,7GFlops for
the QR algorithm using the same number of resources except for memory.
Alexandru Turjan, Bart
Kienhuis and Ed Deprettere,
``An Approach to Classify Inter-Process Communication in Process
Networks at Compile Time'', In proceedings of 8th International
Workshop on Software and Compilers for Embedded Systems (SCOPES2004), Springer LNCS 3199,
pages 62 -- 76, Sept 2 -- 3, 2004, Amsterdam
New embedded signal processing architectures are emerging that are
composed of loosely coupled heterogeneous components like CPUs or
DSPs, specialized IP cores, reconfigurable units, or memories. We
believe that these architectures should be programmed using the
Process Network model of computation. To ease the mapping of
applications, we are developing the Compaan compiler that
automatically derives a Process Network (PN) description from an
application written in Matlab. In this paper, we investigate a
particular problem in nested loop programs, which is about classifying
the interprocess communication in the PN representation of the nested
loop program. The global memory arrays present in the Matlab code have
to be replaced by a distributed communication structure used for
sending data to the network processes. We will show that four types of
communication exists, each exhibiting different requirements when
realizing them in hardware of software. We present two compile time
tests that decide the type of a particular static array. These tests
have become an important part of our Compaan compiler.
Ingrid Verbauwhede (UCLA/KU
Leuven), Patrick Schaumont (UCLA), Christian Piquet (CSEM), and Bart
Kienhuis,
``Architectures and Design techniques for energy efficient embedded
DSp and multimedia processing'' presented at the embedded
Low-power Tutorial at the Design, Automation and Test in Europe
conference DATE2004, Feb
16-20 2004, Paris, France
Energy
efficient embedded systems consist of a heterogeneous collection of
very specific building blocks, connected together by a complex network
of many dedicated busses and interconnect options. The trend to merge
multiple functions into one device makes the design and integration of
these "systems-on-chip" (SOC's) even more problematic. Yet,
specifications and applications are never fixed and require the
embedded units to be programmable. The topic of this paper is to give
the designer architectures and design techniques to find the right
balance between energy efficiency and flexibility. The key is to
include programmability (or reconfiguration) at the right level of
abstraction and tuned to the application domain. The challenge is to
provide an exploration and programming environment for this
heterogeneous architecture platform.
Todor Stefanov, Claudiu
Zissulescu, Alexandru Turjan, Bart Kienhuis, Ed Deprettere,
``System Design using Kahn Process Networks: The Compaan/Laura
Approach'', in proceedings of the Design, Automation and Test in
Europe conference DATE2004, Feb 16-20 2004,
Paris, France
New emerging embedded system platforms in the realm of
high-throughput multimedia, imaging, and signal processing will
consist of multiple microprocessors and reconfigurable components.
One of the major problems is how to program these platforms in a
systematic and automated way so as to satisfy the performance need
of applications executed on these platforms.In this paper, we
present our system design approach as an efficient solution to
this programming problem. We show how for an application written
in Matlab, a Kahn Process Network specification can automatically
be derived and systematically mapped onto a target platform
composed of a microprocessor and an FPGA. Furthermore, we
illustrate how the mapping approach is applied on a real-life
example, namely an M-JPEG encoder.
Claudiu Zissulescu, Todor
Stefanov, Bart Kienhuis, Ed Deprettere,
,``Laura:
Leiden Architecture Research and Exploration Tool'',
Presented
at the International Conference on Field Programmable Logic and
Applications (FPL), Sept 1-3 2003,
Lisbon, Portugal
At Leiden
Embedded Research Center (LERC), we are building a tool chain called
Compaan/Laura that allows us to map fast and efficiently
application written in Matlab onto reconfigurable platforms. In this
chain, first the Matlab code is converted automatically to executable
Kahn Process Network (KPN) specification. Then a tool called
Laura accepts this specification and transforms the
specification into design implementations described as synthesizable
VHDL. In this paper, we present our methodology implemented in
Laura tool to automatically convert KPNs to synthesizable VHDL
code targeted form mapping onto FPGA-based platforms. With the help
of Laura, a designer is able to either fast prototype signal
processing and multimedia applications directly in hardware or to
extract very fast valuable low-level quantitative implementation data
such as performance in terms of clock cycles or silicon
area.
Alexandru Turjan, Bart
Kienhuis
,``Storage Management in Process Networks using the
Lexicographically Maximal Preimage'',
Presented at the
IEEE 14th International Conference on Application-specific Systems,
Architectures and Processors (ASAP'03) June 24-26,
2003, Den Hague, The Netherlands
At the
Leiden Embedded Research Center, we are developing a compiler called
Compaan that automatically translates signal processing applications
written in Matlab into Kahn Process Networks (KPNs). In general, these
signal processing applications are data-flow intensive, requiring
large storage capacities, usually represented by matrices. An
important issue in Compaan is the derivation of a memory management
mechanism that allows for efficient inter-process communication. This
mechanism has previously been published and is called the Extended
Linearization Model (ELM). The controller needed in the ELM is derived
using the Ehrhart theory, leading to a computational intensive
procedure. In this paper, we present a new approach to derive the ELM
controller, based on the notion of Lexicographically Maximal
Preimage. Using polytope manipulations and parametric integer linear
programming techniques, we get less computational intensive and easier
to be derived controller implementation for the ELM.
Alexandru Turjan, Bart
Kienhuis, Ed Deprettere,
``Realizations of the Extended
Linearization Model'',
In chapter 9 of book "Domain-Specific
Embedded Multiprocessors", page 171 -- 191, 2003, by Marcel Dekker, Inc
At the Leiden Embedded Research Center, we are working towards a
framework called Compaan that automates the transformation of digital
signal processing (DSP) applications to Kahn Process Networks
(KPNs). These applications are written in Matlab as parameterized
nested loop programs. This transformation is interesting as KPNs are
well suited for mapping onto parallel architectures. Although the KPN
semantic always assumes that FIFO buffers can be used between
processes, we have found cases in which the FIFO is not enough as data
may arrive in the wrong order. To solve this order problem, we
previously presented the Extended Linearization Model (ELM) that
describes a mechanism to reorder tokens. The introduction of the ELM
does not violate the Kahn Process Network semantics; we still use a
FIFO between a Producer and Consumer. The ELM relays on some
additional memory and a controller to perform the reordering. The ELM
model can be implemented in different ways. In this chapter, we
investigate four different realizations of the ELM. The realizations
differ in the computational complexity of performing the reordering,
the kind of reordering memory used, and the size of the reordering
memory.
Bart Kienhuis and Ed
F. Deprettere,
``Modeling
Stream-Based Applications Using the SBF Model of
Computation'',
The Journal of VLSI Signal Processing-Systems
for Signal, Image, and Video Technology (Kluwer), pag 291
-- 300, Vol. 34, Issue 3, 2003.
Edwin Rijpkema (Promotor Ed
Deprettere and Co-promotor Bart Kienhuis),
``Modeling
Task Level Parallelism in Piece-wise Regular Programs'',
PhD
thesis, Leiden University, Leiden Institute of Advanced Computer
Science (LIACS), The Netherlands,
Sept 2002.
Tim Harriss, Richard Walke, Bart
Kienhuis, and Ed Deprettere Keywords:
Process
Networks, Stream-based Model of computation, Nested Loop Programs,
Matlab, FPGA, Skewing, Unrolling.
Alexandru Turjan, Bart Kienhuis, and Ed Deprettere
Todor Stefanov, Bart Kienhuis, and Ed Deprettere
Bart Kienhuis, Ed Deprettere, Pieter van der Wolf and Kees Vissers
keywords: Y-chart approach, Architecture Template, Stack of
Y-charts, Design Space Exploration, Abstraction Pyramid, Embedded
Systems
Bart Kienhuis, Edwin Rijpkema, and
Ed F. Deprettere
Bart Kienhuis
Ed F. Deprettere, Edwin Rijpkema,
and Bart Kienhuis
Edwin Rijpkema, Bart Kienhuis and
Ed F. Deprettere,
keywords: Process Networks, Matlab, Compilation, Embedded Architectures
B. Kienhuis, E. Deprettere, K. Vissers
and P. van der Wolf,
Peter Held, Bart Kienhuis,
``Compilation
from Matlab to Process Networks Realized in FPGA'', In journal
on Design Automation of Embedded Systems, Kluwer, Vol 7, Issue 4,
2002
Compaan is a software tool that is capable of automatically
translating nested loop programs, written in Matlab, into parallel
process network descriptions suitable for implementation in
hardware. In this article, we show a methodology and tool to convert
these process networks into FPGA implementations. We will show that we
can in principle obtain high performing realizations in a fraction of
the design time currently employed to realize a parameterized
implementation. This allows us to rapidly explore a range of
transformations, such as loop unrolling and skewing, to generate a
circuit that meets the requirements of a particular application. The
QR decomposition algorithm is used to demonstrate the capability of
the tool. We present results showing how the number of clock cycles
and calculations-per-second vary with these transformations using a
simple implementation of the function units. We also provide an
indication of what we expect to achieve in the near future once the
tools are completed and applied the transformations to parallel,
highly pipelined implementations of the function units.
``A compile time based approach for solving out-of-order communication in Kahn Process Networks'', in proceeding of
IEEE 13th International Conference on Aplication-specific Systems, Architectures and Processors (ASAP'2002), San Jose, CA, USA, July 17-19, 2002
The Compaan compiler framework automates the transformation of
DSP applications written in Matlab into Kahn Process Networks
(KPNs). These KPNs express the signal processing applications in a
parallel distributed way making them more suitable for mapping onto
parallel architectures. A simple instance of a generated KPN by
Compaan is a Producer process that communicates with a Consumer
process via a FIFO buffer, with the Consumer reading data from the
FIFO using a blocking read. When the sequence of producing data is
different from the sequence of consuming data, a simple FIFO is not
sufficient to implement the communication. For such case, extra
storage and control are needed at the consumer side. This paper
presents a novel approach that determines at compile time whether a
FIFO buffer is sufficient for every Producer/Consumer pair of a
Compaan-generated KPN. For the case when the additional memory is
required, we also provide an address generation mechanism at
compile time. The presented approach is based on the Ehrhart
theory.
``Algorithmic Transformation Techniques for Efficient Exploration
of Alternative Application Instances'', in proceeding of
Tenth International Symposium on Hardware/Software Codesign
CODES'2002, Stanley Hotel, Estes Park, Colorado, USA, May 6 -- 8, 2002
Following the Y-chart paradigm for designing a system, an application
and an architecture are modeled separately and mapped onto each other
in an explicit design step. Next, a performance analysis for
alternative application instances, architecture instances and mappings
has to be done, thereby exploring the design space of the target
system. Deriving alternative application instances is not trivially
done. Nevertheless, many instances of a single application exist that
are worth to be derived for exploration. In this paper, we present
algorithmic transformation techniques for systematic and fast
generation of alternative application instances that express
task-level concurrency hidden in an application in some degree of
explicitness. These techniques help a system designer to speedup
significantly the design space exploration process.
``A Methodology to Design Programmable Embedded Systems'',
in the LNCS series of Springer-Verlag (c), Volume 2268, SAMOS: Systems, Architectures, Modeling, and Simulation, editors Ed F. Deprettere, Jurgen Teich, and Stamatis Vassiliadis, November 2001.
Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfies a set of constraints while establishing enough flexibility to support a given set of applications.
Tim Harriss, Richard Walke, Bart
Kienhuis and Ed. F. Deprettere
``Compilation from Matlab to Process Networks Realized in FPGA'',
Presented
at the 35th Asilomar conference on Signals, Systems, and
Computers, November 4 -- 7, 2001, Pacific Grove, CA,
USA. QinetiQ, Ltd (c) Compaan is a software tool
capable of automatically translating nested loop programs, written in
Matlab, into parallel Kahn process network descriptions suitable for
implementation in hardware. In this paper we present a tool for
converting these process networks into FPGA implementations. The QR
decomposition algorithm is used to demonstrate the capability of the
tool to quickly generate high-performance parallel
implementations. This allows us to rapidly explore a range of
transformations, such as loop unrolling and skewing, to generate a
circuit that meets the requirements of a particular application. We
present results showing how the control logic complexity and number of
clock cycles vary with these transformations
Bart Kienhuis and Ed. F. Deprettere
``Modeling Stream-Based Applications using the SBF model of computation'',
Presented at: IEEE Workshop on Signal Processing Systems (SIPS 2001), Antwerp, Belgium, September 26-28, 2001
Modeling applications and architectures at various levels of abstraction is becoming more and more an accepted approach in embedded system design. When looking at the modeling of applications in the domain of video, audio, and graphics applications, we notice that they exhibit a high degree of task parallelism and operate on streams of data. Models that we can use to specify such stream-based applications on a high level of abstraction are the dataflow models and process network models. Each of these models has its own merits. Therefore, an alternative approach is to introduce a model of computation that combines the semantics of both models of computation. In this paper, we introduce such a model of computation, which we call the Stream-Based Functions (SBF) model of computation and show an example. Furthermore, we discuss the composition and decomposition of SBF objects and put the SBF model of computation in the context of relevant dataflow models and process network models.
Edwin Rijpkema, Ed F. Deprettere and
Bart Kienhuis
``Deriving
Process Networks From Nested Loop Algorithms'',
Parallel Processing Letters, Vol 10 Nos. 2 & 3 (2000), Pages 165 -- 176.
High level modeling and (quantitative) performance analysis of signal processing systems requires high level models for the applications
(algorithms) and the implementations (architectures), a mapping of the
former into the latter, and a simulator for fast execution of the
whole. Signal processing algorithms are very often nested-loop
algorithms with a high degree of inherent parallelism. This paper
presents - for such applications - a suitable application model and a
method to convert a given imperative executable specification to a
specification in terms of this application model. The methods and
tools are illustrated by means of an example.
Ed F. Deprettere, Edwin Rijpkema,
Paul Lieverse and Bart Kienhuis
``High
Level Modeling for Parallel Executions of Nested Loop
Algorithms'',
IEEE International Conference on
Application-specific Systems, Architectures and Processors
(ASAP'2000), July 10 -- 12 2000, Boston Massachusetts, USA. High
level modeling and (quantitative) performance analysis of signal
processing systems requires high level models for the applications
(algorithms) and the implementations (architecture), a mapping of the
former into the latter, and a simulator for fast execution of the
whole. Signal processing algorithms are very often nested-loop
algorithms with a high degree of inherent parallelism. This paper
presents - for such applications - suitable application and
implemetation models, a method to convert a given imperative
executable specification to a specification in terms of the
application model, a method to map this specification into an
architecture specification in terms of the implementation model, and a
method to analyze the performance through simulation. The methods and
tools are illustrated by means of an example.
``Compaan:
Deriving Process Networks from Matlab for Embedded Signal Processing
Architectures.'',
8th International Workshop on Hardware/Software Codesign (CODES'2000), May 3 -- 5 2000, San Diego, CA, USA.
This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a process network specification. The process network model of computation fits better with the new emerging kind of embedded architectures that use coprocessors. Process networks can describe both fine-grained and coarse-grained parallelism, making the mapping of the applications easier.
``MatParser: An array dataflow analysis compiler.'',
Technical Report UCB/ERL M00/9.
This paper presents MatParser, which is an array
analysis compiler that automatically converts an affine nested loop
program into a single assignment program. The nested loop programs may
contain non-linear operators like div/mod/floor/ceil and stepsizes
other than one. The focus of this article is on the software
architecture used in MatParser to resolve the data
dependencies. Finding that two variables are dependent on each other
and at which iteration, is a very computational intensive
procedure. MatParser employs a particular linear programming technique
to find the data-dependencies, based on parametric integer programming
(PIP) as proposed by Paul Feautrier. To appreciate the implementation
of MatParser, we will explain in this paper in enough detail the
basics of the linear programming technique used.
``Application
and Architecture Modeling for Parallel Execution of Jacobi-type
Algorithms'',
33rd Asilomar conference on signals, systems,
and computers, October 24 -- 27, Pacific Grove, CA.In wireless
communications of tomorrow, high-resolution space-time filtering
techniques will play a crucial role. To make these techniques broadly
available, low-power, low-cost embedded processors for efficient
implementation must be designed. As many of these space-time filters
implement routines that are built on robust Jacobi-type algorithms, it
makes sense to design a processor with a good performance-cost ratio
over a subset of the Jacobi-type algorithms. This paper summarizes
the steps in a design procedure that takes as its input executable
specifications in Matlab of a subset of Jacobi-type algorithms and
outputs one or more candidate processors for efficient execution of
these algorithms. The procedure heavily relies on models of
computation and models of realization to specify algorithm and
architecture instances for evaluation of performance and cost metrics
through extensive simulation. Details of the various steps can be
found in the list of references.
``Compilation
from Matlab to Process Networks'',
Presented at the Second
International Workshop on Compiler and Architecture Support for
Embedded Systems (CASES'99), October 1-3 1999, Washington.Novel high-performance,
domain specific, embedded architectures are more and more composed of
a microprocessor, some memory, and a number of dedicated
coprocessors. Onto these embedded architectures, applications will be
executed that belong to the domain of multi-media processing, mobile
communication, and adaptive array processing. In general, these
applications are written using an imperative model of computation most
commonly C or Matlab. A better specification format would be to use an
inherent parallel model of computation like Process Networks. This
paper presents a three-step approach to automatically transform a
class of Matlab specification into a process network specification.
A.C.J. Kienhuis,
``Design
Space Exploration of Stream-based Dataflow Architectures: Methods and Tools'',
PhD
thesis, Delft University of Technology, The Netherlands, January 1999.
(Explains the Y-chart approach in great detail.)
``The
Construction of a Retargetable Simulator for an Architecture Template'',
In Proc. 6-th Int. Workshop on Hardware/Software
Codesign (CODES'98), Seattle, Washington, March 15 - 18 1998.
Systems in the domain of high-performance
video signal processing are becoming more and more programmable. We suggest
an approach to design such systems that involves measuring, via simulation,
the performance of various architectures on which a set of applications
are mapped. This approach requires a retargetable simulator for an architecture
template. We describe the retargetable simulator that we constructed for
a stream-oriented application-specific dataflow architecture. For each
architecture instance of the architecture template, a specific simulator
is derived in three steps: the architecture instance is constructed, an
execution model is added, and the executable architecture is instrumented
to obtain performance numbers. We used object oriented principles together
with a high-level simulation mechanism to ensure retargetability and an
efficient simulation speed. Finally, we explain how a retargetable simulator
can be encapsulated within an environment for automated design space exploration.
keywords: Quantitative
Analysis, Architecture Templates, Dataflow Architectures, Retargetable
Simulator, Y-chart, Method
P. Lieverse, E.F. Deprettere, A.C.J. Kienhuis
and E.A. de Kock,
``A
Clustering Approach to Explore Grain-sizes in the Definition of Weakly
Programmable Processing Elements'',
In 1997 IEEE Workshop on Signal Processing
Systems: Design and Implementation, pp. 107-120, De Montfort University,
Leicester, UK, November 3-5 1997.
Rewarded with Best Student Paper Award
We explore the efficiency of
a class of stream-oriented dataflow architectures as a function of the
grain-size, for a given set of applications. We believe the grain-size
is a key parameter in balancing flexibility and efficiency of this class
of architectures. We apply a clustering approach on the well-defined set
of applications to determine a set of weakly programmable processing elements
of different grain-sizes. The resulting architectures are compared with
respect to their silicon area. For a set of twenty-one industrially relevant
video algorithms, we determined architectures with various grain- sizes.
The results of this exercise indicate an improvement factor of two for
the silicon area, while changing the grain-size from fine-grain to coarser-
grain.
B. Kienhuis, E. Deprettere, K. Vissers
and P. van der Wolf,
``An
Approach for Quantitative Analysis of Application-Specific Dataflow Architectures'',
In
Proc. 11-th Int. Conf. on Application-specific
Systems, Architectures and Processors, Zurich, Switzerland, July 14-16
1997.
In this paper we present an
approach for quantitative analysis of application- specific dataflow architectures.
The approach allows the designer to rate design alternatives in a quantitative
way and therefore supports him in the design process to find better performing
architectures. The context of our work is Video Signal Processing algorithms
which are mapped onto weakly- programmable, coarse-grain dataflow architectures.
The algorithms are represented as Kahn graphs with the functionality of
the nodes being coarse- grain functions. We have implemented an architecture
simulation environment that permits the definition of dataflow architectures
as a composition of architecture elements, such as functional units, buffer
elements and communication structures. The abstract, clock-cycle accurate
simulator has been built using a multi-threading package and employs object
oriented principles. This results in a configurable and efficient simulator.
Algorithms can subsequently be executed on the architecture model producing
quantitative information for selected performance metrics. Results are
presented for the simulation of a realistic application on several dataflow
architecture alternatives, showing that many different architectures can
be simulated in modest time on a modern workstation.
Keywords: Quantitative
Analysis, Architecture Design, Dataflow Architectures, Simulation.
P.C. Held and
A.C.J. Kienhuis
``DIV,
FLOOR, CEIL, MOD and STEP Functions in Nested Loop Programs and
Linearly Bounded Lattices'',
Algorithms and Parallel VLSI
Architectures III, M. Moonen and F. Catthoor (Editors), Page 271 --
282, Elsevier Science B.V. 1995. This paper describes the
conversion of nested loop programs into single assignment forms. The
nested loop programs may contain the integer operators: integer
division, floor, ceil, and modulo, in
expressions and the stride, or step size, of for loops may be greater
than one. The programs may be parametrized but must have static
control. The conversion is done completely automatical by the tool
HiPars. The output of HiPars is a single assignment
program (SAP) and a dependence graph (DG). The description of the
dependence graph is based on linearly bounded lattices. We will show
the relation between the integer division operators in the SAP and
linearly bounded lattices in the corresponding DG.
NOTE: HiPars is
replaced recently by MatParser
``Functional
Design of Data-Flow Networks'',
PhD thesis, Delft University of
Technology, The Netherlands, May 1996.
Added
reference. This thesis describes in detail some steps as done in the
Compaan
effort.
``Parallelizing
Nested Loop Programs containing DIV, FLOOR, CEIL, MOD and STEP
functions'',
Master thesis, Delft University of Technology, The
Netherlands, Technical Report nr: 94-132, March 1994.
Previous Research Work:
University of Califonia
at Berkeley
Department of Electrical
Engineering and Computer Sciences
558 Cory Hall # 1774
Berkeley CA 94720 - 1774