Bart Kienhuis


Vital Statistics

Name: Dr.ir. Bart (A.C.J.) Kienhuis
Position: Assistant Professor at LIACS, University Leiden, The Netherlands
Group: Computer Systems Group
Section: Embedded Design

Area of expertise:
  • Embedded System Design, in particular compiler support for Embedded Multi Core Systems
  • Software Engineering
  • Web based Software Technlogies
Main Research Field:
  • Software Engineering
  • Tools for Embedded Multi Core System Design
  • Design Space Exploration Support

Office: Room 117
Phone: +31-(0)71 527 6998

Fax: +31-(0)71 527 6985
Email: kienhuis@liacs.nl

 
Work address:

Leiden University
LIACS (Leiden Institute of Advanced Computer Science)
Niels Bohrweg 1
2333 CA Leiden
The Netherlands
+31-(0)71 527 6998


Bio

Bart Kienhuis received a MSEE from Delft University of Technology in 1994 and he received his Ph.D. from Delft University of Technology in 1999. During his Ph.D., he has worked at Philips Research in Eindhoven on a design methodology (the Y-chart approach) for high performance video architectures for consumer products. His primary interest is in the area of embedded system design with an emphasis on design space exploration, performance modeling, architectural analysis, and hardware/software codesign. From 1999 until 2000, Bart Kienhuis was a Post Doc in the group of Prof. Edward A. Lee at the University of California at Berkeley. He is currently an assistant professor in the Computer Systems group.


Involved Research Projects (Dutch/European)


Courses

If you are interested in doing a Bachelor Project (11 ECTS), Software Project (20 ECTS), Research Project (17 ECTS) or Master Project (42 ECTS) within the Computer Systems group, have a look at our student projects page where a list of projects is given.


Presentations:


Publications:

Most of these articles may be copyright of ACM or IEEE. Please understand their copyright policy before reproducing these articles.

Finished Master Project Publications: Publications.

Sjoerd Meijer, Bart Kienhuis, Johan Walters, and David Snuijf
``Automatic partitioning and mapping of stream-based applications onto the Intel IXP Network Processor'', Workshop on Software & Compilers for Embedded Systems (SCOPES'07), Nice, April 20 2007

When studying the IXP Network processor architecture from Intel, we found quite some interesting aspects that make the IXP attractive for stream-based applications. The architecture is highly optimized for streaming data, albeit in the form of internet packets. Furthermore, the architecture has Gigabit Ethernet connectors for handling incoming and outgoing traffic and can process this data at real-time using dedicated microengines. In this paper, we try to answer three questions; 1) Can we use the IXP architecture for stream-based applications? 2) can we map applications written as a KPN onto an IXP? 3) can we integrate the generation of KPNs using the Compaan compiler with a tool flow that maps the KPN to an IXP, thereby make the programming of IXP much simpler? As will be shown, all three steps can be performed and we show that we can map automatically two non-internet stream-based applications (QR and DWT) onto the IXP.

Sjoerd Meijer, Bart Kienhuis, Alex Turjan and Erwin de Kock
``A Generic Process Splitting Transformation For Kahn Process Networks'', Proceedings of the DATE conference (DATE'07), Nice, April 17 - 19 2007

Kahn Process Networks (KPN) are very suitable for systematic mapping onto multiprocessor architectures. Running applications written in this parallel program specification on a multiprocessor architecture, however, does not guarantee that the runtime requirements are met. Therefore, it may be necessary to further analyze and optimize Kahn process networks. In this paper, we will present a process splitting technique that results in a functionally equivalent process network, but with a changed and optimized network structure. The class of networks that can be handled is generic as it is not restricted to static networks. The presented approach can also handle processes with dynamic program statements. We prototyped the splitting transformation in the GCC compiler for a JPEG decoder application. This resulted in a 21% performance improvement.

Ming-Yung Ko, Claudiu Zissulescu, Sebastian Puthenpurayil, Shuvra Bhattacharyya, Bart Kienhuis, and Ed Deprettere,
``Parameterized Looped Schedules for Compact Representation of Execution Sequences in DSP Hardware and Software Implementation'', Accepted for publication (2006) in the IEEE Transactions on Signal Processing. Describes joined work between LIACS and Dept of Electrical Engineering, Univ. of Maryland

In this paper, we present a technique for compact representation of execution sequences in terms of efficient looping constructs. Here, by a looping construct, we mean a compact way of specifying a finite repetition of a set of execution primitives. Such compaction, which can be viewed as a form of hierarchical runlength encoding (RLE), has application in many VLSI signal processing contexts, including efficient control generation for Kahn processes on FPGAs, and software synthesis for static dataflow models of computation. In this paper, we significantly generalize previous models for loop-based code compaction of DSP programs to yield a configurable code compression methodology that exhibits a broad range of achievable trade-offs. Specifically, we formally develop and apply to DSP hardware and software synthesis a parameterizable loop scheduling approach with compact format, dynamic reconfigurability, and low-overhead decompression.

Alexandru Turjan, Bart Kienhuis and Ed Deprettere,
``Classifying Interprocess Communication in Process Network Representation of Nested Loop Programs'', Accepted for publication (2006) in the ACM Transactions on Embedded Computing Systems.

Claudiu Zissulescu, Bart Kienhuis and Ed Deprettere,
[PDF]``Communication Synthesis in a multiprocessor environment'', Presented at Field-Programmable Logic and Applications conference (FPL'05) Tampere, Finland August 24 - 26, 2005

At Leiden University, we are developing a design methodology that allows for fast mapping of nested-loop applications (e.g. DSP, Imaging, or Multi-Media) written in a subset of Matlab onto reconfigurable devices. This design methodology is implemented into a tool chain that we call Compaan/Laura. This methodology generates a process network in which the inter-process communication takes place in a point-to-point fashion. Four types of point-to-point inter-processor communication exist in the PN. Two of them use a FIFO like communication and the other two use a cache like memory to exchange data. In this paper, we investigate the realizations for the four communication types and show that point-to-point communication at the level of scalars can be realized automatically and very efficiently in today's FPGAs.

Claudiu Zissulescu, Bart Kienhuis and Ed Deprettere,
[PDF]``Expression Synthesis in Process Networks generated by Laura'', Presented at the 16th IEEE International Conference on Application-specific Systems, Architectures and Processors ASAP2005, July 23 -- 25, 2005, Samos, Greece

The Compaan/Laura tool chain maps nested loop applications written in Matlab onto reconfigurable platforms, such as FPGAs. \Compaan rewrites the original Matlab application as a Process Network in which the control is parameterized and distributed. This control is given as parameterized polytopes that are expressed in terms of pseudo-linear expressions. These expressions cannot always be mapped efficiently onto hardware as they contain multiplication and integer division operations. This obstructs the data flow through the processes. Therefore, we present in this paper the Expression Compiler that efficiently maps pseudo-linear expressions onto reconfigurable hardware in such a way that the distributed and parameterized control never obstructs the data flow through processors. This compiler employs techniques like number theory axioms, method of difference, and predicated static single assignment code.

Alexandru Turjan, Bart Kienhuis and Ed Deprettere,
[HTML] ``Solving Out-of-Order Communication in Kahn Process Networks'',
in the Journal of VLSI Signal Processing, Issue: Volume 40, Number 1 Date: May 2005 Pages: 7 - 18

The Compaan compiler framework automates the transformation of DSP applications written in Matlab into Kahn Process Networks (KPNs). These KPNs express the signal processing applications in a parallel distributed way making them more suitable for mapping onto parallel architectures. A simple instance of a generated KPN by Compaan is a Producer process that communicates with a Consumer process via a FIFO buffer, with the Consumer reading data from the FIFO using a blocking read. When the sequence of producing data is different from the sequence of consuming data, a simple FIFO is not sufficient to implement the communication. For such case, extra storage and control are needed at the consumer side. This paper presents a compile approach that determines whether a FIFO buffer is sufficient for every Producer/Consumer pair of a Compaan-generated KPN. When additional memory is required, we provide an address generation mechanism to perform the reordering and furthermore give a lower bound on the amount of memory needed to perform the reordering. The presented approach is based on the Ehrhart theory.

Andre van der Plas, Bart Kienhuis, ``A Methodology for Efficient Reuse of Open Source Code'',
presented the First Conference for the dutch Software Engineering Community (JACQUARD2005)

Research has been carried out to explore if, and how, it is possible to increase the efficiency in which open source code can be used. The research has shown that structuring such code can improve this efficiency. It has lead to a methodology and a set of techniques, which can be used to (re) structure the code.

Steven Derrien, Alexandru Turjan, Claudiu Zissulescu, Bart Kienhuis and Ed Deprettere, ``Deriving Efficient Control in Process Networks with Compaan/laura'', Published in the International Journal of Embedded Systems, volume 1, Issue 7, 2005 Embedded-Journal Inderscience (IJES)

At Leiden Embedded Research Center (LERC), we are building a tool chain called Compaan/Laura that allows us to map rapidly and efficiently signal processing applications written in Matlab onto reconfigurable platforms. In this chain, first the Matlab code is converted automatically to executable Kahn Process Network (KPN) specification, then a tool called Laura transforms the PN specification into design implementations described as synthesizable VHDL.
The applications targeted by Compaan are usually data-flow intensive, requiring large computational power. Therefore, an important issue in Laura is the derivation of efficient and scalable hardware control structures. This control is based on an abstract representation given as parametrized polytopes. Although this representation can be directly translated into nested guarded for-loops (very suitable for software implementation) its mapping to hardware is much more difficult. In this paper we investigate the opportunities of deriving different hardware realizations for this control, and explore the trade off between speed and resources usage.

Alexandru Turjan, Bart Kienhuis and Ed Deprettere,
[PDF]``A Hierarchical Classification Scheme to Derive Interprocess Communication in Process Networks'', in the proceedings of the 15th IEEE International Conference on Application-specific Systems, Architectures and Processors ASAP2004, Sept 27 -- 29, 2004, Galveston, Texas

The Compaan compiler automatically derives a Process Network (PN) description from an application written in Matlab. The basic element of a PN is a Producer/Consumer (P/C) pair. Four different communication patterns for a P/C pair have been identified and the complexity of communication structure differs depending on the communication pattern involved. Therefore, in order to obtain cost-efficient process networks our compiler automatically identifies the communication pattern of each P/C pair. This problem is equivalent to integer linear programming and thus in general can not be solved efficiently. In this paper we present simpler techniques that allow to classify the interprocess communication in a PN. However, in some cases those techniques do not allow to find an answer and therefore, an ILP test has still to be applied. Thus, we introduce a hierarchical classification scheme that correctly classifies the interprocess communication, but uses dramatically less integer linear programming. In only 5% of the cases to classify, we still relay on integer linear programming while in the remaining 95%, the techniques presented in this paper are able to classify a case correctly.

Alexandru Turjan, Bart Kienhuis and Ed Deprettere,
[PDF] ``Translating affine nested-loop programs to Process Networks'', in proceedings of at the international conference on compilers, architecture, and synthesis for Embedded Systems (CASES'04), Sept 23 -- 25, 2004, Washington D.C.

New heterogeneous multiprocessor platforms are emerging that are typically composed of loosely coupled components that exchange data using programmable interconnections. The components can be CPUs or DSPs, specialized IP cores, reconfigurable units, or memories. To program such platform, we use the Process Network (PN) model of computation. The localized control and distributed memory are the two key ingredients of a PN allowing us to program the platforms. The localized control matches the loosely coupled components and the distributed memory matches the style of interaction between the components. To obtain applications in a PN format, we have built the Compaan compiler that translates affine nested-loop programs into functionally equivalent PNs. In this paper, we describe a novel analytical translation procedure we use in our compiler that is based on integer linear programming. The translation procedure consists of four main steps and we will present each step by describing the main idea involved, followed by a representative example.

Claudiu Zissulescu, Bart Kienhuis and Ed Deprettere,
[PDF] ``Increasing pipelined IP core utilization in Process Networks using Exploration'', In proceedings of Field-Programmable Logic and Applications (FPL'04), Springer LNCS 3203, pages 690 -- 699

At Leiden Embedded Research Center, we are building a tool chain called Compaan/Laura that allows us to do fast mapping of applications written in Matlab onto reconfigurable platforms, such as FPGAs, using IP cores to implement the data-path of the applications. A particular characteristic of the derived networks is the existence of selfloops. These selfloops have a large impact on the utilization of IP cores in the final hardware implementation of a PN, especially if the IP cores are deeply pipelined. In this paper, we present an exploration methodology that uses feedback provided by the Laura tool to increase the utilization of IP cores embedded in our PN network. Using this exploration, we go from 60MFlops to 1,7GFlops for the QR algorithm using the same number of resources except for memory.

Alexandru Turjan, Bart Kienhuis and Ed Deprettere,
[PDF] ``An Approach to Classify Inter-Process Communication in Process Networks at Compile Time'', In proceedings of 8th International Workshop on Software and Compilers for Embedded Systems (SCOPES2004), Springer LNCS 3199, pages 62 -- 76, Sept 2 -- 3, 2004, Amsterdam

New embedded signal processing architectures are emerging that are composed of loosely coupled heterogeneous components like CPUs or DSPs, specialized IP cores, reconfigurable units, or memories. We believe that these architectures should be programmed using the Process Network model of computation. To ease the mapping of applications, we are developing the Compaan compiler that automatically derives a Process Network (PN) description from an application written in Matlab. In this paper, we investigate a particular problem in nested loop programs, which is about classifying the interprocess communication in the PN representation of the nested loop program. The global memory arrays present in the Matlab code have to be replaced by a distributed communication structure used for sending data to the network processes. We will show that four types of communication exists, each exhibiting different requirements when realizing them in hardware of software. We present two compile time tests that decide the type of a particular static array. These tests have become an important part of our Compaan compiler.

Ingrid Verbauwhede (UCLA/KU Leuven), Patrick Schaumont (UCLA), Christian Piquet (CSEM), and Bart Kienhuis,
[PDF] ``Architectures and Design techniques for energy efficient embedded DSp and multimedia processing'' presented at the embedded Low-power Tutorial at the Design, Automation and Test in Europe conference DATE2004, Feb 16-20 2004, Paris, France

Energy efficient embedded systems consist of a heterogeneous collection of very specific building blocks, connected together by a complex network of many dedicated busses and interconnect options. The trend to merge multiple functions into one device makes the design and integration of these "systems-on-chip" (SOC's) even more problematic. Yet, specifications and applications are never fixed and require the embedded units to be programmable. The topic of this paper is to give the designer architectures and design techniques to find the right balance between energy efficiency and flexibility. The key is to include programmability (or reconfiguration) at the right level of abstraction and tuned to the application domain. The challenge is to provide an exploration and programming environment for this heterogeneous architecture platform.

Todor Stefanov, Claudiu Zissulescu, Alexandru Turjan, Bart Kienhuis, Ed Deprettere,
[PDF] ``System Design using Kahn Process Networks: The Compaan/Laura Approach'', in proceedings of the Design, Automation and Test in Europe conference DATE2004, Feb 16-20 2004, Paris, France

New emerging embedded system platforms in the realm of high-throughput multimedia, imaging, and signal processing will consist of multiple microprocessors and reconfigurable components. One of the major problems is how to program these platforms in a systematic and automated way so as to satisfy the performance need of applications executed on these platforms.In this paper, we present our system design approach as an efficient solution to this programming problem. We show how for an application written in Matlab, a Kahn Process Network specification can automatically be derived and systematically mapped onto a target platform composed of a microprocessor and an FPGA. Furthermore, we illustrate how the mapping approach is applied on a real-life example, namely an M-JPEG encoder.

Claudiu Zissulescu, Todor Stefanov, Bart Kienhuis, Ed Deprettere,
[PDF],``Laura: Leiden Architecture Research and Exploration Tool'',
Presented at the International Conference on Field Programmable Logic and Applications (FPL), Sept 1-3 2003, Lisbon, Portugal

At Leiden Embedded Research Center (LERC), we are building a tool chain called Compaan/Laura that allows us to map fast and efficiently application written in Matlab onto reconfigurable platforms. In this chain, first the Matlab code is converted automatically to executable Kahn Process Network (KPN) specification. Then a tool called Laura accepts this specification and transforms the specification into design implementations described as synthesizable VHDL. In this paper, we present our methodology implemented in Laura tool to automatically convert KPNs to synthesizable VHDL code targeted form mapping onto FPGA-based platforms. With the help of Laura, a designer is able to either fast prototype signal processing and multimedia applications directly in hardware or to extract very fast valuable low-level quantitative implementation data such as performance in terms of clock cycles or silicon area.

Alexandru Turjan, Bart Kienhuis
[PDF],``Storage Management in Process Networks using the Lexicographically Maximal Preimage'',
Presented at the IEEE 14th International Conference on Application-specific Systems, Architectures and Processors (ASAP'03) June 24-26, 2003, Den Hague, The Netherlands

At the Leiden Embedded Research Center, we are developing a compiler called Compaan that automatically translates signal processing applications written in Matlab into Kahn Process Networks (KPNs). In general, these signal processing applications are data-flow intensive, requiring large storage capacities, usually represented by matrices. An important issue in Compaan is the derivation of a memory management mechanism that allows for efficient inter-process communication. This mechanism has previously been published and is called the Extended Linearization Model (ELM). The controller needed in the ELM is derived using the Ehrhart theory, leading to a computational intensive procedure. In this paper, we present a new approach to derive the ELM controller, based on the notion of Lexicographically Maximal Preimage. Using polytope manipulations and parametric integer linear programming techniques, we get less computational intensive and easier to be derived controller implementation for the ELM.

Alexandru Turjan, Bart Kienhuis, Ed Deprettere,
``Realizations of the Extended Linearization Model'',
In chapter 9 of book "Domain-Specific Embedded Multiprocessors", page 171 -- 191, 2003, by Marcel Dekker, Inc

At the Leiden Embedded Research Center, we are working towards a framework called Compaan that automates the transformation of digital signal processing (DSP) applications to Kahn Process Networks (KPNs). These applications are written in Matlab as parameterized nested loop programs. This transformation is interesting as KPNs are well suited for mapping onto parallel architectures. Although the KPN semantic always assumes that FIFO buffers can be used between processes, we have found cases in which the FIFO is not enough as data may arrive in the wrong order. To solve this order problem, we previously presented the Extended Linearization Model (ELM) that describes a mechanism to reorder tokens. The introduction of the ELM does not violate the Kahn Process Network semantics; we still use a FIFO between a Producer and Consumer. The ELM relays on some additional memory and a controller to perform the reordering. The ELM model can be implemented in different ways. In this chapter, we investigate four different realizations of the ELM. The realizations differ in the computational complexity of performing the reordering, the kind of reordering memory used, and the size of the reordering memory.

Bart Kienhuis and Ed F. Deprettere,
[PDF]``Modeling Stream-Based Applications Using the SBF Model of Computation'',
The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology (Kluwer), pag 291 -- 300, Vol. 34, Issue 3, 2003.

Edwin Rijpkema (Promotor Ed Deprettere and Co-promotor Bart Kienhuis),
[PDF]``Modeling Task Level Parallelism in Piece-wise Regular Programs'',
PhD thesis, Leiden University, Leiden Institute of Advanced Computer Science (LIACS), The Netherlands, Sept 2002.

Tim Harriss, Richard Walke, Bart Kienhuis, and Ed Deprettere
[PDF]``Compilation from Matlab to Process Networks Realized in FPGA'', In journal on Design Automation of Embedded Systems, Kluwer, Vol 7, Issue 4, 2002

Compaan is a software tool that is capable of automatically translating nested loop programs, written in Matlab, into parallel process network descriptions suitable for implementation in hardware. In this article, we show a methodology and tool to convert these process networks into FPGA implementations. We will show that we can in principle obtain high performing realizations in a fraction of the design time currently employed to realize a parameterized implementation. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. The QR decomposition algorithm is used to demonstrate the capability of the tool. We present results showing how the number of clock cycles and calculations-per-second vary with these transformations using a simple implementation of the function units. We also provide an indication of what we expect to achieve in the near future once the tools are completed and applied the transformations to parallel, highly pipelined implementations of the function units.

Keywords: Process Networks, Stream-based Model of computation, Nested Loop Programs, Matlab, FPGA, Skewing, Unrolling.

Alexandru Turjan, Bart Kienhuis, and Ed Deprettere
[PDF]``A compile time based approach for solving out-of-order communication in Kahn Process Networks'', in proceeding of IEEE 13th International Conference on Aplication-specific Systems, Architectures and Processors (ASAP'2002), San Jose, CA, USA, July 17-19, 2002

The Compaan compiler framework automates the transformation of DSP applications written in Matlab into Kahn Process Networks (KPNs). These KPNs express the signal processing applications in a parallel distributed way making them more suitable for mapping onto parallel architectures. A simple instance of a generated KPN by Compaan is a Producer process that communicates with a Consumer process via a FIFO buffer, with the Consumer reading data from the FIFO using a blocking read. When the sequence of producing data is different from the sequence of consuming data, a simple FIFO is not sufficient to implement the communication. For such case, extra storage and control are needed at the consumer side. This paper presents a novel approach that determines at compile time whether a FIFO buffer is sufficient for every Producer/Consumer pair of a Compaan-generated KPN. For the case when the additional memory is required, we also provide an address generation mechanism at compile time. The presented approach is based on the Ehrhart theory.

Todor Stefanov, Bart Kienhuis, and Ed Deprettere
[PDF]``Algorithmic Transformation Techniques for Efficient Exploration of Alternative Application Instances'', in proceeding of Tenth International Symposium on Hardware/Software Codesign CODES'2002, Stanley Hotel, Estes Park, Colorado, USA, May 6 -- 8, 2002

Following the Y-chart paradigm for designing a system, an application and an architecture are modeled separately and mapped onto each other in an explicit design step. Next, a performance analysis for alternative application instances, architecture instances and mappings has to be done, thereby exploring the design space of the target system. Deriving alternative application instances is not trivially done. Nevertheless, many instances of a single application exist that are worth to be derived for exploration. In this paper, we present algorithmic transformation techniques for systematic and fast generation of alternative application instances that express task-level concurrency hidden in an application in some degree of explicitness. These techniques help a system designer to speedup significantly the design space exploration process.

Bart Kienhuis, Ed Deprettere, Pieter van der Wolf and Kees Vissers
[PDF]``A Methodology to Design Programmable Embedded Systems'',
in the LNCS series of Springer-Verlag (c), Volume 2268, SAMOS: Systems, Architectures, Modeling, and Simulation, editors Ed F. Deprettere, Jurgen Teich, and Stamatis Vassiliadis, November 2001.

Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfies a set of constraints while establishing enough flexibility to support a given set of applications.

keywords: Y-chart approach, Architecture Template, Stack of Y-charts, Design Space Exploration, Abstraction Pyramid, Embedded Systems


Tim Harriss, Richard Walke, Bart Kienhuis and Ed. F. Deprettere
[PDF]``Compilation from Matlab to Process Networks Realized in FPGA'',
Presented at the 35th Asilomar conference on Signals, Systems, and Computers, November 4 -- 7, 2001, Pacific Grove, CA, USA. QinetiQ, Ltd (c)

Compaan is a software tool capable of automatically translating nested loop programs, written in Matlab, into parallel Kahn process network descriptions suitable for implementation in hardware. In this paper we present a tool for converting these process networks into FPGA implementations. The QR decomposition algorithm is used to demonstrate the capability of the tool to quickly generate high-performance parallel implementations. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. We present results showing how the control logic complexity and number of clock cycles vary with these transformations

Bart Kienhuis and Ed. F. Deprettere
[PDF]``Modeling Stream-Based Applications using the SBF model of computation'',
Presented at: IEEE Workshop on Signal Processing Systems (SIPS 2001), Antwerp, Belgium, September 26-28, 2001

Modeling applications and architectures at various levels of abstraction is becoming more and more an accepted approach in embedded system design. When looking at the modeling of applications in the domain of video, audio, and graphics applications, we notice that they exhibit a high degree of task parallelism and operate on streams of data. Models that we can use to specify such stream-based applications on a high level of abstraction are the dataflow models and process network models. Each of these models has its own merits. Therefore, an alternative approach is to introduce a model of computation that combines the semantics of both models of computation. In this paper, we introduce such a model of computation, which we call the Stream-Based Functions (SBF) model of computation and show an example. Furthermore, we discuss the composition and decomposition of SBF objects and put the SBF model of computation in the context of relevant dataflow models and process network models.

Edwin Rijpkema, Ed F. Deprettere and Bart Kienhuis
[PDF]``Deriving Process Networks From Nested Loop Algorithms'',
Parallel Processing Letters, Vol 10 Nos. 2 & 3 (2000), Pages 165 -- 176.

High level modeling and (quantitative) performance analysis of signal processing systems requires high level models for the applications (algorithms) and the implementations (architectures), a mapping of the former into the latter, and a simulator for fast execution of the whole. Signal processing algorithms are very often nested-loop algorithms with a high degree of inherent parallelism. This paper presents - for such applications - a suitable application model and a method to convert a given imperative executable specification to a specification in terms of this application model. The methods and tools are illustrated by means of an example.

Ed F. Deprettere, Edwin Rijpkema, Paul Lieverse and Bart Kienhuis
[PDF]``High Level Modeling for Parallel Executions of Nested Loop Algorithms'',
IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'2000), July 10 -- 12 2000, Boston Massachusetts, USA.

High level modeling and (quantitative) performance analysis of signal processing systems requires high level models for the applications (algorithms) and the implementations (architecture), a mapping of the former into the latter, and a simulator for fast execution of the whole. Signal processing algorithms are very often nested-loop algorithms with a high degree of inherent parallelism. This paper presents - for such applications - suitable application and implemetation models, a method to convert a given imperative executable specification to a specification in terms of the application model, a method to map this specification into an architecture specification in terms of the implementation model, and a method to analyze the performance through simulation. The methods and tools are illustrated by means of an example.

Bart Kienhuis, Edwin Rijpkema, and Ed F. Deprettere
[PDF]``Compaan: Deriving Process Networks from Matlab for Embedded Signal Processing Architectures.'',
8th International Workshop on Hardware/Software Codesign (CODES'2000), May 3 -- 5 2000, San Diego, CA, USA.

This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a process network specification. The process network model of computation fits better with the new emerging kind of embedded architectures that use coprocessors. Process networks can describe both fine-grained and coarse-grained parallelism, making the mapping of the applications easier.

Bart Kienhuis
[PDF]``MatParser: An array dataflow analysis compiler.'',
Technical Report UCB/ERL M00/9.

This paper presents MatParser, which is an array analysis compiler that automatically converts an affine nested loop program into a single assignment program. The nested loop programs may contain non-linear operators like div/mod/floor/ceil and stepsizes other than one. The focus of this article is on the software architecture used in MatParser to resolve the data dependencies. Finding that two variables are dependent on each other and at which iteration, is a very computational intensive procedure. MatParser employs a particular linear programming technique to find the data-dependencies, based on parametric integer programming (PIP) as proposed by Paul Feautrier. To appreciate the implementation of MatParser, we will explain in this paper in enough detail the basics of the linear programming technique used.

Ed F. Deprettere, Edwin Rijpkema, and Bart Kienhuis
[PDF]``Application and Architecture Modeling for Parallel Execution of Jacobi-type Algorithms'',
33rd Asilomar conference on signals, systems, and computers, October 24 -- 27, Pacific Grove, CA.

In wireless communications of tomorrow, high-resolution space-time filtering techniques will play a crucial role. To make these techniques broadly available, low-power, low-cost embedded processors for efficient implementation must be designed. As many of these space-time filters implement routines that are built on robust Jacobi-type algorithms, it makes sense to design a processor with a good performance-cost ratio over a subset of the Jacobi-type algorithms. This paper summarizes the steps in a design procedure that takes as its input executable specifications in Matlab of a subset of Jacobi-type algorithms and outputs one or more candidate processors for efficient execution of these algorithms. The procedure heavily relies on models of computation and models of realization to specify algorithm and architecture instances for evaluation of performance and cost metrics through extensive simulation. Details of the various steps can be found in the list of references.

Edwin Rijpkema, Bart Kienhuis and Ed F. Deprettere,
[PDF]``Compilation from Matlab to Process Networks'',
Presented at the Second International Workshop on Compiler and Architecture Support for Embedded Systems (CASES'99), October 1-3 1999, Washington.

Novel high-performance, domain specific, embedded architectures are more and more composed of a microprocessor, some memory, and a number of dedicated coprocessors. Onto these embedded architectures, applications will be executed that belong to the domain of multi-media processing, mobile communication, and adaptive array processing. In general, these applications are written using an imperative model of computation most commonly C or Matlab. A better specification format would be to use an inherent parallel model of computation like Process Networks. This paper presents a three-step approach to automatically transform a class of Matlab specification into a process network specification.

keywords: Process Networks, Matlab, Compilation, Embedded Architectures


A.C.J. Kienhuis,
[PDF]``Design Space Exploration of Stream-based Dataflow Architectures: Methods and Tools'',
PhD thesis, Delft University of Technology, The Netherlands, January 1999.
(Explains the Y-chart approach in great detail.)

 

B. Kienhuis, E. Deprettere, K. Vissers and P. van der Wolf,
[PDF]``The Construction of a Retargetable Simulator for an Architecture Template'',
In Proc. 6-th Int. Workshop on Hardware/Software Codesign (CODES'98), Seattle, Washington, March 15 - 18 1998.

Systems in the domain of high-performance video signal processing are becoming more and more programmable. We suggest an approach to design such systems that involves measuring, via simulation, the performance of various architectures on which a set of applications are mapped. This approach requires a retargetable simulator for an architecture template. We describe the retargetable simulator that we constructed for a stream-oriented application-specific dataflow architecture. For each architecture instance of the architecture template, a specific simulator is derived in three steps: the architecture instance is constructed, an execution model is added, and the executable architecture is instrumented to obtain performance numbers. We used object oriented principles together with a high-level simulation mechanism to ensure retargetability and an efficient simulation speed. Finally, we explain how a retargetable simulator can be encapsulated within an environment for automated design space exploration.
keywords: Quantitative Analysis, Architecture Templates, Dataflow Architectures, Retargetable Simulator, Y-chart, Method
P. Lieverse, E.F. Deprettere, A.C.J. Kienhuis and E.A. de Kock,
[PS]``A Clustering Approach to Explore Grain-sizes in the Definition of Weakly Programmable Processing Elements'',
In 1997 IEEE Workshop on Signal Processing Systems: Design and Implementation, pp. 107-120, De Montfort University, Leicester, UK, November 3-5 1997.
Rewarded with Best Student Paper Award
We explore the efficiency of a class of stream-oriented dataflow architectures as a function of the grain-size, for a given set of applications. We believe the grain-size is a key parameter in balancing flexibility and efficiency of this class of architectures. We apply a clustering approach on the well-defined set of applications to determine a set of weakly programmable processing elements of different grain-sizes. The resulting architectures are compared with respect to their silicon area. For a set of twenty-one industrially relevant video algorithms, we determined architectures with various grain- sizes. The results of this exercise indicate an improvement factor of two for the silicon area, while changing the grain-size from fine-grain to coarser- grain.
B. Kienhuis, E. Deprettere, K. Vissers and P. van der Wolf,
[PDF]``An Approach for Quantitative Analysis of Application-Specific Dataflow Architectures'',
In Proc. 11-th Int. Conf. on Application-specific Systems, Architectures and Processors, Zurich, Switzerland, July 14-16 1997.
In this paper we present an approach for quantitative analysis of application- specific dataflow architectures. The approach allows the designer to rate design alternatives in a quantitative way and therefore supports him in the design process to find better performing architectures. The context of our work is Video Signal Processing algorithms which are mapped onto weakly- programmable, coarse-grain dataflow architectures. The algorithms are represented as Kahn graphs with the functionality of the nodes being coarse- grain functions. We have implemented an architecture simulation environment that permits the definition of dataflow architectures as a composition of architecture elements, such as functional units, buffer elements and communication structures. The abstract, clock-cycle accurate simulator has been built using a multi-threading package and employs object oriented principles. This results in a configurable and efficient simulator. Algorithms can subsequently be executed on the architecture model producing quantitative information for selected performance metrics. Results are presented for the simulation of a realistic application on several dataflow architecture alternatives, showing that many different architectures can be simulated in modest time on a modern workstation.
Keywords: Quantitative Analysis, Architecture Design, Dataflow Architectures, Simulation. P.C. Held and A.C.J. Kienhuis
[PDF]``DIV, FLOOR, CEIL, MOD and STEP Functions in Nested Loop Programs and Linearly Bounded Lattices'',
Algorithms and Parallel VLSI Architectures III, M. Moonen and F. Catthoor (Editors), Page 271 -- 282, Elsevier Science B.V. 1995.

This paper describes the conversion of nested loop programs into single assignment forms. The nested loop programs may contain the integer operators: integer division, floor, ceil, and modulo, in expressions and the stride, or step size, of for loops may be greater than one. The programs may be parametrized but must have static control. The conversion is done completely automatical by the tool HiPars. The output of HiPars is a single assignment program (SAP) and a dependence graph (DG). The description of the dependence graph is based on linearly bounded lattices. We will show the relation between the integer division operators in the SAP and linearly bounded lattices in the corresponding DG.
NOTE: HiPars is replaced recently by MatParser

Peter Held,
[PDF]``Functional Design of Data-Flow Networks'',
PhD thesis, Delft University of Technology, The Netherlands, May 1996.

 Added reference. This thesis describes in detail some steps as done in the Compaan effort.

Bart Kienhuis,
[PDF]``Parallelizing Nested Loop Programs containing DIV, FLOOR, CEIL, MOD and STEP functions'',
Master thesis, Delft University of Technology, The Netherlands, Technical Report nr: 94-132, March 1994.

 


Previous Research Work:

Previos Work Address:

University of Califonia at Berkeley
Department of Electrical Engineering and Computer Sciences
558 Cory Hall # 1774
Berkeley CA 94720 - 1774

Philips Research / TU Delft 1994 - 1998 High Level System Evaluation ( Y-chart approach )
TU Delft 1993 - 1994 The HiPars parallel compiler for Nested Loop Programs