The shrinking line width of semiconductor manufacturing processes has increased the challenges of digital system designers, since the design complexity increases quadratically compared to a linear decrease in the line width. This has motivated research into modular design methods, rapid prototyping and distributed systems.
Most design efforts in communications technology, consumer electronics and medical electronics are dedicated to software development. A side effect of this phenomenom is that the underlying hardware platform may not be optimal for a particular application, since it was originally designed for a different application. Rapidly changing standards and the demand for new applications cannot tolerate the limitations of the conventional fixed and rigid Microprocessor/ASIC design approach.
There is clearly a need for universal platforms, and future electronics must incorporate reprogrammability at the device level. Field Programmable Gate Arrays (FPGAs) are ideal components in this respect, as they can be reprogrammed within a fraction of a second.
Dynamic and partial reprogramming of system components can also increase the performance of electronics systems. This requires advanced design methods, since support for partial reprogrammability has not yet found adequate support in design tools. Furthermore, software-based design methods need to be emphasized also in electronics research.
The Internet has changed the world in many ways during the last decade, but its capabilities as a transport medium in distributed computing have remained mostly untapped. Especially small- to medium-sized Ethernet-based TCP/IP networks are attractive alternatives for connecting together reprogrammable computation acceleration units.
Based on the preceding observations, an Internet-connected distributed FPGA-based cluster (HiPerFPGA) is being proposed as a solution for computationally intensive applications.
FGPAs have traditionally been used as coprocessors to alleviate the load of the host processor by performing dedicated and highly specialized computation tasks. This is called processor off-loading and the term hardware accelerator has been used for the FPGA-based coprocessor.
As additional computation acceleration is required, the limiting factor becomes the amount of available logic elements[1] (LEs) in the FPGA. A traditional solution to overcome limitations in FPGA size has been to use multiple FPGAs in the hardware accelerator. This approach is inflexible, since only the amount of required LEs for a particular application with fixed specifications is available. When another application is designed for the hardware accelerator, the available LEs may no be sufficient, and a design cycle has to be initiated.
When a generic computation platform is required, it is not possible to estimate the needed resources beforehand. Using the same idea as in networked computer clusters, it turns out that a flexibly expandable networked FPGA cluster with dynamic resource allocation solves the problems inherent in fixed-sized FPGA-based calculation systems.
The proposed FPGA cluster is called HiPerFPGA (High-Performance FPGA) and contains several FPGA-based computation units, called FPGA Nodes, each with a 100Mbps Ethernet network adapter. The FPGA Nodes are fully connected to each other through an Ethernet switch. Since Ethernet is a core technology in modern Internet, it is also a cost-effective solution for high-speed communication. Integration to other systems is also straightforward. Furthermore, the upper limit of fully FPGA Nodes units is for all practical purposes infinite.
The FPGA Nodes provide all the computation power of the HiPerFPGA. The Master Node computer handles resource allocation and load balancing. The Master Node also supplies the computation data for all FPGA Nodes and collects the results when the computation cycle is finished. To avoid bottlenecks in multiple accesses from the Master Node to the FPGA Nodes, a 1 Gbps Ethernet interface is to be used. Since communication between the Master Node and the FPGA Nodes is synchronized, the 1 Gbps connection can be fully utilized with 10 simultaneous 100 Mbps connecetions.
Distributed FPGA designs are not currently supported by commercial design tools, and custom-made design tools and plugins for development platforms are needed. A lot of work needs to be done to overcome this limitation, but the goal is to hide the underlying achitecture and make the design flow similar to a single chip design. The HiPerFPGA can be viewed as a single scalable very high capacity FPGA from the designer’s point of view. This is a key factor in making the design flow for massively parallel systems a reasonable task.
The HiPerFPGA is partially reprogrammable by reprogramming a single or more FPGA Nodes. In other words, the HiPerFPGA has the ability to change a portion of the design whilst the remainder of the cluster continues to operate. This is an extension of the more traditional SoC (System on a chip) technology, where spatial resource binding takes place at design time.
An FPGA Node includes one or two FPGAs, an Ethernet interface, and memory. A generic card should have the fastest and the largest chips available. This way, algorithm development is not limited by hardware. Although memory in an FPGA Node is not necessary in every application, a generic card needs memory.
Since the goal is to have computing power through size and as a Printed Circuit Board (PCB) cannot be too big for manufacturing reasons, there is additional motivation for clustering. When the HiPerFPGA is connected to the Internet, it would be on everyone’s use for short but extensive computations. A potential business application would be to sell the computational resources of the HiPerFPGA over the Internet.
Figure 1 A general block diagram of the HiPerFPGA with its inner structure and network connections
As the HiPerFPGA is a universal hardware platform, practically every conceivable application can or could be implemented on it. Computing models based on parallelism and partial reprogramming of the HiPerFPGA should yield massive speedups when compared to a traditional sequential microprocessor-based implementation. The research project will concentrate on the following application areas to validate the feasibility and relevance of using a massively parallel and distributed reprogrammable platform as an accelerator: digital signal processing (DSP) algorithms, string matching in bioinformatics, statistical problems in computational physics, cryptanalysis, nonlinear optimization methods in evolutionary computation and neural network -based peer-to-peer (P2P) resource discovery algorithm verification. A brief overview of the application areas is presented in the following paragraphs. The challenge is to describe the computation tasks in a hardware description language (VHDL, for example) and to perform partial reprogramming of the HiPerFPGA to maximize its performance.
Three goals
have driven the development of DSP implementations: data parallelism,
application-specific specialization and functional flexibility. A
reprogrammable platform suits all these requirements, as an abundance of
programmable logic facilitates the creation of numerous functional units
directly in hardware. Reprogrammable platforms can be quickly updated based on
application demands without manual upgrades or hardware swaps. Standard DSP
algorithms, such as the Discrete Fourier Transform (DFT) and mathematical
algorithms, such as inverting huge matrices, will be tested on the HiPerFPGA
and the implementation efficiency in terms of speedup and cost will be
evaluated.
An
interesting interdisciplinary application of future electronics lies in the
emerging field of bioinformatics, where the ability to rapidly search large
databases of genetic information is becoming increasingly important. With the
expected completion of the Human Genome Project in 2003, the amount of data to
be searched, as well as the number of searches being performed on this data
will increase dramatically. The basic challenge in this application is string
matching, where two arrays of characters are compared to determine their
similarity. A highly parallel and distributed computing environment consisting
of n discrete nodes can divide the string matching task to n
discrete simultaneous tasks. It is possible, that the HiPerFPGA can achieve
speedups of several orders of magnitude.
Although
analytical methods exist to tackle many-body statistical problems, they are limited
in scope and do not work well when applied to even medium-sized systems. Monte
Carlo tehcniques are typically used in such cases to simulate truncated
versions of these systems. This has traditionally required enormous computing
resources. If the system under simulation (typically a grid or a mesh) is
mapped directly to the HiPerFPGA, its simulation time may decrease drastically.
Random number generators are needed in stochastic simulations, and these can be
implemented in an area-efficient manner on the FPGA Nodes.
Linear cryptanalysis is a technology that takes advantage of
possible input-output correlations of a cipher. Evaluating this relationship
for a sufficient number of plaintext/ciphertext pairs, it is possible to
recover some bits of the key faster than an exhaustive search. The research
project will experiment with and expand on the implementations of linear
cryptanalysis methods used in a hardware-assisted DES (Digital Encryption
Standard) with a goal of minimizing the key retrieval time.
Evolutionary computation mimics the processes of biological evolution with its ideas of natural selection and survival of the fittest to provide effective solutions for optimization problems. Genetic algorithms (GAs) are probably the best-known approach to evolutionary computation. Massively parallel GA implementations may yield speedup factors of several orders of magnitude compared to software-based approaches, as the unusual word lengths, bit-level atomary operations and inherent parallelism of GAs can take full advantage of the underlying reprogrammable platform.
Peer-to-peer (P2P) computing and networking requires mechanisms that are aware of and utilize a global space of computational resources to provide facilities for efficient lookup, wide-area programming, security and fault tolerance. The mathematically oriented research at Jyväskylä University on P2P resource discovery with neural networks can be implemented and verified on the HiPerFPGA, since the speedup factor between hardware and software-based neural network implementations may be several orders of magnitude.
Professor Jorma
Skyttä (Signal Processing Laboratory, Helsinki University of Technology) will manage
the research consortium. Lic. Tech.
Matti Tommiska, whose
doctoral dissertation is expected to take place in summer 2003, will assist
professor Skyttä as a postdoc researcher and project manager.
The following four Ph.D. students will complete their doctoral dissertation under professor Skyttä’s supervision during the three-year research period: Juha Forsten, Esa Korpela, Antti Hämäläinen and Kimmo Järvinen. All researchers have attained solid previous experience on their respective fields of study in their M.Sc. projects.
Professor Jarkko Vuori (Department of Mathematical
Information Technology, University of
Jyväskylä) will supervise Ph.D. student Mikko Vapa, with the goal of
finishing doctoral studies on P2P resource discovery in 2006.
The primary
added value of the research consortium is that the HiPerFPGA platform provides
an excellent testbed for neural networks -based P2P resource discovery
algorithms. The research consortium will co-operate closely with electronic
means (E-mail, WWW and regular meetings.
The project
may also employ M.Sc. students, especially in Master Node control software
development and hardware-related design tasks.
The
HiPerFPGA may also have applications in the Tekes-funded Brocom project
(accelerating radio resource management algorithm simulation) and GO++ project
(linear cryptanalysis of A5/1 encryption).
The preliminary timetable of the research
project is presented below.
2003 |
Construction of the hardware platform begins
with the goal of achieving Ethernet connectivity, applications are studied
and coded in VHDL |
2004 |
First version of the hardware platform is
ready, applications implemented on the platform, first results of speedup
achieved |
2005 |
Second version of the hardware platform with full
Internet connectivity is ready, potentially globally distributed applications
are tested |
2006 |
Additional improvement of the platform, five
Ph.D. thesis are finished. |
The primary
goal of the research project is to attain knowledge of the characteristics and
applicability of reprogrammability in future electronics. This is coupled with
basic research into distributed and parallel methods for accelerating
computation in a cost-effective manner.
The
academic goal is five Ph.D. dissertations (4 at HUT, 1 at JyU) during the
three-year research period. In addition, approximately one M.Sc. thesis per
year will be written.
Scientific
patents will be applied to protect the immaterial rights of the inventor(s), if
the research project produces innovative and patentable results.
The
research project will disseminate its results by publishing articles in
international scientific journals and conferences. The research project will
also continuously maintain its own web server, whose address is http://hiperfpga.hut.fi/.
[1] Logic Element (LE) is the basic area measurement unit in FPGAs. A Logic Element typically consists of a 4-input looup table (LUT) and a presettable register.