
A Hardware Accelerator for the OpenFOAM Sparse Matrix-Vector Product

by M. Taouil




Institution: Delft University of Technology
Department:
Year: 2009
Keywords: FPGA; Double Precision Floating Point; Sparse Matrix dense Vector Product; OpenFOAM
Record ID: 1249669
Full text PDF: http://resolver.tudelft.nl/uuid:ce583533-45ea-4237-b18d-fe31272ea1ee


Abstract

The Sparse Matrix-Vector Multiplication (SMVM) is one of the key kernels in scientific applications. Profiling OpenFOAM, a sophisticated Computational Fluid Dynamics (CFD) tool, showed the SMVM to be its most computationally intensive kernel. A traditional way to solve such computationally intensive problems in scientific applications is to employ supercomputing power. This approach, however, delivers performance at a high hardware cost. Another approach to high-performance scientific computing is based on reconfigurable hardware. It has recently become more popular owing to increasing on-chip memory and bandwidth and to abundant, reasonably cheap hardware resources. The SGI Reconfigurable Application Specific Computing (RASC) library combines both approaches, as it couples traditional supercomputer nodes with reconfigurable hardware. It supports the execution of computationally intensive kernels on Custom Computing Units (CCUs) in Field Programmable Gate Arrays (FPGAs). This thesis presents the architectural design and implementation of the SMVM product for the OpenFOAM toolbox on an FPGA-enabled supercomputer. The SMVM is designed as a CCU within the RASC machine. The proposed CCU comprises multiple Processing Elements (PEs) that operate on IEEE-754 compliant double-precision floating-point data. Accurate equations are developed that describe the relation between the number of PEs and the available bandwidth. With two PEs and an input bandwidth of 4.8 GB/s, the hardware unit can outperform execution in pure software. Simulations suggest speedups between 2.7 and 7.3 for the SMVM kernel with four PEs. The performance increase at the kernel level is nearly linear in the number of available PEs. The SMVM kernel has been synthesized and verified for the Virtex-4 LX200 FPGA, and a hardware counter is integrated in the design to obtain accurate performance results per CCU.
Although the synthesis tool reports higher frequencies, the design has been routed and executed on the Altix 450 machine at 100 MHz. Based on our experimental results, we can safely conclude that the proposed approach, using FPGAs as accelerators, has the potential to speed up the SMVM kernel relative to traditional supercomputing approaches.
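
For readers unfamiliar with the kernel, the following is a minimal software sketch of a sparse matrix-vector product and of one way its rows could be divided among several workers. It is an illustration only, not the thesis design: the compressed sparse row (CSR) layout, the function names, and the contiguous row-block partitioning are all assumptions, since the abstract does not state the storage format or how work is distributed over the PEs.

```python
from typing import List, Sequence


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_partitioned(values: Sequence[float], col_idx: Sequence[int],
                         row_ptr: Sequence[int], x: Sequence[float],
                         num_pes: int) -> List[float]:
    """Same product, with rows split into contiguous blocks, one block per
    hypothetical worker. The blocks run sequentially here; the point is only
    that each block is independent of the others."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    block = -(-n_rows // num_pes)  # ceiling division
    for pe in range(num_pes):
        start = pe * block
        stop = min(start + block, n_rows)
        for row in range(start, stop):
            acc = 0.0
            for k in range(row_ptr[row], row_ptr[row + 1]):
                acc += values[k] * x[col_idx[k]]
            y[row] = acc
    return y


# Example: A = [[4, 0, 1],
#               [0, 3, 0],
#               [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y_serial = csr_spmv(values, col_idx, row_ptr, x)
y_split = csr_spmv_partitioned(values, col_idx, row_ptr, x, num_pes=2)
```

Because each output row depends only on its own nonzeros and on the shared input vector, the row blocks can be computed independently. This is the property that makes the kernel a natural candidate for multiple parallel units, provided the memory system can supply the matrix data fast enough.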