.: Core Research Activities in Reconfigurable Computing Laboratory

 


7. Many-Core Architecture Exploration for Scientific Applications

Students: Gregory Streimer

A common problem posed in computational biology is the comparison of protein sequences with unknown functionality to that of a set of known protein sequences to detect functional similarities. The most sensitive algorithm for searching similarities is the Smith-Waterman algorithm, however it is also the most time consuming. With an ever increasing size in protein and DNA databases, searching them becomes more difficult with exact algorithms such as Smith-Waterman. This research explores reducing the computational time of the algorithm by utilizing multi-core technologies. Currently we are examining the potential of Compute Unified Device Architecture (CUDA) using the Nvidia C870 graphics processing unit (GPU). This card has massively parallel computational capabilities, which makes it ideal for high-performance scientific computing. The C870 has sixteen multi-processors, each containing eight streaming processors which individually operate at 1.35 GHz. Each multi-processor is capable of launching up to 768 threads at once, and each thread can run a Smith-Waterman alignment on a different sequence pair. On the GPU, the threads are organized in blocks within a grid of blocks which are launched from the kernel. This can be seen in Figure 1.

Current limitations on the GPUs memory can constrain how much data can be processed at any given time, so exploiting different memory resources to their fullest potential for Smith-Waterman is also a part of this research. Due to some of the memory restrictions, our implementation utilizes the system's cached constant memory for storage of the substitution matrices, as well as the user's query sequence. This allows for faster access to these highly utilized values through the use of an extremely efficient cost function. A major advantage of our implementation over other GPU implementations is the fact that we do not place restrictions on the length of database sequences, so a database can truly be searched in its entirety. This work will be benchmarked against another known GPU implementation, as well as commonly used serial and multi-threaded CPU implementations. Figure 2 displays our basic program's flow

Publications:

Greg Streimer, Ali Akoglu, "Sequence Alignment with GPU: Performance and Design Challenges", 23rd IEEE International Parallel and Distributed Processing Symposium, May 25-29, 2009, Rome, Italy

This study is posted on CUDA Zone

Source Code fror Smith-Waterman Algorithm for use on an NVIDIA GPU using CUDA (posted 6/16/2009)