Faculty Interaction

Ahmed Sameh

Department of Computer Science
Purdue University

Brief Project Description:
Project: Efficient numerical methods for saddle-point problems arising in computational fluid dynamics (CFD).

Traditional linear solvers, such as Krylov subspace methods preconditioned with incomplete factorizations, can be inefficient when applied to the indefinite saddle-point problems that arise in incompressible flow simulations. In this project, we develop new algorithms that have the potential to be computationally much more efficient than the best known preconditioning techniques.
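To illustrate why such systems are indefinite, the sketch below assembles a toy saddle-point matrix in the standard block form [[A, B^T], [B, 0]] and counts its positive and negative eigenvalues. The block sizes, the random entries, and the helper names build_saddle_point and inertia are assumptions made for this example only; they are not taken from the project's CFD matrices or algorithms.

```python
import numpy as np

def build_saddle_point(n=6, m=3, seed=0):
    """Assemble a toy saddle-point matrix K = [[A, B^T], [B, 0]] (illustrative only)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # symmetric positive definite (1,1) block
    B = rng.standard_normal((m, n))    # constraint block, assumed to have full row rank
    return np.block([[A, B.T], [B, np.zeros((m, m))]])

def inertia(K, tol=1e-10):
    """Return (number of positive, number of negative) eigenvalues of symmetric K."""
    eig = np.linalg.eigvalsh(K)
    return int((eig > tol).sum()), int((eig < -tol).sum())

K = build_saddle_point()
n_pos, n_neg = inertia(K)
print("positive eigenvalues:", n_pos, "negative eigenvalues:", n_neg)
```

With a positive definite A block and a full-rank B block, the matrix has as many positive eigenvalues as A has rows and as many negative eigenvalues as B has rows. This mix of signs is one reason preconditioners designed for definite systems, such as incomplete factorizations, can struggle here.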


A typical sparse matrix arising in saddle-point problems.

Reference:
Parallel System Solvers for the Navier-Stokes Equations. Murat Manguoglu, Ahmed Sameh, Faisal Saied. Presented at PMAA, IRISA, Rennes, France, September 7-9, 2006.

Project: SPIKE - A parallel banded linear system solver.

The idea behind the SPIKE algorithm was first proposed by Prof. Sameh in 1978. It has now been re-implemented using MPI (the Message Passing Interface library), incorporating several new techniques that yield significant performance enhancements compared to the popular ScaLAPACK package. SPIKE is currently a family of algorithms for solving large systems of linear equations arising from PDE-based or non-PDE-based physical systems. These systems are normally banded and may be sparse or dense within the band.
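The basic idea can be sketched in a few lines. The banded matrix is split into diagonal blocks, each block is solved independently to form the "spikes" and a modified right-hand side, a small reduced system couples the block interfaces, and the full solution is then recovered block by block. The code below is a minimal serial, dense NumPy sketch of that textbook form, assuming a half-bandwidth k and p equal-sized partitions. The function name spike_solve and its arguments are hypothetical, and the sketch does not represent the MPI implementation or any of the optimized variants in the SPIKE family.

```python
import numpy as np

def spike_solve(A, f, k, p):
    """Solve A x = f for a banded A (half-bandwidth k) using p partitions.

    Illustrative sketch only: dense storage, serial loops, no pivoting strategy.
    """
    n = A.shape[0]
    assert n % p == 0, "sketch assumes equal-sized partitions"
    m = n // p
    assert m >= 2 * k, "each partition must hold separate top and bottom tips"

    V, W, g = [], [], []
    for j in range(p):                       # independent per partition
        s, e = j * m, (j + 1) * m
        Bj = A[s:e, e:e + k] if j < p - 1 else np.zeros((m, k))  # right coupling
        Cj = A[s:e, s - k:s] if j > 0 else np.zeros((m, k))      # left coupling
        sol = np.linalg.solve(A[s:e, s:e], np.hstack([f[s:e, None], Bj, Cj]))
        g.append(sol[:, 0])                  # modified right-hand side
        V.append(sol[:, 1:1 + k])            # right spike
        W.append(sol[:, 1 + k:])             # left spike

    # Reduced system in the top and bottom k unknowns of every partition.
    R = np.eye(2 * k * p)
    r = np.empty(2 * k * p)
    for j in range(p):
        top = slice(2 * k * j, 2 * k * j + k)
        bot = slice(2 * k * j + k, 2 * k * (j + 1))
        r[top], r[bot] = g[j][:k], g[j][-k:]
        if j < p - 1:
            nxt_top = slice(2 * k * (j + 1), 2 * k * (j + 1) + k)
            R[top, nxt_top], R[bot, nxt_top] = V[j][:k], V[j][-k:]
        if j > 0:
            prv_bot = slice(2 * k * (j - 1) + k, 2 * k * j)
            R[top, prv_bot], R[bot, prv_bot] = W[j][:k], W[j][-k:]
    z = np.linalg.solve(R, r)

    # Recover the full solution, again independently per partition.
    x = np.empty(n)
    for j in range(p):
        xj = g[j].copy()
        if j < p - 1:
            xj -= V[j] @ z[2 * k * (j + 1): 2 * k * (j + 1) + k]
        if j > 0:
            xj -= W[j] @ z[2 * k * (j - 1) + k: 2 * k * j]
        x[j * m:(j + 1) * m] = xj
    return x
```

In a parallel setting, the two loops over partitions are the parts that would run concurrently, one partition per process, and only the small reduced system requires communication between neighbours. The sketch solves that reduced system densely for simplicity.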

The project involved porting the family of SPIKE algorithms to different architectures, namely IBM P690, SGI Altix, Linux clusters (Itanium2 and Intel Xeon), and Cray-X1E (a vector machine), and evaluating their performance. For diagonally dominant and non-diagonally dominant dense narrow-banded systems, the performance of SPIKE was compared to that of equivalent routines from the LAPACK and ScaLAPACK packages. We observed better performance, by a factor of several, on the different platforms.

The above figure shows the performance comparison of SPIKE and ScaLAPACK (total runtime vs. varying bandwidth) on various high-end computing architectures. The system size is N = 480,000 with one right-hand side (RHS), and the number of processors is fixed at 32.

However, on vector architectures, the implementation needs modification (to the data structures or to the algorithm itself) to exploit the available vectorization for better performance. Applications with long vectors achieve a significant fraction of peak performance on such platforms. We are currently studying the performance of SPIKE on vector machines to understand the fine balance between vectorization and parallelization.

Reference - Draft paper

Performance evaluation of a parallel banded linear system solver SPIKE on high-end computing platforms. M. Sayeed, E. Polizzi, F. Saied, A. Sameh.