Header menu link for other important links
X
Exploring Vectorization and Prefetching Techniques on Scientific Kernels and Inferring the Cache Performance Metrics
Published in IGI Global
2015
Volume: 7
   
Issue: 2
Pages: 18 - 36
Abstract
Performance improvement in modern processor is staggering due to power wall and memory wall problem. In general, the power wall problem is addressed by various vectorization design techniques. The Memory wall problem is diminished through prefetching technique. In this paper vectorization is achieved through Single Instruction Multiple Data (SIMD) registers of the current processor. It provides architecture optimization by reducing the number of instructions in the pipeline and by minimizing the utilization of multi-level memory hierarchy. These registers provide an economical computing platform compared to Graphics Processing Unit (GPU) for compute intensive applications. This paper explores software prefetching via Streaming SIMD extension (SSE) instructions to mitigate the memory wall problem. This work quantifies the effect of vectorization and prefetching in Matrix Vector Multiplication (MVM) kernel with dense and sparse structure. Both Prefetching and Vectorization method reduces the data and instruction cache pressure and thereby improving the cache performance. To show the cache performance improvements in the kernel, the Intel VTune amplifier is used. Finally, experimental results demonstrate a promising performance of matrix kernel by Intel Haswell's processor. However, effective utilization of SIMD registers is a programming challenge to the developers.
About the journal
JournalInternational Journal of Grid and High Performance Computing
PublisherIGI Global
ISSN1938-0259
Open Access0