Vectorizable Design and Implementation of FFT Based on Fused Multiply-add Architectures
Abstract
This paper proposes a highly efficient method that uses fused multiply-add (FMA) instructions to map FFT algorithms onto vector processors. Exploiting the architectural features of YHFT-Matrix, the method combines shuffle requirements with memory access requests to reduce the number of shuffle patterns, and uses software pipelining to fully exploit the instruction-level and data-level parallelism of FFT algorithms, thereby improving computational performance. Experimental results show that the FFT algorithms achieve high computing performance and speedups. For instance, after FMA instruction optimization, the chip’s computational efficiency for the 1024-point double-precision floating-point FFT is about 10% higher than before.
Keywords
FMA; FFT; vector processor; software pipeline
DOI
10.12783/dtetr/iceta2016/6968