Vectorizable Design and Implementation of FFT Based on Fused Multiply-add Architectures

Junyang Zhang; Yang Guo; Xiao Hu

doi:10.12783/dtetr/iceta2016/6968

Abstract

This paper proposes an efficient method for mapping FFT algorithms onto vector processors using fused multiply-add (FMA) instructions. Based on the architectural features of YHFT-Matrix, the method combines shuffle requirements with memory access requests to reduce the number of shuffle patterns, and uses software pipelining to fully exploit the instruction-level and data-level parallelism of FFT algorithms, thereby improving computing performance. Experimental results show that the FFT algorithms achieve high computing performance and speedups. For instance, after FMA instruction optimization, the chip's computational efficiency for the 1024-point double-precision floating-point FFT is about 10% higher than before.
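To illustrate why FMA instructions help, here is a minimal sketch of a radix-2 butterfly written entirely as multiply-add operations. It is not the YHFT-Matrix mapping described in the paper, only one possible arrangement. A conventional radix-2 butterfly needs 4 multiplications and 6 additions, whereas this version uses 6 multiply-add operations. The names `fma`, `butterfly_fma`, and `fft_fma` are hypothetical. `math.fma` is only available in Python 3.13 and later, so the sketch falls back to an unfused `a * b + c`, which has the same structure but rounds twice.

```python
import cmath
import math

# Use a true fused multiply-add when the runtime offers one (math.fma exists
# in Python 3.13+); otherwise fall back to an unfused a*b + c (assumption:
# the fallback only preserves the operation structure, not single rounding).
fma = getattr(math, "fma", lambda a, b, c: a * b + c)


def butterfly_fma(ar, ai, br, bi, wr, wi):
    """Radix-2 butterfly (a + w*b, a - w*b) on split real/imag parts."""
    # Upper output a + w*b: two chained multiply-adds per component.
    xr = fma(wr, br, fma(-wi, bi, ar))  # ar + wr*br - wi*bi
    xi = fma(wr, bi, fma(wi, br, ai))   # ai + wr*bi + wi*br
    # Lower output a - w*b = 2*a - (a + w*b): one multiply-add per component.
    yr = fma(2.0, ar, -xr)
    yi = fma(2.0, ai, -xi)
    return xr, xi, yr, yi


def fft_fma(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_fma(x[0::2])
    odd = fft_fma(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor for index k
        xr, xi, yr, yi = butterfly_fma(even[k].real, even[k].imag,
                                       odd[k].real, odd[k].imag,
                                       w.real, w.imag)
        out[k] = complex(xr, xi)
        out[k + n // 2] = complex(yr, yi)
    return out
```

Calling `fft_fma(samples)` on a list whose length is a power of two returns the transform as a list of complex numbers. On a vector processor the same butterfly would be applied to many lanes at once, which is where the data-level parallelism mentioned in the abstract comes from.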

Keywords

FMA; FFT; vector processor; software pipeline

ENGINEERING and TECHNOLOGY RESEARCH