The FFT is an efficient algorithm for computing the DFT. It drastically reduces the cost of implementing the DFT on digital computing systems. Nevertheless, the FFT is still computationally intensive, and continued technological advances of computers demand larger and faster implementations of this algorithm. Past attempts at producing high-performance, and small FFT implementations, have focused on custom hardware (ASICs and FPGAs). Ultimately, the most efficient have been single-chipped, streaming I/O, pipelined FFT architectures. These architectures increase computational concurrency through the use of hardware pipelining. Streaming I/O, pipelined FFT architectures are capable of accepting a single data sample every clock cycle. In principle, the maximum clock frequency of such a circuit is limited only by its critical delay path. The delay of the critical path may be decreased by the addition of pipeline registers. Nevertheless this solution gives diminishing returns. Thus, the streaming I/O, pipelined FFT is ultimately limited in the maximum performance it can provide. Attempts have been made to map the Parallel FFT algorithm to custom hardware. Yet, the Parallel FFT was formulated and optimized to execute on a machine with multiple, identical, processing elements. When executed on such a machine, the FFT requires a large expense on communications. Therefore, a direct mapping of the Parallel FFT to custom hardware results in a circuit with complex control and global data movement. This thesis proposes the Hybrid Architecture Parallel FFT (HAPFFT) as an alternative. The HAPFFT is an improved formulation for building Parallel FFT custom hardware modules. It provides improved performance, efficient resource utilization, and reduced design time. The HAPFFT is modular in nature. It includes a custom front-end parallel processing unit which produces intermediate results. The intermediate results are sent to multiple, independent FFT modules. These independent modules form the back-end of the HAPFFT, and are generic, meaning that any prexisting FFT architecture may be used. With P back-end modules a speedup of P will be achieved, in comparison to an FFT module composed solely of a single module. Furthermore, the HAPFFT defines the front-end processing unit as a function of P. It hides the high communication costs typically seen in Parallel FFTs. Reductions in control complexity, memory demands, and logical resources, are achieved. An extraordinary result of the HAPFFT formulation is a sublinear area-time growth. This phenomenon is often also called superlinear speedup. Sublinear area-time growth and superlinear speedup are equivalent terms. This thesis will subsequently use the term superlinear speedup to refer to the HAPFFT's outstanding speedup behavior. A further benefit resulting from the HAPFFT formulation is reduced design time. Because the HAPFFT defines only the front-end module, and because the back-end parallel modules may be composed of any preexisting FFT modules, total design time for a HAPFFT is greatly reduced



College and Department

Ira A. Fulton College of Engineering and Technology; Electrical and Computer Engineering



Date Submitted


Document Type





fft, parallel fft, fast fourier transform, asic, fpga, field programmable gate arrays, architecture