Iowa Hills Software Digital and Analog Filters

FFT Algorithm
in C and Spectral Analysis Windows
Home

This is the C code for a decimation in time FFT algorithm. It puts DC in bin 0 and scales the output of the forward transform by 1/N. The output is returned in the input array. The code, in plain text, is given here: FFT Algorithm in C

There are many ways to interface to an FFT. Some use one double
array for both the input and output, while others use two complex arrays. Some
algorithms put DC in the center bin while others put DC in bin 0. Some do no
scaling on a forward transform.

Consequently, there is no way to write an FFT that meets everyone's needs, but
it isn't difficult to modify an algorithm to suit your needs.

The **only** difficulty in writing an FFT is
calculating the array indexes, the rest of the code is trivial. We included the
output of some debug code that shows how the indexes change. Compare this output to a
Length 16 FFT Butterfly Chart.

An unfortunate aspect of C compilers is that there is little uniformity in their
complex math libraries. For example, a compiler might access the real part as
X.re, X.real, or X.real(). This makes it difficult to have portable code when
complex numbers are used.

The good thing however is that there are only 2 lines of code (in Transform)
that do complex math. Thus, it should be quite easy modify this code to suit
your compiler.

This code stresses simplicity and clarity, not speed. Once you understand an algorithm
however, it isn't difficult to improve it's execution times.
Here are some obvious things you can do to optimize this code.

- Make use of symmetry when calculating the twiddle factors (more important than it might seem).
- Use a temporary variable in the main transform loop, rather than doing the complex multiplications twice.
- Make use of the fact that Twiddle[0] = 1 and Twiddle[N/4] = +/- j. They get used very often.
- If you do a large number of successive FFTs, stop the repeated memory allocations and Twiddle calculations.
- You will notice on a butterfly chart that the upper and lower halves of the chart can be processed independently until the last section. Since most PC's have at least 2 cores, process the upper and lower halves in separate threads.

Twenty years ago, computer memory was expensive, so FFT algorithms were written to
minimize their memory requirements. They typically did this by placing intermediate
calculation results directly back into the input array and in some cases,
calculating the twiddle factors as needed.

This code uses a temporary buffer array to store intermediate results and
another array for the pre-calculated twiddle factors. This uses more memory
but simplifies the code immensely and makes it very easy to associate a
butterfly chart with the code.

There is also a Discrete Fourier Transform at the bottom of the file. It is
useful for checking the FFT code, should you decide to modify it.

The transform function code is shown here. It is nothing more than two lines of
code for the transform, and indexing. The entire algorithm, which includes
the code for reordering, twiddle factor calculations, memory allocation, and
scaling, is in this plain text file. FFT Algorithm in C

void Transform(ComplexD *Input, ComplexD *Buffer, ComplexD *Twiddle, int N) { int j, k, J, K, I, T; ComplexD *TempPointer; J = N/2; // J increments down to 1 K = 1; // K increments up to N/2 while(J > 0) // This loop runs Log2(N) times. { // Swap pointers, instead doing this: for(j=0; j<N; j++) Input[j] = Buffer[j]; // We start with a swap because of the swap in ReArrangeInput function. TempPointer = Input; Input = Buffer; Buffer = TempPointer; I = 0; for(j=0; j<J; j++) { T = 0; for(k=0; k<K; k++) // This loop runs N/2 times for each J and K { Buffer[I] = Input[I] + Input[K+I] * Twiddle[T]; Buffer[K+I] = Input[I] - Input[K+I] * Twiddle[T]; I++; T += J; } I += K; } // end for(j=0 loop K *= 2; J /= 2; } // end while(J > 0) loop }

Getting the indexes correct is the only difficult part of an FFT algorithm. This print out shows how the indexes for Buffer, Input, and Twiddle change. Compare them to this 16 Pt FFT Butterfly Chart.

N = 16 J = 8 K = 1 Buffer[0] = Input[0] + Input[1] * Twiddle[0] Buffer[1] = Input[0] - Input[1] * Twiddle[0] Buffer[2] = Input[2] + Input[3] * Twiddle[0] Buffer[3] = Input[2] - Input[3] * Twiddle[0] . . . Buffer[14] = Input[14] + Input[15] * Twiddle[0] Buffer[15] = Input[14] - Input[15] * Twiddle[0] J = 4 K = 2 Buffer[0] = Input[0] + Input[2] * Twiddle[0] Buffer[2] = Input[0] - Input[2] * Twiddle[0] Buffer[1] = Input[1] + Input[3] * Twiddle[4] Buffer[3] = Input[1] - Input[3] * Twiddle[4] Buffer[4] = Input[4] + Input[6] * Twiddle[0] . . . Buffer[14] = Input[12] - Input[14] * Twiddle[0] Buffer[13] = Input[13] + Input[15] * Twiddle[4] Buffer[15] = Input[13] - Input[15] * Twiddle[4] See the code file for the complete listing.

Spectral Analysis Windows for the FFT

There are numerous types of spectral analysis windows one can use with an
FFT, as described in this
Wikipedia article.

This The Max Planck Institute for Gravitational Physics (pdf) article does a nice job of explaining the basics of using windows for spectral analysis.

This Windowed FIR Filter C Code has the following windows defined in the lower half of the file. The code in the top half of the file is used to generate the impulse response for an FIR filters. These windows are used for both FIR filter design and spectral analysis.

Hanning | Hamming | Blackman |

Blackman Harris | Blackman Nuttall | Nuttall |

Kaiser | Kaiser Bessel | Trapezoid |

Sinc | Flattop defined | Tukey |

Gauss | Sine |

Do You Need to Use a Window Before the FFT ?

In general, the answer to this question is yes. If you are somewhat new to FFT's
and windows, you will soon learn that while the FFT is the only tool available
for spectral analysis, it is imperfect at best in measuring the spectral content
of an arbitrary signal. The question isn't whether you need to window the data,
but rather, which window is best suited for the task at hand. In other words, do
you need good spectral resolution, or good amplitude accuracy. You can't have
both.

The problem with the FFT is that it assumes that each sine wave component in the signal is present for an integer number of cycles, but this is seldom the case.

To illustrate, we show a signal composed of three sine waves with an integer number of cycles of each plus a DC component. Since this signal is periodic, no window is required and the FFT displays the signal's spectrum with perfection.

Notice however, that in this situation, a window degrades the performance of the FFT. Consider the width of the DC component. These windows have the effect of modulating the signal. The steady state DC component in this signal became a very low frequency component upon application of the window. In fact, all the frequency components are modulated, which accounts for the width of the spectral lines.

It should be clear from the plots below that the Flattop will modulate the signal the most, which causes it to produce the widest spectral lines. This is referred to in the literature as the window's bandwidth, in units of bins. These three windows have bandwidths of: Hanning = 1.5, Gauss = 2.17, Flattop = 3.77 bins.

Next, we add a half cycle to each sine wave. Note how the presence of the third frequency component is completely masked unless a window is used.

It should be clear that each window type has its own distinct properties. There are many types of windows, and massive amounts of literature describing them.

When choosing a window, you must trade off between resolution and amplitude accuracy. The Hanning has the best resolution while the Flattop has the best amplitude accuracy. The Gauss is midway between these two for both accuracy and resolution. It may be worth noting that these were the only windows used in the HP 89410A Vector Signal Analyzer. In other words, unless you have very specific requirements, these three are probably the best to get started with. It is also worth noting that there are many flattop window definitions.

When to Scale the FFT by 1/N ?

If you look at the definition of the Fourier Transform, you will not find a 1/N
scaling factor. Yet most FFT algorithms automatically scale the output of
a forward transform by 1/N, but do not scale the output of the inverse
transform.

This short discussion will try to make it clear when scaling by 1/N is appropriate and why it is typically built into an FFT algorithm on the forward transform.

**Must scale a forward transform by 1/N.
**To start with, it ought to be clear that to analyze a simple sine wave, the
length of the transform should be irrelevant. Suppose
N=1024, Freq=100 and your signal is:

f(n) = cos(Freq * 2*Pi * n/N)

If you take a 1024 point FFT of f(n), you will find that bin[100] = 512. But this isn't a meaningful value until you scale it by 1/N. Then 512/1024 = 1/2 and of course the other 1/2 is in the negative spectrum at bin[924]. If you then double the length of the FFT, the output values also double.

Clearly, the length of the FFT should be irrelevant, but it clearly has an effect. So in this situation we scale the FFT by 1/N

**Must not scale a forward transform.
**Now suppose you have the impulse response of a 32 tap FIR filter and want to
know its frequency response. For convenience, we will assume a low pass filter
with a gain of 1. We know that for this type of filter, the DC component of the FFT must
be 1 (the gain of the filter).

We also know that the FFT's DC component is simply the sum of its input values (i.e. the sum of the FIR coefficients in this example). Clearly, this sum will be the same regardless the length of the FFT, so clearly it must not be scaled by 1/N to give a meaningful result. Thus, for this type of input, the output of the forward transform is not scaled.

**What is the fundamental difference between these two examples?**

The answer is simple. In the first case, we supplied energy for every input
sample. In other words, the sine wave was present for all N samples, so we
needed to scale the FFT's output by 1/N.

The second example was an impulse response of a filter. By definition, we only supplied energy for 1 sample (the impulse at n=0). It took 32 samples for the impulse to work its way through the 32 tap filter, but this delay is irrelevant. Since we supplied energy for 1 sample, we don't scale the FFT's output. If we had supplied energy for two samples, we would scale the output by 1/2.

**Must not scale an inverse transform.
**Now let's consider an inverse FFT. As with the forward FFT, we must consider
the number of samples we are supplying energy to. Of course, we have to be a bit
more careful here because we must fill both the positive and negative frequency
bins appropriately. However, if we place an impulse (i.e. a 1) in two
appropriate bins, then the resulting output of the inverse FFT will be a cosine
wave with an amplitude of 2 no matter how many points we use in the inverse FFT.

Thus, as with the forward FFT, we don't scale the inverse FFT's output if the input is an impulse.

**Must scale an inverse transform.
**Now consider the case where you know the frequency response of a low
pass filter and want to do an inverse FFT to get its impulse response. In this
case, since we are supplying energy at all points, we must scale the FFT's
output by 1/N to get a meaningful answer.

**Another example.
**The four situations just detailed are end point examples where it is
clear how to scale the FFT's output. However, there is a lot of gray area
between the end points. So letâ€™s consider another simple example.

Suppose we have the following signal, with N=1024, Freq=100:

```
f(n) = 6 * cos(1*Freq * 2*Pi * n/N) n = 0 - 127
f(n) = 1 * cos(2*Freq * 2*Pi * n/N) n = 128 - 895
f(n) = 6 * cos(4*Freq * 2*Pi * n/N) n = 896 - 1023
```

Notice the amplitude, frequency, and duty cycle differences for the three
components. It should be clear that while the second component is at 1/36 the
**power** level as the other two, all three frequency components
deliver the same amount of **energy** to the FFT, as shown in the
plot below.
It is important to remember that the FFT's output has units of energy, not
power. (This is easily forgotten because virtually all spectrum analyzer plots are labeled in terms of power, not
energy.)

If we know the duration of the various frequency components, we can then scale the various frequency bins accordingly. In this case, we would scale the output as follows: bin[100] /= 128; bin[200] /= 768; bin[400] /= 128; Then the FFT's output would be more meaningful (in terms of power).

In general however, we have no idea what the duty cycle is for the various frequency components that make up our signal, so we can't possibly do this sort of scaling. In general however, we do supply energy for every sample point, which is why we scale the forward FFT by 1/N when analyzing a signal.

To complicate matters, we would almost certainly apply a window to this signal to improve the FFT's spectral resolution. Since the first and third frequency components are at the beginning and end of the signal, they get attenuated by 27 dB while the center component gets attenuated by only 4 dB (Hanning window).

Clearly, the output of an FFT **can be** a very poor
representation of the input.

The problems are similar when using a spectrum analyzer, analog or FFT. You don't know the actual power of the signal unless you also know its duty cycle. But even then, the windowing, span, sweep rates, filtering, detector type, and other factors, all work to distort the result.

Using an FFT to analyze an arbitrary signal is not fool proof. You must understand the input signal in order to understand the FFT's output. This is also true when using an analog spectrum analyzer.

Copyright 2013 Iowa Hills Software