Blog

 

 

When you think of say, an audio signal or waveform, you think of this, right?

waveform

 Figure 1: Audio waveform in the time domain.

Figure 1 depicts an audio signal in the time domain; essentially a visualisation of the variation of the signal’s amplitude (plotted on the y-axis) over time (plotted on the x-axis). So, by observing the signal in this way, how much information can actually be extracted[1]?

  1. Signal duration.
  2. Root Means Squared (RMS) and peak amplitude.
  3. Zero Crossing Rate (ZCR).
  4. Frequency approximation (using Autocorrelation).

Whilst the above list is in no way exhaustive, it hints towards the fact that the majority of complex analysis must take place within an entirely separate domain: the frequency domain. The frequency domain is, as the name suggests, a representation of a signal (or a snapshot of a signal) in the domain of frequency; thus, as the x-axis denotes time when in the time domain, it denotes frequency in the frequency domain; see figure 2.

Figure 2: 10 Hz sinusoidal wave in the time domain (left) and frequency domain (right).

Figure 2: 10 Hz sinusoidal wave in the time domain (left) and frequency domain (right).

In 1807, Joseph Fourier, the namesake of the widely used Fourier Transform (FT), theorised that complex signals, such as speech, are composed of a superposition of sine waves that vary in frequency, amplitude and relative phase; a Fourier series[2]. By this logic then, an accurate representation of a signal in the time domain can yield an accurate representation of the same signal in the frequency domain. This trail of thought eventually led to the continuous Fourier Transform (below); variations of which are used in a practically infinite list of application domains today.

$$f(\xi) = \int_{-\infty}^{\infty}f(x) e^{-2\pi x\xi} dx$$

With regards to Digital Signal Processing (DSP) applications, the continuous FT simply cannot be used; the integration runs from minus infinity to infinity. Thus, to compute the FT of any signal, regardless of duration, will take an infinite amount of time. Digital signals of course are in no way continuous, they are composed of a number discrete samples taken at regular intervals. 

 

The continuous FT was eventually adapted for the analysis of discrete time signals and termed the Discrete Fourier Transform (DFT), the focus of this article. You may well have heard of the Fast Fourier Transform (FFT) more so than the DFT but do not fear, you’re in the right place! The FFT is merely an algorithm for the efficient computation of the DFT[3].

 

Let’s quickly talk about just how efficient the FFT algorithm is. The computation of the DFT directly is known to require O(N2) operations: N outputs are the sum of N terms, whereas the FFT requires O(N log N). Consider an example where N=4096. The DFT requires N2 multiplication operations and N(N-1) complex additions, resulting in approximately 33 million operations. The radix-2 Cooley-Tukey FFT algorithm (the most commonplace) typically requires approximately 30,000 operations; roughly 1,000 times less[4].

 

Anyway, let’s continue. The equation for the DFT is below:

$$X_{k} = \sum_{n=0}^{N-1} x_{n} . cos(-\frac{2\pi kn}{N}) + jsin(-\frac{2\pi kn}{N})$$

Where x is a finite set of discretely sampled time series data (with xn being the nth sample of x), N is the length of x, and k represents the current point of resolution (commonly referred to as a frequency bin). Assume the 4096 points of resolution as in the discussion regarding efficiency, the above equation will run with k ranging from 0 to 4095. The total number of frequency bins directly correlates with precision. An audio signal typically ranges from approximately 20 Hz – 20 kHz. A DFT of 4096 points will have 4096 separate frequency bins spaced equidistantly between 20 – 20,000. This equates to a frequency resolution of 4.87 Hz. Conversely, A 512-point DFT sacrifices resolution (39.02 Hz per bin) in place of efficiency.

 

So, in essence, the DFT of a signal is calculated by, for each frequency bin, the creation of a complex number (represented by j in the equation) from the sum of the values in the input signal. Lastly, for each frequency bin, we have a complex number as the value. Therefore, we must take the absolute magnitude of the complex value, given by the equation below, as the absolute value of energy within the bin[5]:

$$m=\sqrt{a_{k}^{2}+j_{k}^{2}}$$

Where a is the real part of Xk and j is the imaginary part.

 

The DFT is symmetrical around the sampling frequency of the signal. Using the 10 Hz FT in figure 2 as an example, let’s plot the entire DFT (not accounting for symmetry), see figure 3.

10hz symmetryJ

 Figure 3: DFT of 10 Hz sine wave, not accounting for symmetry-related artefacts.

As you can see, with a sampling frequency, in this case, of 128 Hz, we now have an artefact (an example of aliasing) plotted directly at 118 Hz, or 128-10 Hz. This is accounted for by simply disregarding (or not even calculating) the DFT for any value above sampling frequency / 2 (the Nyquist limit).

 

Reference Implementation

Below is an example, written in Python, of how to programmatically generate a sample sine wave, compute it’s DFT, and visualise the result. This code was used to generate the frequency domain view of the signal in figure 2.

 

import matplotlib.pyplot as plt
import numpy as np


def dft(signal, resolution): """ @brief: Compute the DFT of a given signal at a specified resolution and return an array of magnitudes
@param Array signal: Array of numerical values representing an input signal @param Int resolution: Number of points of resolution
@returns Array magnitudes: An array of DFT magnitudes representing total energy in each DFT bin """
magnitudes = list() length = len(signal)
for fbin in range(resolution):
fbin_list = list()
for sidx, sample in enumerate(signal):
c = complex(sample * np.cos((-(2 * np.pi * fbin * sidx)/length)), sample * np.sin((-(2 * np.pi * fbin * sidx)/length)) ) fbin_list.append(c)
coeff = sum(fbin_list) * 2 magnitudes.append(np.sqrt((coeff.real ** 2) + (coeff.imag ** 2))/length)
return np.array(magnitudes)

def get_sine(sample_frequency, signal_frequency): """ @brief: Calculate sample_frequency values of a sinusoidal wave of frequency signal_frequency
@param Int sample_frequency: Sampling frequency of signal @param Int signal_frequency: Intended frequency of signal
@returns Array: Array of float values representing a 1s duration sine wave """
time_vector = np.arange(0, 1, 1.0 / sample_frequency)
return np.sin(2 * np.pi * signal_frequency * time_vector)

if __name__ == '__main__': sample_frequency = 128 signal_frequency = 10
signal = get_sine(sample_frequency, signal_frequency)
""" Equidistantly spaced values 0-149 """ k = np.arange(sample_frequency)
""" Handle DFT symmetry, only plot sample_freq/2 on X-axis """ X = k[range(sample_frequency/2)]
Y = dft(signal, 128) """ Handle symmetry, only plot up to sample_freq/2 values on Y-axis """ Y = Y[range(sample_frequency/2)]
plt.plot(X, Y) plt.xlabel('Frequency (Hz)') plt.ylabel('Amplitude')
plt.show()

 

References

  1. Cassidy, S. (2002). COMP449: Speech Recognition. [online] Macquarie University. Available at: http://web.science.mq.edu.au/~cassidy/comp449/html/ch05s02.html
  1. Fourier, J. (1807). Mémoire sur la propagation de la chaleur dans les corps solides. Paris: Bernard
  1. Van Loan, C. (1992). Computational frameworks for the fast fourier transform. 1st ed. Philadelphia: Society for Industrial and Applied Mathematics
  1. Frigo, M. and Johnson, S. (2005). The Design and Implementation of FFTW3. Proceedings of the IEEE, 93(2), pp.216-231
  1. Ahlfors, L. (1978). Complex analysis. 3rd ed. New York: McGraw-Hill