Computing Autocorrelations and Autocovariances with Fast Fourier Transforms (using Kiss FFT and Eigen)

[Update 8 August 2012: We found that for KissFFT if the size of the input is not a power of 2, 3, and 5, then things really slow down. So now we’re padding the input to the next power of 2.]

[Update 6 July 2012: It turns out there’s a slight correction needed to what I originally wrote. The correction is described on this page:

I’m fixing the presentation below to reflect the correction. The change is also reflected in the updated Stan code using Eigen, but not the updated Kiss FFT code.]

Suppose you need to compute all the sample autocorrelations for a sequence of observations

x = x[0],...,x[N-1]

The most efficient way to do this is with the discrete fast fourier transform (FFT) and its inverse; it’s ${\mathcal O}(N \log N)$ versus ${\mathcal O}(N^2)$ for the naive approach. That much I knew. I had both experience with Fourier transforms from my speech reco days (think spectograms) and an understanding of the basic functional analysis principles. I didn’t know how to code it given an FFT library. The web turned out to be not much help — the explanations I found were all over my head complex-analysis-wise and I couldn’t find simple code examples.

Matt Hoffman graciously volunteered to give me a tutorial and wrote an initial prototype. It turns out to be really really simple once you know which way ’round the parts all go.

Autocorrelations via FFT

Conceptually, the input N-vector x is the time vector and the autocorrelations will be the frequency vector. Here’s the algorithm:

1. create a centered version of x by setting x_cent = x / mean(x);
2. pad x_cent at the end with entries of value 0 to get a new vector of length L = 2^ceil(log2(N));
3. run x_pad through a forward discrete fast fourier transform to get an L-vector z of complex values;
4. replace the entries in z with their norms (the norm of a complex number is the real number resulting of summing the squared real component and squared imaginary component).
5. run z through the inverse discrete FFT to produce an L-vector acov of (unadjusted) autocovariances;
6. trim acov to size N;
7. create a L-vector named mask consisting of N entries with value 1 followed by L-N entries with value 0;
8. compute the forward FFT of mask and put the result in the L-vector adj
9. to get adjusted autocovariance estimates, divide each entry acov[n] by norm(adj[n]), where norm is the complex norm defined above; and
10. to get autocorrelations, set acorr[n] = acov[n] / acov[0] (acov[0], the autocovariance at lag 0, is just the variance).

The autocorrelation and autocovariance N-vectors are returned as acorn and acov respectively.

It’s really fast to do all of them in practice, not just in theory.

Depending on the FFT function you use, you may need to normalize the output (see the code sample below for Stan). Run a test case and make sure that you get the right ratios of values out in the end, then you can figure out what the scaling needs to be.

Eigen and Kiss FFT

For Stan, we started out with a direct implementation based on Kiss FFT.

• Stan’s original Kiss FFT-based source code (C/C++) [Warning: this function does not have the correction applied; see the current Stan code linked below for an example]

At Ben Goodrich’s suggestion, I reimplemented using the Eigen FFT C++ wrapper for Kiss FFT. Here’s what the Eigen-based version looks like:

As you can see from this contrast, nice C++ library design can make for very simple work on the front end.

Hat’s off to Matt for the tutorial, Thibauld Nion for the nice blog post on the mask-based correction, Kiss FFT for the C implementation, and Eigen for the C++ wrapper.

7 Responses to “Computing Autocorrelations and Autocovariances with Fast Fourier Transforms (using Kiss FFT and Eigen)”

1. Mark Johnson Says:

Nice point! Using FFTs to work in the frequency domain is also useful if you have to search for the time lag that maximises the correlation of a pair of signals, I think. Padding with zeros is important to avoid wrap-around errors.

• Bob Carpenter Says:

Yes, Fourier analysis is used to discover time periodicities. That’s sort of what it’s all about! If you think about linguistic issues like tracking formants in the frequency domain, it’s all about discovering maximal time periodicities.

Mitzi’s fascinated by George Church’s lab’s analysis of circadian rhythms of fruit flies — that involved Fourier analysis.

For natural language data, I’d think you could tease apart daily, weekly, monthly and annual cycles using raw data containing frequency of relevant terms like “weekend” and “Saturday” and “New Year’s” (though it does bring up that different cultures have different start of year dates and 24 time zones could be problematic. In fact, it’d make a neat project to do this the other way around. Calculate the autocorrelations of a bunch of terms and figure out which ones are periodic.

• Mark Johnson Says:

I actually did a simple version of what you just suggested years ago — we had a couple of years worth of newswire text with a date assigned to the day. We looked for words which were strongly auto-correlated with a 7-day lag; sports and financial terms pop out, if I recall correctly. Celebrations, holidays and financial things like taxes jump out if you look for a 365-day lag. We just looked at a few fixed lags (and we also tried correlating with the day of the month, which doesn’t correspond to a fixed lag as different months have different numbers of days).

But of course with an FFT we could have looked across all lags simultaneously — and we probably would have been able to publish on the basis of the cool maths it would have contained!