Finding Suitable FFT Numbers


Many high quality FFT libraries support fast implementations for small primes. (2,3,5,7). I’ll use CUDA fft here as an example.

When you have say  an array of 18000 samples d_data, you might not want to do this:

nfft = 32768; 
cudafftPlan1d(&m_plan, nfft, CUFFT_C2C, 1); 
cufftExecC2C(m_plan, d_data, d_data, CUFFT_FORWARD);

Especially if you are doing many, many iterations of this.

Instead, you can use a method to find the closest fft number to 18000, that is still made of of small primes (2,3,5,7).

With this findNFFT method, nfft = 18432, which is

2^11 * 3 * 3

In this way, you process less samples, which saves on time. Any gain in speed by using a true radix-2 (32768) number is offset by the number of samples being processed, which is now much less. In one of my projects, this helped to cut the processing time by half.

Here’s the code for the findNFFT method – not the prettiest code but it does the trick.

 * @brief Returns a vector containing the prime factors of n
 * @param [in] The number to find the prime factors for
 * @return
std::vector<int> primeFactors(int n) {
    std::vector<int> vec;

    while (n % 2 == 0) {
        n /= 2;

    for (int i = 3; i <= sqrt(n); i += 2) {
        while (n % i == 0) {
            n /= i;

    if (n > 2)

// std::cout << "Prime factors:" << std::endl;
// for (int j=0; j < vec.size(); j++)
// {
// printf("%d ", vec[j]);
// }
// printf("\n");
    return vec;

 * @brief Used to find the appropriate fft integer for the input n
 * This uses the "formula" (N + D - 1)/D * D
 * Criteria: Output nfft should be a factor of 2,3,5
 * @param [in] Integer to find nfft for
int findNFFT(int n) {
    std::vector<int> ansPrimes;
    std::vector<int> firstPrimes;

    int d = 0;

    do {
        if (n > 2048) d = 512;
        else if (n > 1024) d = 256;
        else if (n > 128) d = 64;
        else if (n > 32) d = 32;
        else if (n > 8) d = 8;
        else d = 2;

        int fn = (n + d - 1) / d * d;
        firstPrimes = primeFactors(fn);

        for (int i = 0; i < firstPrimes.size(); i++) {
            if (firstPrimes[i] == 2 || firstPrimes[i] == 3 || firstPrimes[i] == 5) {
                firstPrimes.erase(firstPrimes.begin() + i);
                i -= 1;

        int newN = 1;
        if (firstPrimes.size() > 0) {
            for (int i = 0; i < firstPrimes.size(); i++)
                newN *= firstPrimes[i];

        n = newN;
        firstPrimes = {};

    } while (n != 1); // if n == 1 means that firstPrimes

    int ans = 1;
    for (int i = 0; i < ansPrimes.size(); i++)
        ans *= ansPrimes[i];

    return ans;

Precision : Of Singles and Doubles


Typically, a single precision digit (aka float32) has a decimal digits precision from 6-9 digits. If you type in:


In Matlab, you’ll find that it gives 1.1920929e-07. This is 2^-23.
Put simply, this is the smallest value required to to effect the next largest single-precision number, when the number is 1.

So for 980e6 for example, the smallest value required to represent the next largest single-precision number will be

980e6 * eps('single') == 1.1682510e+02.

Basically, if you have a single-precision value 980e6, the smallest confirmed granularity is ~116.8!  So when you are doing calculations involving large numbers, it will be good to keep this in mind. When you have something like this:

single( 980e6) + single(k(1))

single(k(1)) needs to be > ~116.8 in order to effect change upon single(980e6).
Basically I’ve said the same thing several different ways.

eps(‘double’) is 2^-52

which is much smaller.

For double-precision, it would be 980e6 * eps(‘double’) == 2.176037128265307e-07. Pretty small. Not something to fret about.

Matlab circshift equivalent in C / C++

Note: The contents have been superceded by a better version at :

For a better circshift using std::rotate check out

matlablogoRecently I needed this function to replicate Matlab’s circshift, which can do it for 2D arrays. Implemented as follows :

template<typename ty>
void circshift(ty *out, const ty *in, int xdim, int ydim, int xshift, int yshift)
 for (int i =0; i < xdim; i++) {
   int ii = (i + xshift) % xdim;
   if (ii<0) ii = xdim + ii;
   for (int j = 0; j < ydim; j++) {
     int jj = (j + yshift) % ydim;
     if (jj<0) jj = ydim + jj;
     out[ii * ydim + jj] = in[i * ydim + j];

Test Code:

int main(int argc, char **argv)
  int* in = (int*)malloc(5*sizeof(int));
  int* out = (int*)malloc(5*sizeof(int));
  for (int i=0; i < 5; i++)
    in[i] = i;
  printArray(in, 5);
  circshift(out, in, 1, 5, 0, -2); //-- +ve == shift right , -ve == shift left
  printArray(out, 5);
  delete in;
  delete out;
  return 0;

Hope this is useful to you!

For implementing ifftshift / fftshift using the above circular shift function, please look at :

How to use std::bind with std::function in C++11


std::bind in C++11 can be thought of as a general-purpose function adaptor. It takes a callable object and generates a new callable that “adapts” the parameter list of the original object.

So generally,

auto newCallable = bind(callable, arg_list)

So delving into an example, suppose we have a function template like this:

typedef std::function<void(void* one, void* two)> thread_pipe_func_t;

and a function which uses this function typedef to pass a function object in.

void *thread_pipe_spawn(void* lala, thread_pipe_func_t func);

and then you have a function that doesn’t quite “fit”, but you would like to be passed into thread_pipe_spawn.

void recorder_thread(void* one, void* two, int someInt) { 
// some stuff

You can do this:

namespace ph = std::placeholders;
thread_pipe_spawn(in, std::bind(recorder_thread, ph::_1, ph::_2, 5))

Jam that recorder_thread in for good!

Bear in mind that the extra variable being bound must be copy-able, as std::bind makes a copy of the bound variable. So if you want to bind a struct. Be sure to bind the pointer to the struct object, not the struct object itself.

ph::_1 and ph::_2 are placeholders. This means that the two variables passed into recorder_thread function, void* one and void* two are translated into the thread_pipe_func_t ‘s definition of void* one and void* two.

This can be used also to bind C++ member functions to functions which expect a C-style functions. (typically callbacks)

MainClass test = new MainClass();
CallbackHolder someclass = new CallbackHolder(); 
someclass.setCallback(std::bind(&MainClass::wanttocallthiscallback, test, ph::_1));

where definition of the class someclass is :

class CallbackHolder {
    // -- blah blah
    typedef std::function<void(int in)> callback_signature_t;
    void set(callback_signature_t cb) { m_callback = cb; }

Hope this is useful to you!

5 Useful Things in C++11

Auto Type Deduction

The auto keyword allows one to use C++11’s in-built type deduction.

std::string something = somethingthatreturnsastring.getString();
auto something = somethingthatreturnsastring.getString();

Auto automatically deduces that the required variable needs to be a string, and will fill in the blanks for you in place of the auto keyword. This is especially useful for iterators.

for(std::vector<T>::iterator it = x.begin(); it != x.end(); i++)

can be now

for(auto it = x.begin(); it != x.end(); i++)

Wow! Cleaner code!

Strongly Typed Enums

This is useful for preventing enum name collisions, a potential source of bugs. In the older C++, one had to specify enums that were unique throughout the program. Previously, if you used None as a enum value, no other enum group could use None. But now you can!

enum class myEnum {None, One, All};
myEnum o = myEnum ::All;
auto p = myEnum::All; // also works fine


Lambdas are basicallly in-place functions. It is especially useful in iterators and for-loops, where it may be that you only need to use this “function” once only in the whole program, so there isn’t any point defining that function. You can’t actually accomplish something you couldn’t before with lambdas, it’s more of a convenience feature influenced by functional programming.  A bare-bone one looks like this:

[]() { }

With all possible lambda operators.

[]() mutable -> T { } 

Where [] is the capture list, () the argument list and {} the function body.

Capture List

The capture list defines what from the outside of the lambda should be available inside the function body and how. It can be either:

  1. a value: [x]
  2. a reference [&x]
  3. any variable currently in scope by reference [&]
  4. same as 3, but by value [=]

You can mix any of the above in a comma separated list [x, &y].

Argument List

The argument list is the same as in any other C++ function.

Function Body

The code that will be executed when the lambda is actually called.

Return Type Deduction

If a lambda has only one return statement, the return type can be omitted and has the implicit type of decltype(return_statement).


If a lambda is marked mutable (e.g. []() mutable { }) it is allowed to mutate the values that have been captured by value.

Here’s an example of it in action.

int main()
   char s[]="Hello World!";
   int Uppercase = 0; //modified by the lambda
   for_each(s, s+sizeof(s), [&Uppercase] (char c) {
    if (isupper(c))
 cout<< Uppercase<<" uppercase letters in: "<< s<<endl;

Unique Pointers

Unique pointers are a class of smart pointers in C++11.
When you define an object using an unique_ptr, the object is destroyed and its memory deallocated when either of the following happens:
– unique_ptr managing the object is destroyed
– unique_ptr managing the object is assigned another pointer via operator= or reset().

For the layman though, it means that when you declare an object using the unique pointer idiom, you do not have to manually delete the object before it goes out of scope.

Previously, one would do this:

YourObject * obj = new YourObject();

and at the end of it all you would have to be sure to


else you would have a memory leak.


 std::unique_ptr<YourObject> obj(new YourObject());

When obj goes out of scope, the memory is free-ed for you.


static_assert is basically asserts that are executed during compile time. For example, you can do the following:

static_assert(sizeof(unsigned int) * CHAR_BIT == 32);

If for some reason the comparison above is not valid for your system, the static_assert would fail.

Another way it can be used is to use it with C++ type traits. For example,

static_assert(std::is_pod<yourstruct>::value, "Not a pod struct!");

POD stands for Plain Old Data – that is, a class (whether defined with the keyword struct or the keyword class) without constructors, destructors and virtual members functions. So basically, if a silly new programmer comes along and decides to add a constructor to a particular struct or class, this static_assert will prevent compilation and single out this error. Useful for maintenance of code!

There are lots more in C++11 that are useful. As I discover more, I’ll post them, maybe in a part 2. Thanks for reading.

Useful ZMQ Message Helper Classes

If you use ZMQ in C/C++, be sure to check this out.

Why use this, u say?The advantage is that msg_single_t/msg_multi_t objects on the stack get cleaned up automatically. It helps to clean up a lot of boilerplate code.


zmq_msg_t msg; 
zmq_msg_init_size(&msg, sizeof(int)); 
int payload = zmq_msg_recv(&msg, zpipe, 0); 

Can now be replaced by

msg_single_t msg(sizeof(int)); 
int payload = msg.recv(zpipe, 0); 

Closing the msg is done for you when msg goes out of scope. This helps to reduce memory leaks / zmq errors due to not closing the message properly. Pretty, pretty code!

C – sleep()ing Correctly …

Recently I needed to do a write to a file every 1/25th of a second. I realised that there is no guarantees on a non-real time system (Debian in this case) that the code is executed exactly once every 0.04 seconds. This is how I did it.

Enter clockGetTime() -> this has precision in the nanoseconds in the time.h library. Be sure to include real time flag -lrt in your project’s linker.

actualSleep is the amount of sleep actually needed when you factor in the amount of time used to execute other statements like the print statement.

In this way, you only sleep() the amount of time you really need to sleep(), and not more. If sleep() were a constant value 1/25 , the amount of delay in between sleeps would be more than expected (by a little, but it adds up). If you do the measurements you would see your results skewing in terms of real time. I noticed this in the first place as my statement was logging GPS UTC time to a file.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <limits.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <wait.h>
#include <ctype.h>
#include <time.h>
#include <string.h>
#include <sys/statvfs.h>

int getUsecElapsedFromStart(const struct timespec* tstart) {
    struct timespec tnow; 
    clock_gettime(CLOCK_MONOTONIC, &tnow); 
    return (int)((tnow.tv_sec*10.0e9 + tnow.tv_nsec) - (tstart->tv_sec*10.0e9 + tstart->tv_nsec)); 

int getTimeSpecDiff(struct timespec start, struct timespec end) {
    struct timespec temp; 
    if (((end.tv_nsec-start.tv_nsec))<0) {
        temp.tv_sec = end.tv_sec - start.tv_sec - 1;
        temp.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
    else {
        temp.tv_sec = end.tv_sec - start.tv_sec;
        temp.tv_nsec = end.tv_nsec - start.tv_nsec; 
    // -- convert difference to micro
    return (int)(temp.tv_sec * 10.0e6 + temp.tv_nsec * 10.0e-3);
int main(int argc, char** argv) {

    long  actualSleep = 0;
    long usecToSleep = 0;
    long usecElapsed = 0; 
    struct timespec tstart;
    struct timespec tend;
    clock_gettime(CLOCK_MONOTONIC, &tstart);
    while (1) { 
        // -- start recording to file only when GStreamer process finishes loading 


        usecToSleep = 40000; //(1.0f/(25.0f)) * 1000000;
        clock_gettime(CLOCK_MONOTONIC, &tend);
        usecElapsed = getTimeSpecDiff(tstart,tend);
        //printf("usecElapsed %d\n", usecElapsed); 
        actualSleep = usecToSleep - usecElapsed;
        printf("ActualSleep %d\n", actualSleep);
        if (actualSleep > 0) {
        clock_gettime(CLOCK_MONOTONIC, &tstart);
    return (EXIT_SUCCESS);