Compiling Static vs Dynamic Libraries on CMake


Why compile statically?

This allows for ease of deployment, at the expense of a larger binary executable.
You don’t have to copy the libraries that you use manually to the target system

CMake link_libraries() Magic

I use CLion, which (currently) enforces the use of CMake in compiling C/C++ projects. In your CMakeLists.txt file, first make sure you link the directory to find your files:


CMake has a magic link_libraries() function which takes in the library specified and determines how you want it to be compiled (statically or dynamically linked).

If you type


It is interpreted as a dynamic linked library.


Tells CMake to look for this static library file in the linked directories, and build it statically into your binary.

Order of Static Linking Matters



I met this error.

/usr/local/lib/libPocoXML.a(XMLWriter.o): In function `Poco::XML::XMLWriter::XMLWriter(std::ostream&, int)':XMLWriter.cpp:(.text+0x28b3): undefined reference to `Poco::UTF8Encoding::UTF8Encoding()'XMLWriter.cpp:(.text+0x28cc): undefined reference to `Poco::UTF8Encoding::UTF8Encoding()'

It seems that the libPocoXML.a static library is trying to call functions in libPocoFoundation.a but can’t find them.
Reversing the order of linking the libraries helps.


This is because when CMake links the libPocoXML.a library, it makes a note of the external functions that are called and looks for them to be linked in the subsequent libraries. One example here is the Poco::UTF8Encoding::UTF8Encoding() function.

What is happening here is that CMake links the libPocoXML.a, looks for the function in subsequent libraries that are linked and finds nothing. Reversing the order allows libPocoXML.a to find the desired function later on in libPocoFoundation.a.

This only happens for static library compilation due to how CMake interprets it.
Check your binary using ldd :

ldd DSPBox =>  (0x00007fff2eb46000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007f4a209cd000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007f4a2078c000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007f4a20580000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007f4a2036a000) => /usr/local/cuda/lib64/ (0x00007f4a1972f000) => /usr/local/lib/ (0x00007f4a194d3000) => /lib64/ (0x00007f4a19299000) => /lib64/ (0x00007f4a19094000) => /lib64/ (0x00007f4a18e8c000) => /lib64/ (0x00007f4a18b84000) => /lib64/ (0x00007f4a18881000) => /lib64/ (0x00007f4a1866a000) => /lib64/ (0x00007f4a18454000) => /lib64/ (0x00007f4a18091000) => /lib64/ (0x00007f4a17e3c000) => /lib64/ (0x00007f4a17c25000) /lib64/ (0x00007f4a20c53000)


Statically linking a file which has dynamic file dependencies.

This worked


but not the static version.


/usr/local/lib/libtiff.a(tif_jpeg.o): In function `TIFFjpeg_destroy’:/home/pier/Software/Development/tiff-3.8.2/libtiff/tif_jpeg.c:377: undefined reference to `jpeg_destroy’/usr/local/lib/libtiff.a(tif_jpeg.o): In function `TIFFjpeg_write_raw_data’:/home/pier/Software/Development/tiff-3.8.2/libtiff/tif_jpeg.c:320: undefined reference to `jpeg_write_raw_data’/usr/local/lib/libtiff.a(tif_jpeg.o): In function `TIFFjpeg_finish_compress’:…..
If you dig in deeper, you are able to find the dependancies using readelf

 cd /usr/local/
libreadelf -d | grep 'NEEDED'

0x0000000000000001 (NEEDED)             Shared library:
[] 0x0000000000000001 (NEEDED)             Shared library:
[] 0x0000000000000001 (NEEDED)             Shared library:
[] 0x0000000000000001 (NEEDED)             Shared library: []

So apparently we still need the above dynamic libraries even for libtiff.a. Fortunately these files come preinstalled in CentOS and most distributions.
So all you need to do now is :

link_libraries(libtiff.a jpeg z)

libc and libm are linked by default by gcc. CMake interprets this as : compile libtiff.a statically into the binary, but its dependancies libjpeg and libz are still dynamically linked. Get them dynamically from the linked system folders.
To be sure, check using ldd again:

ldd DSPBox =>  (0x00007ffdfc5ee000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007fa41cf23000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007fa41cce2000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007fa41cad6000) => /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64/ (0x00007fa41c8c0000) => /usr/local/cuda/lib64/ (0x00007fa415c85000) => /lib64/ (0x00007fa415a12000) => /lib64/ (0x00007fa4157fc000) => /lib64/ (0x00007fa4155df000) => /lib64/ (0x00007fa4153db000) => /lib64/ (0x00007fa4151d3000) => /lib64/ (0x00007fa414eca000) => /lib64/ (0x00007fa414bc8000) => /lib64/ (0x00007fa414806000) => /lib64/ (0x00007fa4145ee000) => /lib64/ (0x00007fa4143d8000) /lib64/ (0x00007fa41d1a9000)

No more dynamic library requirement for libtiff!


[ERRFMT] using Poco Logger

When using the poco logger to log values like this, many at times the logged value is displayed as [ERRFMT]. The poco logger is saying that it does not recognize the input parameter as the one being specified by the identifier. Poco is extremely finicky regarding this!

For example,

float stddev = 3.53525f
 m_log.debug("Std dev = %f", stddev);


Std dev = [ERRFMT]

The correct formatting to be used for Poco :

bool %b
char %c
int %d, %i
unsigned %u
short %hd
unsigned short %hu
long %ld
unsigned long %lu
long long %Ld
unsigned long long %Lu
unsigned (octal) %o
unsigned (hex) %x, %X
double %f, %e, %E
float %hf, %he, %hE
std::string %s
std::size_t %z

Hence the example above should be :

 m_log.debug("Std dev = %hf", stddev);

Printf in C automatically promotes variables to the appropriate format for print out, which is why you don’t have this hassle when using printf.

Valgrind – Dealing with IPP / AVX Related False Positives

Update: Valgrind 3.11 doesn’t show the AVX errors mentioned below. So if you have the option, upgrading it to 3.11 is probably the better option. 

Debugging Intel IPP-enabled C/C++ programs with Valgrind, you may run into the following issues.

Process terminating with default action of signal 4 (SIGILL)

Illegal opcode at address 0xEBC9CD4

at 0xEBC9CD4 : own_ipps_sAtan2_E9LAynn (in /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64_lin/

The program terminates because of this apparently “illegal opcode” that valgrind doesn’t recognize.

This is ok. It’s just that Valgrind doesn’t recognize certain AVX opcodes.
If you want Valgrind to proceed anyway, do this:

From the IPP Manual, you can find this:

32-bit code:

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE )
#define W7_FM ( PX_FM | ippCPUID_SSE2 )
#define V8_FM ( W7_FM | ippCPUID_SSE3 | ippCPUID_SSSE3 )
#define S8_FM ( V8_FM | ippCPUID_MOVBE )
#define P8_FM ( V8_FM | ippCPUID_SSE41 | ippCPUID_SSE42 | ippCPUID_AES | ippCPUID_CLMUL | ippCPUID_SHA )
#define G9_FM ( P8_FM | ippCPUID_AVX | ippAVX_ENABLEDBYOS | ippCPUID_RDRRAND | ippCPUID_F16C )
64-bit code:

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE | ippCPUID_SSE2 )
#define M7_FM ( PX_FM | ippCPUID_SSE3 )
#define N8_FM ( S8_FM )
#define U8_FM ( V8_FM )
#define Y8_FM ( P8_FM )
#define E9_FM ( G9_FM )
#define L9_FM ( H9_FM )

Copy and paste these on the top of your code. Just until P8_FM for this case will do. So you can actually “use” P8_FM, which essential means that ipp will use the SSE type instructions and will avoid the AVX types.

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE )
#define W7_FM ( PX_FM | ippCPUID_SSE2 )
#define V8_FM ( W7_FM | ippCPUID_SSE3 | ippCPUID_SSSE3 )
#define S8_FM ( V8_FM | ippCPUID_MOVBE )
#define P8_FM ( V8_FM | ippCPUID_SSE41 | ippCPUID_SSE42 | ippCPUID_AES | ippCPUID_CLMUL | ippCPUID_SHA )

// then in your main()
ippSetCpuFeatures(P8_FM) // -- purely to deal with valgrind false positives. Comment out if you want maximum performance using AVX.

I’m using the Valgrind that comes with CentOS 7, 3.10.0. Do let me know if this has been fixed in 3.11. 🙂

More Efficient ifftshift / fftshift in C++


Previously I touched upon ifftshift and fftshift in this post. . I’ve come to realize that there are better ways to implement fftshift and iftshift using simple memory swapping. Note that you have to pre-allocate the appropriate amount of memory for T*out before calling this function.

//-- Does 1D fftshift 
template<typename T>
inline void fftshift1D(T *in, T *out, int ydim)
 int pivot = (ydim % 2 == 0) ? (ydim / 2) : ((ydim - 1) / 2);
 int rightHalf = ydim-pivot;
 int leftHalf = pivot;
 memcpy(out, in+(pivot), sizeof(T)*rightHalf);
 memcpy(out+rightHalf, in, sizeof(T)*leftHalf);

//-- Does 1D ifftshift
//-- Note: T* out must already by memory allocated!!
template<typename T>
inline void ifftshift1D(T *in, T *out, int ydim)
 int pivot = (ydim % 2 == 0) ? (ydim / 2) : ((ydim + 1) / 2);

 int rightHalf = ydim-pivot;
 int leftHalf = pivot;
 memcpy(out, in+(pivot), sizeof(T)*rightHalf);
 memcpy(out+rightHalf, in, sizeof(T)*leftHalf);


Hope this helps! It is good as  a Matlab C++ equivalent.

If you want to do 2D you will have to fftshift first, transpose the matrix, then fftshift it again. Then transpose back to the original matrix form.

Also, a better circshift would be to use C++’s in-built std::rotate

template<typename T>
inline void circshift1D_IP(T *in, int ydim, int yshift)
 if (yshift == 0)

 if (yshift > 0) // shift right
 std::rotate(&in[0], &in[ydim - yshift - 1], &in[ydim - 1]);
 else if (yshift < 0) // shift left
 yshift = abs(yshift);
 std::rotate(&in[0], &in[yshift], &in[ydim - 1]);


Precision : Of Singles and Doubles


Typically, a single precision digit (aka float32) has a decimal digits precision from 6-9 digits. If you type in:


In Matlab, you’ll find that it gives 1.1920929e-07. This is 2^-23.
Put simply, this is the smallest value required to to effect the next largest single-precision number, when the number is 1.

So for 980e6 for example, the smallest value required to represent the next largest single-precision number will be

980e6 * eps('single') == 1.1682510e+02.

Basically, if you have a single-precision value 980e6, the smallest confirmed granularity is ~116.8!  So when you are doing calculations involving large numbers, it will be good to keep this in mind. When you have something like this:

single( 980e6) + single(k(1))

single(k(1)) needs to be > ~116.8 in order to effect change upon single(980e6).
Basically I’ve said the same thing several different ways.

eps(‘double’) is 2^-52

which is much smaller.

For double-precision, it would be 980e6 * eps(‘double’) == 2.176037128265307e-07. Pretty small. Not something to fret about.

Matlab circshift equivalent in C / C++

Note: The contents have been superceded by a better version at :

For a better circshift using std::rotate check out

matlablogoRecently I needed this function to replicate Matlab’s circshift, which can do it for 2D arrays. Implemented as follows :

template<typename ty>
void circshift(ty *out, const ty *in, int xdim, int ydim, int xshift, int yshift)
 for (int i =0; i < xdim; i++) {
   int ii = (i + xshift) % xdim;
   if (ii<0) ii = xdim + ii;
   for (int j = 0; j < ydim; j++) {
     int jj = (j + yshift) % ydim;
     if (jj<0) jj = ydim + jj;
     out[ii * ydim + jj] = in[i * ydim + j];

Test Code:

int main(int argc, char **argv)
  int* in = (int*)malloc(5*sizeof(int));
  int* out = (int*)malloc(5*sizeof(int));
  for (int i=0; i < 5; i++)
    in[i] = i;
  printArray(in, 5);
  circshift(out, in, 1, 5, 0, -2); //-- +ve == shift right , -ve == shift left
  printArray(out, 5);
  delete in;
  delete out;
  return 0;

Hope this is useful to you!

For implementing ifftshift / fftshift using the above circular shift function, please look at :