Valgrind – Suppressing CUDA/ZMQ/IPP/OpenMP Errors

Valgrind is great, but it doesn’t recognize what some libraries, such as CUDA, do internally. As a result it frequently reports leaks for perfectly legitimate calls; even a simple cudaFree() can make Valgrind complain. A suppression file tells Valgrind to ignore these reports.

You pass the suppression file (here ippcuda.supp, whose contents are listed below) to Valgrind like this:

> valgrind --suppressions=ippcuda.supp ./yourProgram

What it does is basically ignore the errors that Valgrind complains about. The downside is that you may miss some legitimate leaks, in CUDA for example. For those you can use cuda-memcheck, but that is really, really slow. Boo NVIDIA.
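If you are wondering what these reports look like from the library's side, here is a minimal sketch in plain C (no CUDA needed). It assumes a hypothetical runtime_init() that allocates some state once, keeps it in a global, and never frees it, which is roughly the pattern long-lived runtime libraries tend to follow. All names here are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library's internal state: allocated once,
   kept in a global, and intentionally never freed. */
static char *g_runtime_state = NULL;

/* Allocates the zeroed global buffer on first use and returns it.
   Later calls return the same pointer; returns NULL if malloc fails. */
char *runtime_init(size_t size)
{
    if (g_runtime_state == NULL) {
        g_runtime_state = malloc(size);
        if (g_runtime_state != NULL) {
            memset(g_runtime_state, 0, size);
        }
    }
    return g_runtime_state;
}
```

If you call runtime_init(64) from main() and run the program with valgrind --leak-check=full --show-leak-kinds=all, Memcheck should list the block as "still reachable" rather than "definitely lost". That is the category the "reachable" entries in the CUDA suppressions below are meant to silence.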

##----------------------------------------------------------------------##
# ZMQ Suppressions

{
<socketcall_sendto>
Memcheck:Param
socketcall.sendto(msg)
fun:send
...
}
{
<socketcall_send>
Memcheck:Param
socketcall.send(msg)
fun:send
...
}

##----------------------------------------------------------------------##
# Intel Suppressions

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
 obj:*
 obj:*
 obj:*
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
 obj:*
 obj:*
 obj:*
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:main
}


{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippInit
 fun:main
}

##----------------------------------------------------------------------##
# OMP Suppressions

{
 <insert_a_suppression_name_here>
 Memcheck:Leak
 match-leak-kinds: possible
 fun:calloc
 fun:_dl_allocate_tls
 fun:pthread_create@@GLIBC_2.2.5
 obj:/usr/lib64/libgomp.so.1.0.0
 fun:_ZN9AutoFocus8startPGAEP7Ipp32fcfiiPf
 fun:main
}

##----------------------------------------------------------------------##
# CUDA Suppressions

{
 <alloc_libcuda>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcuda.so*
 ...
}

{
 <alloc_libcuda>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcufft.so*
 ...
}

{
 <alloc_libcudart>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcudart.so*
 ...
}

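If this doesn’t suit you, you can have Valgrind print suppressions for your own errors using:

> valgrind --gen-suppressions=yes ./yourprogram

Valgrind will then offer to generate a suppression for each error it reports, tailored to that particular error (use --gen-suppressions=all to print them without being asked).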
Don’t go around suppressing legit valgrind leak detections though!
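Whether generated or hand-written, the entries above share one layout: an opening brace, a name, a Tool:Kind line, the remaining lines (an optional match-leak-kinds line, then one line per stack frame), and a closing brace. As a rough sketch of that layout, here is a small hypothetical C helper, format_suppression(), that writes a Memcheck entry into a buffer. The helper and its parameters are made up for illustration and are not part of Valgrind.

```c
#include <stdio.h>

/* Writes one Memcheck suppression entry into buf (capacity cap).
   `lines` holds everything after the Tool:Kind line, e.g. "fun:*alloc".
   Returns the number of characters written, or -1 if buf is too small. */
int format_suppression(char *buf, size_t cap,
                       const char *name, const char *kind,
                       const char *const *lines, size_t nlines)
{
    size_t used;
    int n = snprintf(buf, cap, "{\n   %s\n   Memcheck:%s\n", name, kind);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    used = (size_t)n;

    for (size_t i = 0; i < nlines; i++) {
        n = snprintf(buf + used, cap - used, "   %s\n", lines[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, "}\n");
    if (n < 0 || (size_t)n >= cap - used) {
        return -1;
    }
    return (int)(used + (size_t)n);
}
```

Passing the name alloc_libcuda, the kind Leak, and the lines fun:*alloc, ... and obj:*libcuda.so* gives an entry shaped like the first CUDA one above, minus its match-leak-kinds line.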

Here’s my own suppression file which includes the above, hope it helps.
https://onedrive.live.com/redir?resid=692F268A60881F2D!22968&authkey=!ANsb8IMA9e8lkOw&ithint=file%2csupp

Valgrind – Dealing with IPP / AVX Related False Positives

Update: Valgrind 3.11 doesn’t show the AVX errors mentioned below, so if you can, upgrading to 3.11 is probably the better option.

When debugging Intel IPP-enabled C/C++ programs with Valgrind, you may run into the following issue:

Process terminating with default action of signal 4 (SIGILL)

Illegal opcode at address 0xEBC9CD4

at 0xEBC9CD4 : own_ipps_sAtan2_E9LAynn (in /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64_lin/libippvme9.so.9.0

The program terminates because of an apparently “illegal opcode”.

This is OK: the opcode is fine, it’s just that Valgrind doesn’t recognize certain AVX instructions.
If you want Valgrind to proceed anyway, you can tell IPP to avoid AVX.

The IPP manual lists the following feature masks:

http://www.hpc.ut.ee/dokumendid/ips_xe_2015/composerxe/Documentation/en_US/ipp/ipp_manual/GUID-C730D3B1-6232-45AF-A757-DF52850388CD.htm

32-bit code:

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE )
#define W7_FM ( PX_FM | ippCPUID_SSE2 )
#define V8_FM ( W7_FM | ippCPUID_SSE3 | ippCPUID_SSSE3 )
#define S8_FM ( V8_FM | ippCPUID_MOVBE )
#define P8_FM ( V8_FM | ippCPUID_SSE41 | ippCPUID_SSE42 | ippCPUID_AES | ippCPUID_CLMUL | ippCPUID_SHA )
#define G9_FM ( P8_FM | ippCPUID_AVX | ippAVX_ENABLEDBYOS | ippCPUID_RDRAND | ippCPUID_F16C )
#define H9_FM ( G9_FM | ippCPUID_AVX2 | ippCPUID_MOVBE | ippCPUID_ADCOX | ippCPUID_RDSEED | ippCPUID_PREFETCHW )

64-bit code:

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE | ippCPUID_SSE2 )
#define M7_FM ( PX_FM | ippCPUID_SSE3 )
#define N8_FM ( S8_FM )
#define U8_FM ( V8_FM )
#define Y8_FM ( P8_FM )
#define E9_FM ( G9_FM )
#define L9_FM ( H9_FM )

Copy and paste these at the top of your code; for this case, everything up to P8_FM will do. You can then pass P8_FM to ippSetCpuFeatures(), which essentially means that IPP will use SSE-type instructions and avoid the AVX ones.

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE )
#define W7_FM ( PX_FM | ippCPUID_SSE2 )
#define V8_FM ( W7_FM | ippCPUID_SSE3 | ippCPUID_SSSE3 )
#define S8_FM ( V8_FM | ippCPUID_MOVBE )
#define P8_FM ( V8_FM | ippCPUID_SSE41 | ippCPUID_SSE42 | ippCPUID_AES | ippCPUID_CLMUL | ippCPUID_SHA )

// then in your main()
ippInit();

// Purely to deal with Valgrind false positives. Comment out if you want maximum performance using AVX.
ippSetCpuFeatures(P8_FM);
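To see why stopping at P8_FM keeps IPP away from AVX, here is a self-contained sketch of how the masks compose. The DEMO_* bit values are illustrative stand-ins, not the real ippCPUID_* constants (those come from the IPP headers), and DEMO_G9_FM leaves out the RDRAND and F16C bits for brevity.

```c
#include <stdint.h>

/* Stand-in feature bits, for illustration only.
   The real ippCPUID_* values are defined in the IPP headers. */
#define DEMO_MMX     (UINT64_C(1) << 0)
#define DEMO_SSE     (UINT64_C(1) << 1)
#define DEMO_SSE2    (UINT64_C(1) << 2)
#define DEMO_SSE3    (UINT64_C(1) << 3)
#define DEMO_SSSE3   (UINT64_C(1) << 4)
#define DEMO_SSE41   (UINT64_C(1) << 5)
#define DEMO_SSE42   (UINT64_C(1) << 6)
#define DEMO_AES     (UINT64_C(1) << 7)
#define DEMO_CLMUL   (UINT64_C(1) << 8)
#define DEMO_SHA     (UINT64_C(1) << 9)
#define DEMO_AVX     (UINT64_C(1) << 10)
#define DEMO_AVX_OS  (UINT64_C(1) << 11)

/* Same nesting as the 32-bit masks from the manual. */
#define DEMO_PX_FM (DEMO_MMX | DEMO_SSE)
#define DEMO_W7_FM (DEMO_PX_FM | DEMO_SSE2)
#define DEMO_V8_FM (DEMO_W7_FM | DEMO_SSE3 | DEMO_SSSE3)
#define DEMO_P8_FM (DEMO_V8_FM | DEMO_SSE41 | DEMO_SSE42 | DEMO_AES | DEMO_CLMUL | DEMO_SHA)
#define DEMO_G9_FM (DEMO_P8_FM | DEMO_AVX | DEMO_AVX_OS)

/* Returns 1 if the mask enables the AVX bit, 0 otherwise. */
int mask_uses_avx(uint64_t mask)
{
    return (mask & DEMO_AVX) != 0;
}
```

Here mask_uses_avx(DEMO_P8_FM) is 0 while mask_uses_avx(DEMO_G9_FM) is 1. Each mask is a superset of the one before it, and AVX only enters at G9, so anything up to P8 stays on the SSE side.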

I’m using the Valgrind that comes with CentOS 7, 3.10.0. Do let me know if this has been fixed in 3.11. 🙂