Finding Suitable FFT Numbers


Many high-quality FFT libraries have fast implementations for sizes whose prime factors are small (2, 3, 5, 7). I’ll use cuFFT (CUDA’s FFT library) here as an example.

Say you have an array d_data of 18000 samples. You might not want to do this:

nfft = 32768;
cufftPlan1d(&m_plan, nfft, CUFFT_C2C, 1);
cufftExecC2C(m_plan, d_data, d_data, CUFFT_FORWARD);

Especially if you are doing many, many iterations of this.

Instead, you can use a method to find an FFT size close to 18000 that is still made up of small primes (here, 2, 3 and 5).

With the findNFFT method below, nfft = 18432, which is

2^11 * 3 * 3

This way, you process fewer samples, which saves time. Any speed gain from using a true radix-2 size (32768) is more than offset by the extra samples you would have to process: 18432 is about 44% fewer samples than 32768. In one of my projects, this cut the processing time in half.

Here’s the code for the findNFFT method – not the prettiest code but it does the trick.

#include <vector>

/**
 * @brief Returns a vector containing the prime factors of n
 *
 * @param [in] n The number to find the prime factors of
 * @return The prime factors of n, in ascending order
 */
std::vector<int> primeFactors(int n) {
    std::vector<int> vec;

    // Divide out all factors of 2 first
    while (n % 2 == 0) {
        vec.push_back(2);
        n /= 2;
    }

    // Then try each odd divisor up to sqrt(n)
    for (int i = 3; i * i <= n; i += 2) {
        while (n % i == 0) {
            vec.push_back(i);
            n /= i;
        }
    }

    // Whatever remains (if > 1) is itself prime
    if (n > 1)
        vec.push_back(n);

    return vec;
}

/**
 * @brief Finds an appropriate FFT size for the input n
 * This uses the "formula" (n + d - 1) / d * d, which rounds n up to a
 * multiple of d via integer division, e.g. (18000 + 511) / 512 * 512 = 18432
 * Criteria: the output nfft is composed only of the prime factors 2, 3 and 5
 * (extend the check below if you also want to allow 7)
 *
 * @param [in] n Integer to find nfft for
 * @return An FFT size >= n whose only prime factors are 2, 3 and 5
 */
int findNFFT(int n) {
    std::vector<int> ansPrimes;
    std::vector<int> firstPrimes;

    int d = 0;

    do {
        // Pick the rounding granularity based on the size of n
        if (n > 2048) d = 512;
        else if (n > 1024) d = 256;
        else if (n > 128) d = 64;
        else if (n > 32) d = 32;
        else if (n > 8) d = 8;
        else d = 2;

        // Round n up to the next multiple of d and factorize it
        int fn = (n + d - 1) / d * d;
        firstPrimes = primeFactors(fn);

        // Keep the factors of 2, 3 and 5 ...
        for (int i = 0; i < firstPrimes.size(); i++) {
            if (firstPrimes[i] == 2 || firstPrimes[i] == 3 || firstPrimes[i] == 5) {
                ansPrimes.push_back(firstPrimes[i]);
                firstPrimes.erase(firstPrimes.begin() + i);
                i -= 1;
            }
        }

        // ... and fold the leftover (unwanted) primes into a new n,
        // which gets rounded up and factorized again on the next pass
        int newN = 1;
        for (int i = 0; i < firstPrimes.size(); i++)
            newN *= firstPrimes[i];

        n = newN;
        firstPrimes.clear();

    } while (n != 1); // n == 1 means every factor was a 2, 3 or 5

    // The answer is the product of all the accepted primes
    int ans = 1;
    for (int i = 0; i < ansPrimes.size(); i++)
        ans *= ansPrimes[i];

    return ans;
}
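
To tie it together, here’s a minimal sketch of how findNFFT fits into the cuFFT calls from earlier. The buffer and variable names (d_data, m_plan) are just assumptions for illustration; the key point is that d_data must be allocated to nfft elements, with the tail beyond your real samples zeroed out (zero-padding):

#include <cuda_runtime.h>
#include <cufft.h>

int numSamples = 18000;
int nfft = findNFFT(numSamples); // 18432 instead of 32768

// Allocate nfft elements; the region past numSamples acts as zero-padding
cufftComplex* d_data;
cudaMalloc(&d_data, sizeof(cufftComplex) * nfft);
cudaMemset(d_data, 0, sizeof(cufftComplex) * nfft);
// ... cudaMemcpy your numSamples samples into the front of d_data ...

cufftHandle m_plan;
cufftPlan1d(&m_plan, nfft, CUFFT_C2C, 1);
cufftExecC2C(m_plan, d_data, d_data, CUFFT_FORWARD);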

Valgrind – Suppressing CUDA/ZMQ/IPP/OpenMP Errors

Valgrind is great, but it doesn’t recognize some calls like CUDA’s. This means that Valgrind frequently reports these as leaks, even when they are legitimate calls. Even a simple cudaFree() can cause Valgrind to complain.

The fix is a suppression file. You pass it to Valgrind like this:

> valgrind --suppressions=ippcuda.supp ./yourProgram

What it does is basically tell Valgrind to ignore the errors listed in the file. The downside is that you may miss some legitimate leaks, in CUDA for example. For those, you can use cuda-memcheck, but that is really, really slow. Boo NVIDIA.
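
For reference, the leak-check invocation looks something like this (check cuda-memcheck --help for the exact flags your toolkit version supports):

> cuda-memcheck --leak-check full ./yourProgram

Anyway, here are the suppressions in my file: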

##----------------------------------------------------------------------##
# ZMQ Suppressions

{
<socketcall_sendto>
Memcheck:Param
socketcall.sendto(msg)
fun:send
...
}
{
<socketcall_send>
Memcheck:Param
socketcall.send(msg)
fun:send
...
}

##----------------------------------------------------------------------##
# Intel Suppressions

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
 obj:*
 obj:*
 obj:*
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
 obj:*
 obj:*
 obj:*
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:main
}


{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippInit
 fun:main
}

##----------------------------------------------------------------------##
# OMP Suppressions

{
 <insert_a_suppression_name_here>
 Memcheck:Leak
 match-leak-kinds: possible
 fun:calloc
 fun:_dl_allocate_tls
 fun:pthread_create@@GLIBC_2.2.5
 obj:/usr/lib64/libgomp.so.1.0.0
 fun:_ZN9AutoFocus8startPGAEP7Ipp32fcfiiPf
 fun:main
}

##----------------------------------------------------------------------##
# CUDA Suppressions

{
 <alloc_libcuda>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcuda.so*
 ...
}

{
 <alloc_libcuda>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcufft.so*
 ...
}

{
 <alloc_libcudart>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcudart.so*
 ...
}
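
As an aside on the OMP entry above: even a trivial OpenMP program like this hypothetical sketch will produce a "possibly lost" report of the same shape (calloc / _dl_allocate_tls / pthread_create / libgomp), because libgomp keeps its worker threads and their thread-local storage alive until the process exits:

// Compile with: g++ -fopenmp omp_demo.cpp -o omp_demo
#include <omp.h>
#include <cstdio>

int main() {
    // libgomp spins up its thread pool here and never tears it down,
    // which Valgrind flags as "possibly lost" TLS allocations
    #pragma omp parallel
    {
        printf("hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}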

If this doesn’t suit you, you can generate your own suppressions using:

> valgrind --gen-suppressions=yes ./yourProgram

Valgrind will then generate suppressions catering to the particular error you have.
Don’t go around suppressing legit valgrind leak detections though!

Here’s my own suppression file which includes the above, hope it helps.
https://onedrive.live.com/redir?resid=692F268A60881F2D!22968&authkey=!ANsb8IMA9e8lkOw&ithint=file%2csupp

Resolution Stuck at 640 x 480 after NVIDIA Driver Installation

For some reason, after I installed the new NVIDIA drivers on CentOS 7, X stopped recognizing my monitor. Driver broken? A slight hack was required.

In /etc/X11/xorg.conf, look for these two lines:

HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0

and replace them with

HorizSync 30.0 - 83.0
VertRefresh 56.0 - 75.0

Only after doing this was I able to change the screen resolution in NVIDIA X Server Settings. For the exact HorizSync and VertRefresh values to use, consult your monitor’s manual.

PyCUDA Windows Installation (Offline)

PyCUDA
PyCUDA is a Python extension for CUDA, which is useful for prototyping GPU solutions in Python. Here are some things it is being used for: http://wiki.tiker.net/PyCuda/ShowCase It isn’t officially supported by NVIDIA, though.
Because my environment is offline, some of the steps I mention here may be unnecessary for you.

First, install the CUDA Toolkit for Windows – I assume that you have done so, or you probably wouldn’t be here.

The official installation page for Windows is here: http://wiki.tiker.net/PyCuda/Installation/Windows , which you really should follow if you have internet access on your installation computer. If you follow that and still have trouble, come back here to see if the rest of this helps.

You need Microsoft Visual Studio 2010 for this installation, or at least the correct cl.exe compiler and any other dependencies it may need. I have not tested with later versions. If you have a 64-bit system and Python installation, make sure that you are using the 64-bit compiler and not the 32-bit one. The 64-bit cl.exe is located here:

c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\

You can run cl.exe (with no arguments) to check whether you have the 32-bit or 64-bit version.

Offline Installation

In case you are as unfortunate as me and have to install this stuff offline, here are the various files I used to install PyCUDA: https://onedrive.live.com/redir?resid=692F268A60881F2D!15820&authkey=!AJpd-O9yUi_O_1w&ithint=file%2czip

If you need any other additional wheel files, download them here: http://www.lfd.uci.edu/~gohlke/pythonlibs/

Do note that you should download the correct wheel file for your Python distribution.

For example,

boost_python-1.59-cp27-none-win_amd64.whl

cp27 means CPython 2.7. For my installation, I downloaded the cp34 ones, as I had a Python 3.4 installation.

You can use pip, located under ‘Scripts’ in the Python directory, to install .whl (wheel) files.

pip install pathandfilenamewhatever.whl

Boost Libraries Installation

After the various installations, I was not able to find the Boost thread library when I tried to compile PyCUDA. Hence, I had to compile the Boost libraries manually.

Refer to section 5.1 of http://www.boost.org/doc/libs/1_55_0/more/getting_started/windows.html for the easiest method to compile the Boost libraries.

After the Boost libraries were compiled, I went to the ‘boost/lib’ directory and manually copied the files I wanted into the Python installation’s libs directory:

  • libboost_thread-vc100-mt-1_59
  • libboost_thread-vc100-mt-gd-1_59

PyCUDA Compilation

Now, we can start compiling PyCUDA.

Unzip the PyCUDA source files, and at the command prompt inside the PyCUDA directory, type this:
configure

This will create a siteconf.py in your directory, which houses the information PyCUDA needs for installation.

Here’s what my siteconf.py looks like:

BOOST_INC_DIR = ['']
BOOST_LIB_DIR = ['']
BOOST_COMPILER = 'msvc'
USE_SHIPPED_BOOST = True
BOOST_PYTHON_LIBNAME = ['libboost_python3-vc100-mt-1_59']
BOOST_THREAD_LIBNAME = ['libboost_thread-vc100-mt-1_59']
CUDA_TRACE = False
CUDA_ROOT = 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v7.5'
CUDA_ENABLE_GL = False
CUDA_ENABLE_CURAND = True
CUDADRV_LIB_DIR = ['${CUDA_ROOT}/lib', '${CUDA_ROOT}/lib/x64']
CUDADRV_LIBNAME = ['cuda']
CUDART_LIB_DIR = ['${CUDA_ROOT}/lib', '${CUDA_ROOT}/lib/x64']
CUDART_LIBNAME = ['cudart']
CURAND_LIB_DIR = ['${CUDA_ROOT}/lib', '${CUDA_ROOT}/lib/x64']
CURAND_LIBNAME = ['curand']
CXXFLAGS = ['/EHsc']
LDFLAGS = ['/FORCE']

Set USE_SHIPPED_BOOST to True so that it will look for the Boost libraries inside the Python directory. I had to set CXXFLAGS and LDFLAGS as shown so that the code would compile. Sounds not so good, I know, but what the heck, it compiles.
Then type,
python setup.py install

It should compile at this stage. Hooray!!!

After the installation:

Edit nvcc.profile (in CUDA\v7.5\bin, or wherever you installed the CUDA GPU Computing Toolkit) and set the INCLUDES flag to this:
INCLUDES += "-I$(TOP)/include" "-I$(TOP)/include/cudart" "-IC:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/include" $(_SPACE_)

And that is my personal, painful story of PyCUDA Windows Installation.

This page helped me a lot! Thanks Marty! I didn’t need to install the Windows SDK for this installation though.

Installing CUDA 7 on CentOS 7 – The Golden Path for OpenGL Samples to Work

1. Install CentOS 7 – this should be pretty straightforward!

2. Follow this guide to install CUDA 7 : http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Getting_Started_Linux.pdf

The CUDA installation has to be done in command-line mode, with no X Windows running. So once you are in the CentOS 7 GUI, open a terminal and type:

$ systemctl set-default multi-user.target

$ reboot

CentOS 7 will now boot to the CLI by default.

Start the CUDA 7 installation.

Remember to install the OpenGL libraries. Read section 4.2 onwards carefully! Follow all the steps.

3. Disable the nouveau drivers (as you are installing NVIDIA drivers)

Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau

options nouveau modeset=0

4. Regenerate the kernel initramfs

$ dracut --force

5. Run nvidia-xconfig to recreate the config file for X Windows.

$ nvidia-xconfig

$ reboot

6. Set the library paths for CUDA 7 libraries on boot       

$ cd  /etc/profile.d

$ vim cudapaths.sh

Type the following into the cudapaths.sh script.

export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH

Save and reboot the OS.
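
To sanity-check that the path was picked up after the reboot, something like this should show the CUDA lib64 directory:

$ echo $LD_LIBRARY_PATH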

7. Install 3rd Party Libraries (for GL)

If you try to compile one of the sample projects under 3_Imaging, you will get a lot of lib* not found errors. You have to install the libraries manually.

$ yum install mesa-libGLES.x86_64 mesa-libGL-devel.x86_64 mesa-libGLU-devel.x86_64 mesa-libGLw.x86_64 mesa-libGLw-devel.x86_64 libXi-devel.x86_64 freeglut-devel.x86_64 freeglut.x86_64

8. Reinstall NVIDIA Drivers to Fix Symbolic Links

Due to a bug when installing the 3rd-party libraries, you will need to re-run the driver installation to fix some symbolic links. Bug info: https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5094265

Download drivers here: http://www.geforce.com/drivers/results/84721

The next person who tells me installing stuff on Linux is as easy as on Windows/OSX is going to get punched in the face.

Installing Nvidia Drivers on RHEL or CentOS 7

http://www.advancedclustering.com/act-kb/installing-nvidia-drivers-rhel-centos-7/

Another, more descriptive article:

http://www.dedoimedo.com/computers/centos-7-nvidia.html

EDIT: You know what? Fuck the above 2 guides for wasting my time. They’re missing steps here and there.

Particularly this:

If the GPU used for display is an NVIDIA GPU, the X server configuration file, /etc/X11/xorg.conf, may need to be modified. In some cases, nvidia-xconfig can be used to automatically generate a xorg.conf file that works for the system. For non-standard systems, such as those with more than one GPU, it is recommended to manually edit the xorg.conf file. Consult the xorg.conf documentation for more information.

So there you have it – after installing CUDA 7, run

/usr/local/cuda-7.0/bin/nvidia-xconfig

to generate a new xorg.conf file for the X server. Otherwise your X Windows configuration may not know about the new CUDA 7 driver.

Follow NVIDIA’s official guide instead: http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Getting_Started_Linux.pdf