Kernel Panic

It may be possible that I don't know anything.

Finding Suitable FFT Numbers

Many high-quality FFT libraries have fast implementations for sizes whose only prime factors are small (2, 3, 5, 7). I’ll use cuFFT, CUDA's FFT library, as an example.

When you have, say, an array d_data of 18000 samples, you might not want to do this:

nfft = 32768; // next power of two above 18000
cufftPlan1d(&m_plan, nfft, CUFFT_C2C, 1);
cufftExecC2C(m_plan, d_data, d_data, CUFFT_FORWARD);

Especially if you are doing many, many iterations of this.

Instead, you can use a method to find a nearby FFT size at or above 18000 that is still made up of small primes (2, 3, 5, 7).

With the findNFFT method, nfft = 18432, which is

2^11 * 3 * 3

In this way, you process fewer samples, which saves time. Any speed gained by using a true power-of-two size (32768) is outweighed by the much smaller number of samples being processed. In one of my projects, this cut the processing time by half.
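
A rough way to see where the saving comes from is the usual N log2 N operation-count model for an FFT. The sketch below is only that model, not a benchmark. fftCostModel and costRatio are hypothetical helper names, and real timings depend on the library, the GPU and the factorization of N.

```cpp
#include <cmath>

// Hypothetical helper: rough FFT cost model, N * log2(N) "operations".
// It ignores radix-specific constants, memory traffic and batching.
double fftCostModel(int n) {
    return static_cast<double>(n) * std::log2(static_cast<double>(n));
}

// Ratio of the modelled cost of size a to the modelled cost of size b.
double costRatio(int a, int b) {
    return fftCostModel(a) / fftCostModel(b);
}
```

Under this model, costRatio(18432, 32768) comes out at roughly 0.53, which is in line with the halving of processing time mentioned above.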

Here’s the code for the findNFFT method – not the prettiest code but it does the trick. Note that this version only accepts the factors 2, 3 and 5, not 7.

#include <vector>

/**
 * @brief Returns a vector containing the prime factors of n
 *
 * @param [in] n The number to find the prime factors for (n >= 1)
 * @return The prime factors in ascending order, with repeats
 */
std::vector<int> primeFactors(int n) {
    std::vector<int> vec;

    if (n < 2) // 0 would loop forever below; 1 has no prime factors
        return vec;

    while (n % 2 == 0) {
        vec.push_back(2);
        n /= 2;
    }

    // only odd candidates are left; i <= n / i avoids sqrt() and overflow
    for (int i = 3; i <= n / i; i += 2) {
        while (n % i == 0) {
            vec.push_back(i);
            n /= i;
        }
    }

    if (n > 2) // whatever remains is itself prime
        vec.push_back(n);

    return vec;
}

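/**
 * @brief Used to find the appropriate fft integer for the input n
 * This rounds n up to a multiple of D using the "formula" (N + D - 1)/D * D
 * Criteria: Output nfft should have only 2, 3 and 5 as prime factors
 *
 * @param [in] n Integer to find nfft for (n >= 1)
 * @return The chosen FFT length (at least n)
 */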
int findNFFT(int n) {
    std::vector<int> ansPrimes;
    std::vector<int> firstPrimes;

    int d = 0;

    do {
        if (n > 2048) d = 512;
        else if (n > 1024) d = 256;
        else if (n > 128) d = 64;
        else if (n > 32) d = 32;
        else if (n > 8) d = 8;
        else d = 2;

        int fn = (n + d - 1) / d * d;
        firstPrimes = primeFactors(fn);

        for (int i = 0; i < firstPrimes.size(); i++) {
            if (firstPrimes[i] == 2 || firstPrimes[i] == 3 || firstPrimes[i] == 5) {
                ansPrimes.push_back(firstPrimes[i]);
                firstPrimes.erase(firstPrimes.begin() + i);
                i -= 1;
            }
        }

        int newN = 1;
        if (firstPrimes.size() > 0) {
            for (int i = 0; i < firstPrimes.size(); i++)
                newN *= firstPrimes[i];
        }

        n = newN;
        firstPrimes = {};

    } while (n != 1); // n == 1 means that firstPrimes had no factors other than 2, 3, 5 left

    int ans = 1;
    for (int i = 0; i < ansPrimes.size(); i++)
        ans *= ansPrimes[i];

    return ans;
}
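
For comparison, here is a minimal brute-force sketch. It is not the method above, just a linear search for the smallest size at or above n whose prime factors all come from a given list. isSmooth and nextSmooth are hypothetical names, and the sketch assumes the list contains 2 and that n is well below INT_MAX.

```cpp
#include <vector>

// Hypothetical helper: true if n has no prime factors other than those in `primes`.
bool isSmooth(int n, const std::vector<int>& primes) {
    if (n < 1) return false;
    for (int p : primes)
        while (n % p == 0) n /= p;  // strip every allowed factor
    return n == 1;                  // nothing else may remain
}

// Hypothetical helper: smallest integer >= n built only from `primes`.
// Assumes `primes` contains 2, so the search stops at the next power of two at the latest.
int nextSmooth(int n, const std::vector<int>& primes = {2, 3, 5}) {
    if (n < 1) n = 1;
    while (!isSmooth(n, primes)) ++n;
    return n;
}
```

For n = 18000 this returns 18000 itself, since 18000 = 2^4 * 3^2 * 5^3, whereas findNFFT first rounds up to a multiple of 512 and so lands on 18432. Both are valid sizes. Which one runs faster may depend on the library and hardware, so it is worth timing both.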

Texas Instruments C6678 Evaluation Kit – Booting Your Program from NAND/NOR

If you’ve bought a Texas Instruments C6678 evaluation kit, are wondering how to get your .out program binary to load on boot, and are confused by the manuals, here is a set of “golden path” instructions. (Actually, most of it is cut-and-pasted from various readme.txt files in different directories. It’s just placed here in proper order for normal people like me to understand.)

Procedure to Modify I2C Configuration

For some reason, the evaluation kit ships with the EEPROM’s nandBoot.bootFormat setting at ibl_BOOT_FORMAT_BBLOB. If we are using the .out file created with Code Composer, we need to change this to ibl_BOOT_FORMAT_ELF in order to boot the binary.

You likely only need to do this once to change the I2C configuration. Fortunately, TI has a helper program that makes use of the file i2cConfig.gel to change the I2C configuration.

- Modify the i2cConfig.gel file to your needs. It is located at C:\ti\mcsdk_2_01_02_05\tools\boot_loader\ibl\src\util\i2cConfig\i2cConfig.gel
 - Look for: setConfig_c6678_main(), this is the function you will call later to set the I2C config
 - Look for: ibl.bootModes[1].u.nandBoot.bootFormat : If this is ibl_BOOT_FORMAT_BBLOB, change to ibl_BOOT_FORMAT_ELF;

- Load NO BOOT as usual, Launch Selected Configuration
 Set the dip switches (pin1, pin2, pin3, pin4) to:
 SW3(off, on, on, on),
 SW4(on, on, on, on),
 SW5(on, on, on, on),
 SW6(on, on, on, on)
 - Connect Target on Core 0
 - Load C:\ti\mcsdk_2_01_02_05\tools\boot_loader\ibl\src\make\bin\i2cparam_0x51_c6678_le_0x500.out program
 - Gel Files -> Load Gel File C:\ti\mcsdk_2_01_02_05\tools\boot_loader\ibl\src\util\i2cConfig\i2cConfig.gel
 - Scripts -> evm C6678 IBL -> setConfig_c6678_main()
 - Done

Procedure to Load NOR

NOR Writer Utility

NOR Writer is a simple utility to program a CCS format image/data file to the NOR flash.

Steps to program the NOR:

1. Be sure to set the boot mode dip switch to no boot/EMIF16 boot mode on the EVM.

2. Copy the binary file to writer\nor\evmc66xxl\bin directory, and rename it to app.bin.

3. Change the file_name and start_addr in writer\nor\evmc66xxl\bin\norwriter_input.txt if necessary.
 By default the NOR writer will load app.bin to DSP memory and write the data to NOR device start byte address 0.
 The start_addr should always be set to the start byte address of a sector.

4. Open CCSv5 and launch the evmc66xx emulator target configuration and connect to core 0.

5. Load the program writer\nor\evmc66xxl\bin\norwriter_evm66xxl.out to CCS, be sure evmc66xxl.gel is used in CCS
 and DDR is initialized.

6. Open the Memory view (in CCSv5, view->Memory Browser), and view the memory address 0x80000000.

7. Load app.bin to 0x80000000:
 * In CCSv5, right click mouse in memory window, select "load memory".
 * Browse and select writer\nor\evmc66xxl\bin\app.bin (raw data format), click "next"
 * Set the Start Address to "0x80000000", Type-size to 32-bits, leave swap unchecked, click "finish"

8. After the binary file is loaded into the memory, run the program (in CCSv5, press F8), it will start to program the
 NOR.

9. When programming is completed, the console will print "NOR programming completed successfully". If there
 is any error, the console will show the error message.

NOR Boot:
 Set the dip switches (pin1, pin2, pin3, pin4) to:
 SW3(off, off, on, off),
 SW4(on, on, on, on),
 SW5(on, on, on, off),
 SW6(on, on, on, on)
 This will set the boot param index to 0 to boot the NOR image, by default
 the boot configuration table sets the NOR offset address to be 0 and
 image format to be ELF for image 0.

Procedure to Load NAND

– Programming the application on NAND or NOR flash
NOTE: This step is not needed if the application is booted from Ethernet.
(a) Use the NAND or NOR writer for the c6678 EVM from the tools directory.
(b) Flash the application to NAND or NOR. Please follow the instructions
given along with the NAND/NOR writer.

For all the I2C boot modes, the user needs to set the boot dip switches to I2C master, bus address 0x51.

NAND Writer Utility

NAND Writer is a simple utility to program a CCS format image/data file to the NAND flash.

Steps to program the NAND:

1. Be sure to set the boot mode dip switch to no boot/EMIF16 boot mode on the EVM.

2. Copy the binary file to writer\nand\evmc66xxl\bin directory, and rename it to app.bin.

3. Change the file_name and start_addr in writer\nand\evmc66xxl\bin\nandwriter_input.txt if necessary.
 By default the NAND writer will load app.bin to DSP memory and write the data to NAND device start byte address 16384
 (start address of block 1). The start_addr should always be set to the start byte address of a block.

4. Open CCSv5 and launch the evmc66xx emulator target configuration and connect to core 0.

5. Load the program writer\nand\evmc66xxl\bin\nandwriter_evm66xxl.out to CCS, be sure evmc66xxl.gel is used in CCS
 and DDR is initialized.

6. Open the Memory view (in CCSv5, view->Memory Browser), and view the memory address 0x80000000.

7. Load app.bin to 0x80000000:
 * In CCSv5, right click mouse in memory window, select "load memory".
 * Browse and select writer\nand\evmc66xxl\bin\app.bin (raw data format), click "next"
 * Set the Start Address to "0x80000000", Type-size to 32-bits, leave swap unchecked, click "finish"

8. After the data file is loaded into the memory, run the program (in CCSv5, press F8), it will start to program the
 NAND.

9. When programming is completed, the console will print "NAND programming completed successfully". If there
 is any error, the console will show the error message.

NAND Boot:
 Set the dip switches (pin1, pin2, pin3, pin4) to:
 SW3(off, off, on, off),
 SW4(on, off, on, on),
 SW5(on, on, on, off),
 SW6(on, on, on, on)
 This will set the boot param index to 2 to boot the NAND image, by default
 the boot configuration table sets the NAND offset address to be 16384
 (start of block 1).

Useful Stackoverflow Post for Rounding Numbers

http://stackoverflow.com/questions/1343890/rounding-number-to-2-decimal-places-in-c
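
The linked question deals with rounding a floating-point value to two decimal places. A minimal C++ sketch of the common scale, round, unscale approach is below. roundTo is a hypothetical name, and because most decimal fractions have no exact binary representation, the result is only the nearest double to the rounded value.

```cpp
#include <cmath>

// Hypothetical helper: round `value` to `places` decimal places (places >= 0).
double roundTo(double value, int places) {
    double scale = 1.0;
    for (int i = 0; i < places; ++i)
        scale *= 10.0;                          // 10^places
    return std::round(value * scale) / scale;   // std::round: halves go away from zero
}
```

If the goal is only to display two decimals, formatting with printf("%.2f", value) avoids changing the stored value at all.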

Note Durations for Music Apps

I’ll just leave this here. Useful if you need to calculate note durations (in seconds) when you are building, say, an arpeggiator.😉

Half note               =  120 / BPM
Quarter note            =   60 / BPM
Eighth note             =   30 / BPM
Sixteenth note          =   15 / BPM
Dotted-quarter note     =   90 / BPM
Dotted-eighth note      =   45 / BPM
Dotted-sixteenth note   = 22.5 / BPM
Triplet-quarter note    =   40 / BPM
Triplet-eighth note     =   20 / BPM
Triplet-sixteenth note  =   10 / BPM

Ten Reasons to Learn Python

    1. It’s Matlab!

      Users of Matlab will find the Python console familiar. For Matlab-esque applications, numpy and scipy are easily available for Python. matplotlib covers the graphing stuff that you need.

      Too lazy to install all of this stuff? Just download WinPython or Anaconda – these are all-in-one installers designed to work out of the box for your scientific needs. To me, Python represents the glue between the academic and the developer worlds. Python is likely to chip away at Matlab slowly but surely for one reason – newly graduated computer science students are more likely to already know Python. It is one of the favourite languages used for teaching.

    2. It’s a Scripting Language!

      In the course of my work, I frequently have to quickly script up simulators or quick interface testers to test production code. This is where Python comes into play. The sheer convenience of having everything and anything you could ever possibly need for a tester makes Python an easy choice. You can collect the data via the tester and quickly display a graph of the data via matplotlib. If the work is I/O bound rather than compute bound, Python does the job well. This is a pretty good article on what I think is going on.

      However, if it is compute bound and you really need the most speed, look to C/C++. Or you may try one of the alternative Python interpreters, PyPy. The main aim of PyPy is to run Python code faster than the default interpreter, CPython.

    3. It’s an Application Programming Language!

      You can make games with PyGame. With PyQt, you can basically write a cross-platform GUI application. Another GUI app development framework which gets a lot of love is wxPython. BitTorrent was developed using wxPython. TkInter is another windowing framework. Another app that was created using Python is Calibre, a capable e-book reader.

    4. It’s a Web Development Language!

      Of course you’ve heard of Django. It’s one of the most popular and powerful web application frameworks. Here are some web Python frameworks, compared.

      Dropbox, YouTube, Google, Quora, Reddit, Pinterest, Spotify – they all use Python. Need I say more?

    5. It’s a Calculator!

      Python has basically replaced the calculator on my desktop for any advanced calculations. All I need to do is to start the terminal and type:

      >python
      >>> import numpy as np
      >>> np.sqrt(6)
      2.4494897427831779

      Ok, not so advanced, but you get the idea. Cosine, sine, arctangent, matrices, they’re all there. And it’s right there in the terminal.

    6.  Batteries included!

      For ZeroMQ messaging protocol – You have PyZMQ.
      For GPU programming (CUDA-based), you have Continuum Analytics’s Accelerate GPU solution and PyCUDA.
      For Qt-based GUIs, you can use PyQt and generate GUIs with a few dozen lines of code.

      Anything you can imagine, it probably has a python binding at the very least. Google’s newly-open-sourced TensorFlow for machine learning has Python front and center. There’s a python framework for pretty much anything.

      For programmers, the most important thing is to get shit done, so anytime you can leverage 3rd-party libraries, you’re probably going to want to do it.

      Oh, and it comes with any respectable desktop linux distribution, so you get it out-of-the-box.

    7. Syntactic Sugar!

      Indentation in Python is essentially the equivalent of curly brackets in C/C++. This results in code that is generally easier to read and free of the individual programmer’s syntactic quirks. This also results in generally neater code. Other language features like dynamic typing help a lot in reducing unnecessary syntax. The less code you write, the lower the probability of error. In fact, Python reminds me of my own pseudo code.
      Being easier to learn, Python reduces the time to market.

      Type:

      >>> import this

      for the Zen of Python – it kind of illustrates what Python is all about.

    8. Python is free, and many of its IDEs are free!

      PyCharm Community is free. This is from one of my favourite IDE developers, JetBrains. Another one is Spyder, which tries to mimic the Matlab programming environment. Spyder comes installed with the WinPython and Anaconda packages. Many of these are cross-platform, and come with modern useful code-completion features.

      Python is kept alive by a number of sponsoring companies, and the Python source code is all open-source. So it probably isn’t going away anytime soon. Python is actually around 26 years old, so yeah, that’s an eternity in computer software.

    9. Show me the money!

      Based on Quartz, here are programming languages listed next to their average annual salary from lowest to highest. Number 3 ain’t bad, considering Objective C is proprietary to Apple. Anyway a lot of Swift seems to be inspired by Python.

      [Image: programming languages ranked by average annual salary]

    10. It’s Everywhere!

      It truly is, and it will get even more pervasive as more students graduate with Python in their pocket.

 

 

 

Valgrind – Suppressing CUDA/ZMQ/IPP/OpenMP Errors

Valgrind is great, but it doesn’t recognize some calls like CUDA’s. This means that Valgrind frequently reports these as leaks, even when they are legitimate calls. Even a simple cudaFree() can cause Valgrind to complain.

You pass the suppression file to Valgrind like this:

> valgrind --suppressions=ippcuda.supp ./yourProgram

What it does is basically ignore these errors that Valgrind complains about. The downside is that you may miss some legitimate leaks in CUDA, for example. For that, you can use cuda-memcheck, but that is really, really slow. Boo NVIDIA.

##----------------------------------------------------------------------##
# ZMQ Suppressions

{
<socketcall_sendto>
Memcheck:Param
socketcall.sendto(msg)
fun:send
...
}
{
<socketcall_sendto>
Memcheck:Param
socketcall.send(msg)
fun:send
...
}

##----------------------------------------------------------------------##
# Intel Suppressions

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
 obj:*
 obj:*
 obj:*
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
 obj:*
 obj:*
 obj:*
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:main
}


{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippSetCpuFeatures
 fun:_init
 fun:_dl_init
 obj:/usr/lib64/ld-2.17.so
}

{
 <insert_a_suppression_name_here>
 Memcheck:Cond
 fun:__intel_sse2_strrchr
 fun:DynReload
 fun:ippInit
 fun:main
}

##----------------------------------------------------------------------##
# OMP Suppressions

{
 <insert_a_suppression_name_here>
 Memcheck:Leak
 match-leak-kinds: possible
 fun:calloc
 fun:_dl_allocate_tls
 fun:pthread_create@@GLIBC_2.2.5
 obj:/usr/lib64/libgomp.so.1.0.0
 fun:_ZN9AutoFocus8startPGAEP7Ipp32fcfiiPf
 fun:main
}

##----------------------------------------------------------------------##
# CUDA Suppressions

{
 <alloc_libcuda>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcuda.so*
 ...
}

{
 <alloc_libcuda>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcufft.so*
 ...
}

{
 <alloc_libcudart>
 Memcheck:Leak
 match-leak-kinds: reachable,possible
 fun:*alloc
 ...
 obj:*libcudart.so*
 ...
}

If this doesn’t suit you, you can have Valgrind print suppressions for your own errors using:

> valgrind --gen-suppressions=yes ./yourProgram

Valgrind will then generate suppressions catering to the particular error you have.
Don’t go around suppressing legit valgrind leak detections though!

Here’s my own suppression file which includes the above, hope it helps.
https://onedrive.live.com/redir?resid=692F268A60881F2D!22968&authkey=!ANsb8IMA9e8lkOw&ithint=file%2csupp

Valgrind – Dealing with IPP / AVX Related False Positives

Update: Valgrind 3.11 doesn’t show the AVX errors mentioned below. So if you have the option, upgrading to 3.11 is probably the better choice.

When debugging Intel IPP-enabled C/C++ programs with Valgrind, you may run into the following issue.

Process terminating with default action of signal 4 (SIGILL)

Illegal opcode at address 0xEBC9CD4

at 0xEBC9CD4 : own_ipps_sAtan2_E9LAynn (in /opt/intel/compilers_and_libraries_2016.0.109/linux/ipp/lib/intel64_lin/libippvme9.so.9.0

The program terminates because of this apparently “illegal opcode” that valgrind doesn’t recognize.

This is OK. It’s just that Valgrind doesn’t recognize certain AVX opcodes.
If you want Valgrind to proceed anyway, you can restrict IPP to CPU features that Valgrind does understand.

The IPP manual lists the following feature masks:

http://www.hpc.ut.ee/dokumendid/ips_xe_2015/composerxe/Documentation/en_US/ipp/ipp_manual/GUID-C730D3B1-6232-45AF-A757-DF52850388CD.htm

32-bit code:

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE )
#define W7_FM ( PX_FM | ippCPUID_SSE2 )
#define V8_FM ( W7_FM | ippCPUID_SSE3 | ippCPUID_SSSE3 )
#define S8_FM ( V8_FM | ippCPUID_MOVBE )
#define P8_FM ( V8_FM | ippCPUID_SSE41 | ippCPUID_SSE42 | ippCPUID_AES | ippCPUID_CLMUL | ippCPUID_SHA )
#define G9_FM ( P8_FM | ippCPUID_AVX | ippAVX_ENABLEDBYOS | ippCPUID_RDRRAND | ippCPUID_F16C )
#define H9_FM ( G9_FM | ippCPUID_AVX2 | ippCPUID_MOVBE | ippCPUID_ADCOX | ippCPUID_RDSEED | ippCPUID_PREFETCHW )

64-bit code:

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE | ippCPUID_SSE2 )
#define M7_FM ( PX_FM | ippCPUID_SSE3 )
#define N8_FM ( S8_FM )
#define U8_FM ( V8_FM )
#define Y8_FM ( P8_FM )
#define E9_FM ( G9_FM )
#define L9_FM ( H9_FM )

Copy and paste these at the top of your code. In this case, just the definitions up to P8_FM will do. You can then pass P8_FM to ippSetCpuFeatures, which essentially means that IPP will use the SSE-type instructions and avoid the AVX ones.

#define PX_FM ( ippCPUID_MMX | ippCPUID_SSE )
#define W7_FM ( PX_FM | ippCPUID_SSE2 )
#define V8_FM ( W7_FM | ippCPUID_SSE3 | ippCPUID_SSSE3 )
#define S8_FM ( V8_FM | ippCPUID_MOVBE )
#define P8_FM ( V8_FM | ippCPUID_SSE41 | ippCPUID_SSE42 | ippCPUID_AES | ippCPUID_CLMUL | ippCPUID_SHA )

// then in your main()
ippInit();

// Purely to deal with Valgrind false positives. Comment out if you want maximum performance using AVX.
ippSetCpuFeatures(P8_FM);

I’m using the Valgrind that comes with CentOS 7, 3.10.0. Do let me know if this has been fixed in 3.11.🙂

CentOS7 – Setting Static IP (Persistent)

Say your device name is eth0.
Edit or create /etc/sysconfig/network-scripts/ifcfg-eth0 so that it looks like this:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0

Sample static IP configuration:

DEVICE=eth0
BOOTPROTO=static
DHCPCLASS=
HWADDR=00:30:48:56:A6:2E
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes

Reboot, and type ifconfig (or ip addr) – you should see the interface with the static IP assigned.

More Efficient ifftshift / fftshift in C++

Previously I touched upon ifftshift and fftshift in this post: https://kerpanic.wordpress.com/2016/01/15/matlab-circshift-equivalent-in-c-c/ . I’ve come to realize that there are better ways to implement fftshift and ifftshift using simple memory copies. Note that you have to pre-allocate the appropriate amount of memory for T* out before calling these functions, and that out must not overlap in.

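#include <cstring> // memcpy

//-- Does 1D fftshift (matches Matlab's fftshift for even and odd lengths)
//-- Note: T* out must already be memory allocated and must not overlap in!!
template<typename T>
inline void fftshift1D(const T *in, T *out, int ydim)
{
 // For odd lengths the pivot is ceil(ydim / 2), so that element 0 (DC) lands in the centre
 int pivot = (ydim % 2 == 0) ? (ydim / 2) : ((ydim + 1) / 2);
 int rightHalf = ydim-pivot;
 int leftHalf = pivot;
 memcpy(out, in+(pivot), sizeof(T)*rightHalf);
 memcpy(out+rightHalf, in, sizeof(T)*leftHalf);
}

//-- Does 1D ifftshift (undoes fftshift1D for even and odd lengths)
//-- Note: T* out must already be memory allocated and must not overlap in!!
template<typename T>
inline void ifftshift1D(const T *in, T *out, int ydim)
{
 // For odd lengths the pivot is floor(ydim / 2), so that the centre element moves back to index 0
 int pivot = (ydim % 2 == 0) ? (ydim / 2) : ((ydim - 1) / 2);

 int rightHalf = ydim-pivot;
 int leftHalf = pivot;
 memcpy(out, in+(pivot), sizeof(T)*rightHalf);
 memcpy(out+rightHalf, in, sizeof(T)*leftHalf);
}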

 

Hope this helps! It works well as a C++ equivalent of the Matlab functions.
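
For even lengths fftshift and ifftshift are the same operation. They only differ for odd lengths: Matlab's fftshift([1 2 3 4 5]) gives [4 5 1 2 3], while ifftshift([1 2 3 4 5]) gives [3 4 5 1 2]. As a cross-check, here is a minimal reference sketch built on std::rotate_copy. fftshiftRef and ifftshiftRef are hypothetical names, and std::vector is used only to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <vector>

// Reference fftshift: the element at index ceil(N/2) becomes the first element.
template <typename T>
std::vector<T> fftshiftRef(const std::vector<T>& in) {
    std::vector<T> out(in.size());
    std::rotate_copy(in.begin(), in.begin() + (in.size() + 1) / 2, in.end(), out.begin());
    return out;
}

// Reference ifftshift: the element at index floor(N/2) becomes the first element.
template <typename T>
std::vector<T> ifftshiftRef(const std::vector<T>& in) {
    std::vector<T> out(in.size());
    std::rotate_copy(in.begin(), in.begin() + in.size() / 2, in.end(), out.begin());
    return out;
}
```

Applying ifftshiftRef to the output of fftshiftRef gives back the original vector for any length, which makes a handy unit test for a memcpy-based version.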

If you want to do 2D, you will have to fftshift each row first, transpose the matrix, then fftshift each row again, and finally transpose back to the original matrix form.
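
An alternative to the shift, transpose, shift, transpose sequence is to map indices directly. This is a different technique from the one described above, sketched here under the assumption of a row-major matrix stored in a std::vector with in.size() == rows * cols. fftshift2DRef is a hypothetical name.

```cpp
#include <vector>

// 2D fftshift by index mapping: every element moves floor(rows/2) rows down
// and floor(cols/2) columns right, wrapping around at the edges.
template <typename T>
std::vector<T> fftshift2DRef(const std::vector<T>& in, int rows, int cols) {
    std::vector<T> out(in.size());
    const int rshift = rows / 2;
    const int cshift = cols / 2;
    for (int r = 0; r < rows; ++r) {
        const int rr = (r + rshift) % rows;      // destination row
        for (int c = 0; c < cols; ++c) {
            const int cc = (c + cshift) % cols;  // destination column
            out[rr * cols + cc] = in[r * cols + c];
        }
    }
    return out;
}
```

For the 2 x 3 matrix [1 2 3; 4 5 6] this produces [6 4 5; 3 1 2], the same result as Matlab's fftshift. The scattered writes are less cache-friendly than memcpy on whole rows, so it is a readability-first reference rather than a tuned version.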

Also, a better circshift would be to use C++’s built-in std::rotate:

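#include <algorithm> // std::rotate

template<typename T>
inline void circshift1D_IP(T *in, int ydim, int yshift)
{
 if (ydim <= 0)
 return;

 // Reduce the shift to the range [0, ydim); negative values shift left
 yshift %= ydim;
 if (yshift < 0)
 yshift += ydim;

 if (yshift == 0)
 return;

 // Shift right by yshift: element ydim - yshift becomes the new first element.
 // The last argument is one past the end, so the whole array takes part.
 std::rotate(in, in + (ydim - yshift), in + ydim);
}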
