Harnessing the power locked in GPUs

CIOL Bureau

BANGALORE, INDIA: Before explaining how to implement programs with CUDA, let us recall the difference between a CPU and a GPU. CPUs are designed to carry out serial tasks, whereas GPUs are built to process many operations in parallel. CUDA is a software platform that lets developers use GPUs for parallel, high-performance processing.


Just look at the amazing YouTube video in which the popular Discovery Channel show 'MythBusters' compares the working of a CPU and a GPU. CUDA is a small set of extensions to the C language that enables the implementation of parallel algorithms; GPUs have hundreds of cores, with shared resources, FPUs and registers, that can run threads in parallel.

CUDA includes C/C++ software development tools, function libraries, and a hardware abstraction mechanism that hides the GPU hardware from developers. CUDA works along with conventional C/C++ compilers, making it possible to mix GPU code with general-purpose CPU code.
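
To see how this mixing works in practice, here is a minimal sketch of a single .cu source file (the file name 'hello.cu' and the 'scale' kernel are our own illustrations, not NVIDIA samples). The GPU kernel and the ordinary CPU code sit side by side; the nvcc compiler driver separates the device code and hands the host code to the regular C/C++ compiler.

// hello.cu: GPU and CPU code mixed in one source file (minimal sketch)
#include <stdio.h>

// Device code: each thread scales one element of the array
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * factor;
}

// Host code: ordinary C, handled by the conventional compiler
int main()
{
    printf("Host and device code compiled from the same file\n");
    // ... allocate GPU memory, launch scale<<<blocks, threads>>>(...), copy results back ...
    return 0;
}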

Installation
In this section we will help you install CUDA on your machine. You first need a CUDA-enabled graphics card; in our implementation we are using a 'GeForce 9600 GT' graphics card. More information on CUDA-enabled products and development with CUDA can be obtained from the web page www.nvidia.com/cuda. The next step is to download the CUDA software. On the same page, click on 'DOWNLOAD CUDA' and select your operating system (Windows XP in our case). We will be using CUDA 2.0; there are three things to download: the CUDA drivers, the CUDA toolkit and the CUDA SDK.

CUDA 2.0 requires version 177.35 or later of the NVIDIA ForceWare graphics drivers for Windows XP. To check the version of the drivers on your machine, go to the 'NVIDIA Control Panel' and click on 'Help>System Information'. If the version of the drivers is lower than 177.35, then download and install the CUDA drivers. The next thing to be downloaded and installed is the CUDA toolkit, which contains the tools needed to compile and build CUDA applications.

Finally, one can download and install the CUDA SDK for sample projects. To verify the installation, run the 'bandwidthTest' program present at 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release'. If everything is properly installed, the output window will show 'Test PASSED' in the second-last line and the name of the graphics card in the first line. To check the version of the CUDA compiler, open a command prompt and type 'nvcc -V'. Besides the CUDA software, one can also use Microsoft Visual Studio 2005 for developing C/C++ applications.
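
As an additional check that the toolkit and runtime work, one can compile and run a small device-query program of the following kind (this is a minimal sketch written for this article, not one of the SDK samples; compile it with 'nvcc devicequery.cu -o devicequery'). It uses the runtime calls 'cudaGetDeviceCount' and 'cudaGetDeviceProperties' to print the name of each CUDA-enabled card, much like the first line of the 'bandwidthTest' output.

// devicequery.cu: list the CUDA-capable devices on the machine (minimal sketch)
#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-enabled device found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.name holds the card name, e.g. "GeForce 9600 GT"
        printf("Device %d: %s, %lu MB of global memory\n",
               i, prop.name, (unsigned long)(prop.totalGlobalMem / (1024 * 1024)));
    }
    return 0;
}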

The CUDA platform for parallel processing on NVIDIA GPUs: here one can see how the GPU hardware is abstracted from application developers.


Programming with CUDA
Let us throw some light on how programming is done with CUDA. CUDA extends C by allowing programmers to define C functions known as 'kernels'. When a kernel is called, it executes n times in parallel in n different threads. Here is a code snippet that defines a kernel and invokes it:

__global__ void matAdd(float A[N][N], float B[N][N],
                       float C[N][N])
{
    // Each thread computes one element of the result matrix
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel invocation: a grid of 16x16 thread blocks covering the N x N matrix
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);
}
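
The snippet above assumes that N is already defined and that A, B and C already reside in GPU memory. A fuller version of main() would look something like the following sketch (the names h_A, d_A and so on are our own, and error checking is omitted for brevity): memory is allocated on the device with cudaMalloc, the inputs are copied over with cudaMemcpy, the kernel is launched, and the result is copied back to the host.

#define N 1024

int main()
{
    size_t size = N * N * sizeof(float);

    // Allocate and fill the input matrices in host (CPU) memory
    float *h_A = (float*)malloc(size);
    float *h_B = (float*)malloc(size);
    float *h_C = (float*)malloc(size);
    // ... fill h_A and h_B with data ...

    // Allocate the matrices in device (GPU) global memory
    float (*d_A)[N], (*d_B)[N], (*d_C)[N];
    cudaMalloc((void**)&d_A, size);
    cudaMalloc((void**)&d_B, size);
    cudaMalloc((void**)&d_C, size);

    // Copy the inputs from host memory to device memory
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    // Launch the kernel on a 2D grid of 16x16 thread blocks
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);

    // Copy the result back from device memory to host memory
    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    // Free device and host memory
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    free(h_A); free(h_B); free(h_C);
    return 0;
}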

To check the version of drivers on your machine, go to NVIDIA Control Panel. Click on 'Help>System Information' and check for 'ForceWare Version.'

Here the kernel is defined using the '__global__' qualifier, and the number of threads is specified inside the new <<<...>>> execution-configuration syntax. Each thread that executes a kernel is given a unique thread ID, accessible within the kernel through the built-in variable 'threadIdx'. 'threadIdx' is a 3-component vector, so threads can be identified using a one-dimensional, two-dimensional or three-dimensional index, forming one-, two- or three-dimensional thread blocks.
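
For a one-dimensional problem such as adding two vectors, only the x components of 'threadIdx' and 'blockIdx' are needed. The following sketch (the kernel name 'vecAdd' is our own) shows how a one-dimensional index identifies each thread's element:

// Each thread adds one pair of elements; i is the thread's global index
__global__ void vecAdd(float *A, float *B, float *C, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard threads that fall beyond the array end
        C[i] = A[i] + B[i];
}

// Invocation: 256 threads per block, enough blocks to cover all n elements
// vecAdd<<<(n + 255) / 256, 256>>>(d_A, d_B, d_C, n);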

While executing, threads can access memory in three different places: the private local memory of each thread, the shared memory visible to all threads in a block, and the global memory of the device (a sketch illustrating shared memory follows below). A lot of sample projects are present in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects'; compile and run them, and customize them if you wish. Before writing code, programmers should analyse their problem so that it can be broken into small chunks of data that can be distributed across threads. Also keep in mind that you should create enough threads to utilize the GPU's power optimally.
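
As an illustration of the per-block shared memory, here is a minimal sketch (the kernel name 'blockSum' and the block size of 256 are our own choices, and the input length is assumed to be a multiple of 256). Each block sums 256 elements of an array in a '__shared__' buffer; ordinary local variables such as 'tid' live in per-thread registers, while the input and output arrays sit in global memory.

// Each block of 256 threads reduces 256 elements to one partial sum
__global__ void blockSum(float *in, float *partial)
{
    __shared__ float buf[256];           // visible to all threads in this block

    int tid = threadIdx.x;               // per-thread (register) variable
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = in[i];                    // read from global memory
    __syncthreads();                     // wait until the whole block has loaded

    // Tree reduction carried out entirely in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        partial[blockIdx.x] = buf[0];    // write the block's result to global memory
}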

After installing CUDA, run the 'bandwidthTest' program in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release'. It should show 'Test PASSED.'


NVIDIA is not the only vendor to provide a programming interface for harnessing the parallel processing power of a GPU. ATI has also joined in with the release of 'ATI Stream Technology', which runs on ATI graphics cards. We shall be providing more information on this in the near future, so watch this space in the coming issues!
