Programming with CUDA

Here we shed some light on how programs are written for CUDA. CUDA extends C by allowing programmers to define C functions, known as 'kernels', which, when called, are executed N times in parallel by N different threads. The following snippet defines a kernel that adds two N×N matrices and launches it from main():
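```cuda
#define N 64   // matrix dimension; any compile-time constant works

// Kernel definition: each thread adds one element of the matrices
__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)   // skip threads that fall outside the matrix
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // A, B and C must point to device (GPU) memory
    float (*A)[N], (*B)[N], (*C)[N];
    cudaMalloc((void**)&A, N * N * sizeof(float));
    cudaMalloc((void**)&B, N * N * sizeof(float));
    cudaMalloc((void**)&C, N * N * sizeof(float));
    // Kernel invocation: 16x16 threads per block, enough blocks to cover N x N
    dim3 dimBlock(16, 16);
    dim3 dimGrid((N + dimBlock.x - 1) / dimBlock.x,
                 (N + dimBlock.y - 1) / dimBlock.y);
    matAdd<<<dimGrid, dimBlock>>>(A, B, C);
    cudaDeviceSynchronize();   // wait for the kernel to finish
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```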
To check the driver version on your machine, open the NVIDIA Control Panel, click 'Help > System Information' and look for 'ForceWare Version'.
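Here the kernel is defined using the '__global__' declaration specifier. The number of threads that execute it is specified with the new <<<...>>> execution configuration syntax: the first argument (dimGrid) gives the number of thread blocks, and the second (dimBlock) gives the number of threads per block. Each thread that executes the kernel is given a unique thread ID, which is accessible within the kernel through the built-in variable 'threadIdx'. Because 'threadIdx' is a 3-component vector, a thread can be identified by a one-, two- or three-dimensional index, forming a one-, two- or three-dimensional thread block. In matAdd, each thread also uses 'blockIdx' (the index of its block within the grid) and 'blockDim' (the size of each block) to work out which matrix element (i, j) it is responsible for.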
While executing, threads can access data from three memory spaces: each thread's private local memory, shared memory visible to all threads in the same block, and global memory accessible to all threads.

Many examples are provided in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects'. Compile and run these examples; you can also modify them to suit your own needs. Before writing code, programmers should analyse the problem so that the data can be split into small chunks that can be distributed across threads. Also make sure you create enough threads to fully utilize the GPU.
After installing CUDA, run the 'bandwidthTest' program in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Release'. It should report 'Test PASSED'.
NVIDIA is not the only vendor to provide a programming interface for harnessing the parallel processing power of a GPU. ATI has also joined in with the release of 'ATI Stream Technology', which runs on ATI graphics cards. We will provide more information on this in the near future, so watch this space in coming issues!
Copyright © CyberMedia India Online Ltd. All rights reserved. Usage of content from web site is subject to Terms and Conditions.