Working Notes: a commonplace notebook for recording & exploring ideas.
Kunal — more at expLog

CUDA Programming Guide

(Working through the revamped CUDA Programming Guide in parallel with PMPP.)

Introduction

CUDA error checking: every CUDA runtime call returns a cudaError_t, so wrap calls in a macro that reports failures with file and line:

#include <stdio.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(expr_to_check) do {                        \
  cudaError_t result = expr_to_check;                         \
  if (result != cudaSuccess) {                                \
    fprintf(stderr, "CUDA Runtime Error: %s:%i:%d = %s\n",    \
            __FILE__, __LINE__, result,                       \
            cudaGetErrorString(result));                      \
  }                                                           \
} while (0)
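A usage sketch (hypothetical kernel and buffer size): kernel launches don't return a cudaError_t themselves, so check them afterwards via cudaGetLastError and cudaDeviceSynchronize. The macro is repeated from above so the sketch is self-contained.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define CUDA_CHECK(expr_to_check) do {                        \
  cudaError_t result = expr_to_check;                         \
  if (result != cudaSuccess) {                                \
    fprintf(stderr, "CUDA Runtime Error: %s:%i:%d = %s\n",    \
            __FILE__, __LINE__, result,                       \
            cudaGetErrorString(result));                      \
  }                                                           \
} while (0)

// Hypothetical kernel: scales a buffer in place.
__global__ void scale(float *x, int n, float k) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= k;
}

int main(void) {
  const int n = 1024;
  float *d = NULL;
  CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
  CUDA_CHECK(cudaMemset(d, 0, n * sizeof(float)));

  scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
  CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

  CUDA_CHECK(cudaFree(d));
  return 0;
}
```

The synchronize call is what surfaces asynchronous failures (e.g. illegal memory accesses) at a predictable point; in release builds it is often kept only behind a debug flag.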
Memory types — scope / lifetime / location — notes:

- global — grid / application / device
  - primary device memory; be careful about data races
- constant — grid / application / device
  - declared with the __constant__ specifier outside any function; typically 64 KB
- shared — block / kernel / SM
  - uses the same resources as the L1 cache; a user-managed scratchpad
  - get device properties for the size; cudaFuncSetCacheConfig to customize the L1/shared split
  - static: __shared__ float sharedArray[1337]
  - dynamic: extern __shared__ float sharedArray[]
    + size passed at launch: fn<<<grid, block, sharedMemBytes>>>
  - multiple dynamic arrays must be manually partitioned & aligned within the one allocation
- local — thread / kernel / device
  - physically lives in global memory
  - laid out so consecutive 32-bit words are accessed by consecutive thread IDs; accesses are coalesced as long as threads access the same relative addresses
- register — thread / kernel / SM
  - managed by the compiler; limits reported as regsPerMultiprocessor and regsPerBlock
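A sketch of the dynamic shared-memory declaration above (hypothetical kernel and sizes): the extern array has no compile-time size, and the third launch parameter supplies the byte count.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Reverses one block-sized chunk through dynamically sized shared memory.
// A static alternative would be: __shared__ float tile[256];
__global__ void reverse(float *data, int n) {
  extern __shared__ float tile[];   // sized by the launch, not the declaration
  int i = threadIdx.x;
  if (i < n) tile[i] = data[i];
  __syncthreads();                  // all loads finish before anyone reads
  if (i < n) data[i] = tile[n - 1 - i];
}

int main(void) {
  const int n = 256;
  float h[n], *d = NULL;
  for (int i = 0; i < n; ++i) h[i] = (float)i;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

  // fn<<<grid, block, sharedMemBytes>>> — third argument sizes the extern array
  reverse<<<1, n, n * sizeof(float)>>>(d, n);

  cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
  printf("h[0] = %g\n", h[0]);
  cudaFree(d);
  return 0;
}
```

The sizes for this budget come from the device properties mentioned above (e.g. cudaDeviceProp::sharedMemPerBlock and regsPerBlock via cudaGetDeviceProperties); exceeding them reduces occupancy or fails the launch.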