|
HIP: Heterogenous-computing Interface for Portability
|
| Term | CUDA | HIP | OpenCL |
|---|---|---|---|
| Device | int deviceId | int deviceId | cl_device |
| Queue | cudaStream_t | hipStream_t | cl_command_queue |
| Event | cudaEvent_t | hipEvent_t | cl_event |
| Memory | void * | void * | cl_mem |
| grid | grid | NDRange | |
| block | block | work-group | |
| thread | thread | work-item | |
| warp | warp | sub-group | |
| Thread- index | threadIdx.x | threadIdx.x | get_local_id(0) |
| Block- index | blockIdx.x | blockIdx.x | get_group_id(0) |
| Block- dim | blockDim.x | blockDim.x | get_local_size(0) |
| Grid-dim | gridDim.x | gridDim.x | get_num_groups(0) |
| Device Kernel | __global__ | __global__ | __kernel |
| Device Function | __device__ | __device__ | Implied in device compilation |
| Host Function | __host_ (default) | __host_ (default) | Implied in host compilation |
| Host + Device Function | __host__ __device__ | __host__ __device__ | No equivalent |
| Kernel Launch | <<< >>> | hipLaunchKernel/hipLaunchKernelGGL/<<< >>> | clEnqueueNDRangeKernel |
|||||| |Global Memory|__global__|__global__|__global| |Group Memory|__shared__|__shared__|__local| |Constant|__constant__|__constant__|__constant| |||||| ||__syncthreads|__syncthreads|barrier(CLK_LOCAL_MEMFENCE)| |Atomic Builtins|atomicAdd|atomicAdd|atomic_add| |Precise Math|cos(f)|cos(f)|cos(f)| |Fast Math|__cos(f)|__cos(f)|native_cos(f)| |Vector|float4|float4|float4|
The indexing functions (starting with thread-index) show the terminology for a 1D grid. Some APIs use reverse order of xyz / 012 indexing for 3D grids.
1.8.18