Device Info Query
In this post, I show how to query device information with CUDA. I use an NVIDIA Jetson Xavier device, but the code below should run on any CUDA-capable device.
The Xavier specification is:
Device 0 | Xavier |
---|---|
CUDA Driver Version / Runtime Version | 10.0 / 10.0 |
CUDA Capability Major/Minor version number | 7.2 |
Total amount of global memory | 15.32 GBytes (16454500352 bytes) |
GPU Clock rate | 1500 MHz (1.50 GHz) |
Memory Clock rate | 1377 MHz |
Memory Bus Width | 256-bit |
L2 Cache Size | 524288 bytes |
Max Texture Dimension Size (x) | 1D=(131072) |
Max Texture Dimension Size (x,y) | 2D=(131072,65536) |
Max Texture Dimension Size (x,y,z) | 3D=(16384,16384,16384) |
Max Layered Texture Size (dim) x layers (1D) | 1D=(32768) x 2048 |
Max Layered Texture Size (dim) x layers (2D) | 2D=(32768,32768) x 2048 |
Total amount of constant memory | 65536 bytes (64.00 KB) |
Total amount of shared memory per block | 49152 bytes (48.00 KB) |
Total number of registers available per block | 65536 |
Warp size | 32 |
Maximum number of threads per multiprocessor | 2048 |
Number of multiprocessors | 4 |
Maximum number of threads per block | 1024 |
Maximum number of warps per multiprocessor | 64 |
Maximum sizes of each dimension of a block | 1024 x 1024 x 64 |
Maximum sizes of each dimension of a grid | 2147483647 x 65535 x 65535 |
Maximum memory pitch | 2147483647 bytes |
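Some rows in the table are derived from other rows rather than read directly from the device. The following is a minimal host-side sketch of that arithmetic in plain C++, so it needs no GPU to run. The helper names (`bytesToGiB`, `warpsPerMultiprocessor`, `maxResidentThreads`) are hypothetical and chosen for illustration; they are not part of the CUDA API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not CUDA API calls.

// Convert a byte count to GiB (1 GiB = 1024^3 bytes).
inline double bytesToGiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// Number of warps that can be resident on one multiprocessor at once.
inline int warpsPerMultiprocessor(int maxThreadsPerSM, int warpSize)
{
    assert(warpSize > 0);
    return maxThreadsPerSM / warpSize;
}

// Upper bound on threads resident across the whole device.
inline int maxResidentThreads(int multiprocessorCount, int maxThreadsPerSM)
{
    return multiprocessorCount * maxThreadsPerSM;
}
```

With the Xavier values from the table, `bytesToGiB(16454500352ULL)` is about 15.32, `warpsPerMultiprocessor(2048, 32)` is 64, and `maxResidentThreads(4, 2048)` is 8192.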
```cpp
#include <cuda_runtime.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// Report a failed CUDA runtime call with its source location, then stop.
#define CHECK(call)                                                    \
    {                                                                  \
        const cudaError_t error = call;                                \
        if (error != cudaSuccess)                                      \
        {                                                              \
            fprintf(stderr, "Error: %s:%d, ", __FILE__, __LINE__);     \
            fprintf(stderr, "code: %d, reason: %s\n", (int)error,      \
                    cudaGetErrorString(error));                        \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    }

int main(int argc, char **argv)
{
    printf("%s Starting...\n", argv[0]);

    // Count the CUDA devices; stop early if the query fails or finds none.
    int deviceCount = 0;
    cudaError_t countStatus = cudaGetDeviceCount(&deviceCount);

    if (countStatus != cudaSuccess)
    {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(countStatus));
        return EXIT_FAILURE;
    }

    if (deviceCount == 0)
    {
        printf("There are no available device(s) that support CUDA\n");
        return EXIT_FAILURE;
    }

    printf("Detected %d CUDA Capable device(s)\n", deviceCount);

    // Only device 0 is queried.
    int dev = 0, driverVersion = 0, runtimeVersion = 0;
    CHECK(cudaSetDevice(dev));

    cudaDeviceProp deviceProp;
    CHECK(cudaGetDeviceProperties(&deviceProp, dev));
    printf("Device %d: \"%s\"\n", dev, deviceProp.name);

    CHECK(cudaDriverGetVersion(&driverVersion));
    CHECK(cudaRuntimeGetVersion(&runtimeVersion));
    printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf(" CUDA Capability Major/Minor version number: %d.%d\n",
           deviceProp.major, deviceProp.minor);

    // Dividing by 1024^3 gives GBytes, so the label must say GBytes.
    printf(" Total amount of global memory: %.2f GBytes (%llu bytes)\n",
           (double)deviceProp.totalGlobalMem / pow(1024.0, 3),
           (unsigned long long)deviceProp.totalGlobalMem);

    // clockRate and memoryClockRate are reported in kHz.
    printf(" GPU Clock rate: %.0f MHz (%0.2f GHz)\n",
           deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
    printf(" Memory Clock rate: %.0f Mhz\n",
           deviceProp.memoryClockRate * 1e-3f);
    printf(" Memory Bus Width: %d-bit\n", deviceProp.memoryBusWidth);

    if (deviceProp.l2CacheSize)
    {
        printf(" L2 Cache Size: %d bytes\n", deviceProp.l2CacheSize);
    }

    printf(" Max Texture Dimension Size (x,y,z) 1D=(%d), "
           "2D=(%d,%d), 3D=(%d,%d,%d)\n",
           deviceProp.maxTexture1D,
           deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
           deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1],
           deviceProp.maxTexture3D[2]);
    printf(" Max Layered Texture Size (dim) x layers 1D=(%d) x %d, "
           "2D=(%d,%d) x %d\n",
           deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
           deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
           deviceProp.maxTexture2DLayered[2]);

    // The size_t fields are printed with %zu.
    printf(" Total amount of constant memory: %zu bytes\n",
           deviceProp.totalConstMem);
    printf(" Total amount of constant memory: %4.2f KB\n",
           deviceProp.totalConstMem / 1024.0);
    printf(" Total amount of shared memory per block: %zu bytes\n",
           deviceProp.sharedMemPerBlock);
    printf(" Total amount of shared memory per block: %4.2f KB\n",
           deviceProp.sharedMemPerBlock / 1024.0);
    printf(" Total number of registers available per block: %d\n",
           deviceProp.regsPerBlock);
    printf(" Warp size: %d\n", deviceProp.warpSize);
    printf(" Maximum number of threads per multiprocessor: %d\n",
           deviceProp.maxThreadsPerMultiProcessor);
    printf(" Number of multiprocessors: %d\n",
           deviceProp.multiProcessorCount);
    printf(" Maximum number of threads per block: %d\n",
           deviceProp.maxThreadsPerBlock);
    printf(" Maximum number of warps per multiprocessor: %d\n",
           deviceProp.maxThreadsPerMultiProcessor / deviceProp.warpSize);
    printf(" Maximum sizes of each dimension of a block: %d x %d x %d\n",
           deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1],
           deviceProp.maxThreadsDim[2]);
    printf(" Maximum sizes of each dimension of a grid: %d x %d x %d\n",
           deviceProp.maxGridSize[0], deviceProp.maxGridSize[1],
           deviceProp.maxGridSize[2]);
    printf(" Maximum memory pitch: %zu bytes\n", deviceProp.memPitch);

    return EXIT_SUCCESS;
}
```
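The version arithmetic in the listing is easy to misread. `cudaDriverGetVersion` and `cudaRuntimeGetVersion` each return a single integer encoded as 1000 × major + 10 × minor, which is why the code divides by 1000 and then takes `(v % 100) / 10`. Below is a small host-only sketch of that decoding; the `CudaVersion` struct and `decodeCudaVersion` function are hypothetical names introduced here, not part of the CUDA runtime.

```cpp
#include <cassert>

// Hypothetical type and helper for illustration; not part of the CUDA API.
struct CudaVersion
{
    int majorPart;
    int minorPart;
};

// Decode an integer of the form 1000 * major + 10 * minor,
// using the same arithmetic as the printf call in the listing.
inline CudaVersion decodeCudaVersion(int encoded)
{
    assert(encoded >= 0);
    return { encoded / 1000, (encoded % 100) / 10 };
}
```

For example, `decodeCudaVersion(10000)` yields 10 and 0, matching the 10.0 reported for the Xavier, and `decodeCudaVersion(11040)` yields 11 and 4.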
The output of the code is:

```
Device 0: "Xavier"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 7.2
Total amount of global memory: 15.32 GBytes (16454500352 bytes)
GPU Clock rate: 1500 MHz (1.50 GHz)
Memory Clock rate: 1377 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
Max Layered Texture Size (dim) x layers 1D=(32768) x 2048, 2D=(32768,32768) x 2048
Total amount of constant memory: 65536 bytes
Total amount of constant memory: 64.00 KB
Total amount of shared memory per block: 49152 bytes
Total amount of shared memory per block: 48.00 KB
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Number of multiprocessors: 4
Maximum number of threads per block: 1024
Maximum number of warps per multiprocessor: 64
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
```
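One common use of these limits is sizing a one-dimensional kernel launch. The sketch below is a host-only illustration under simplifying assumptions: it clamps the block size to the reported maximum number of threads per block and checks the resulting block count against the grid's x limit, and it ignores other constraints such as registers and shared memory. `LaunchConfig1D` and `makeLaunchConfig1D` are hypothetical names, not CUDA API.

```cpp
#include <cassert>

// Hypothetical type and helper for illustration; not part of the CUDA API.
struct LaunchConfig1D
{
    long long blocks;  // number of blocks in x
    int threads;       // threads per block in x
    bool fits;         // false if blocks exceeds the grid's x limit
};

inline LaunchConfig1D makeLaunchConfig1D(long long n, int requestedThreads,
                                         int maxThreadsPerBlock, int maxGridX)
{
    assert(n > 0 && requestedThreads > 0);
    assert(maxThreadsPerBlock > 0 && maxGridX > 0);

    // Clamp the block size to the device limit.
    const int threads = requestedThreads < maxThreadsPerBlock
                            ? requestedThreads
                            : maxThreadsPerBlock;

    // Ceiling division so a partially filled last block is counted.
    const long long blocks = (n + threads - 1) / threads;

    return { blocks, threads, blocks <= maxGridX };
}
```

With the Xavier limits (1024 threads per block, 2147483647 blocks in x), a request for 2048 threads over 1,000,000 elements is clamped to 1024 threads and needs 977 blocks.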