learning plus: CUDA @ NVIDIA hardware driver

Friday, October 15, 2010

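Besides controlling 2D/3D hardware acceleration through OpenGL, NVIDIA also offers its own interface for driving the GPU from software, called "CUDA". All I can say is that this is a company with a lot of personality: even its Linux driver has to communicate through a wrapper. Let's set the wrapper architecture aside for now and look at things from the GPU's point of view. A GPU is designed mainly for graphics computation (floating-point arithmetic) on data types such as vertices, polygons and textures. Internally it is therefore divided into many ALU macros plus private/shared caches and so on, an architecture that can process large volumes of DSP-style arithmetic in parallel or in a pipeline. CUDA lets you describe a computation in software, in a C-like syntax, and run it directly on these hardware macros.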
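To make the "many ALUs working in parallel" idea concrete without a GPU, here is a minimal CPU-side sketch in C using POSIX threads. It assumes 3 worker threads standing in for ALU macros; NUM_ALUS, struct slice, alu_add and parallel_add are hypothetical names, not part of CUDA or of the project. Every worker applies the same operation, out[i] = a[i] + b[i], to its own contiguous slice of the data. In real CUDA this pattern is written as a kernel that the hardware runs across many threads at once; the sketch only mimics the division of work.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_ALUS 3  /* illustrative: same count as the emulator's ALU macros */

/* One slice of work handed to an "ALU" thread (hypothetical structure). */
struct slice {
    const float *a, *b;
    float *out;
    size_t begin, end;  /* half-open range [begin, end) */
};

/* Every ALU runs the same operation on its own range of elements. */
static void *alu_add(void *arg)
{
    struct slice *s = arg;
    for (size_t i = s->begin; i < s->end; i++)
        s->out[i] = s->a[i] + s->b[i];
    return NULL;
}

/* Splits out[i] = a[i] + b[i] across NUM_ALUS threads.
   Returns 0 on success, -1 if a thread could not be started. */
int parallel_add(const float *a, const float *b, float *out, size_t n)
{
    pthread_t tid[NUM_ALUS];
    struct slice s[NUM_ALUS];
    size_t chunk = (n + NUM_ALUS - 1) / NUM_ALUS;  /* ceiling division */
    int started = 0, rc = 0;

    for (int k = 0; k < NUM_ALUS; k++) {
        size_t begin = (size_t)k * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;  /* clamp slices that run past the end */
        if (end > n) end = n;
        s[k] = (struct slice){ a, b, out, begin, end };
        if (pthread_create(&tid[k], NULL, alu_add, &s[k]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Wait for every ALU that was started, like a barrier at the end. */
    for (int k = 0; k < started; k++)
        pthread_join(tid[k], NULL);
    return rc;
}
```

Build with -pthread. Splitting into contiguous slices keeps each worker's memory accesses sequential; a GPU instead interleaves neighbouring elements across neighbouring threads, but the "same operation, different data" idea is the same.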

Refs: CUDA tutorial, pycuda @ python, CUDA Training

P.S. I don't have an NVIDIA graphics card, so out of boredom I wrote an ALU macro emulator instead... XD

Roughly, the emulator has 3 independent ALU macros. The status of each ALU is stored in a work queue, and the schedule manager polls that queue to check the state of every ALU macro; whenever an ALU macro is IDLE, the schedule manager assigns a command to it. Here is part of the code:

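#include <pthread.h>
#include <unistd.h>

/* Schedule-management thread: polls the work queue and stores in gid the
   first ALU macro found with status QUEUE_ER_ARB (-1 if none).
   gid, qulist, count_mutex, DONE, ALU_OK, QUEUE_ER_ARB and ALU_DEF_DELAY
   are defined elsewhere in the project. */
void *check_work_queue(void *t){
  int i;
  int finished;

  (void)t;  /* thread argument is unused */

  for(;;){
    /* check each ALU's entry in the work queue */
    pthread_mutex_lock(&count_mutex);
    gid = -1;
    for(i=0; i<3; i++){  /* 3 ALU macros */
      if( is_work_queue_exist(&qulist,i) == QUEUE_ER_ARB ){
        gid = i;
        break;
      }
    }
    //   dump_work_queue(qulist);
    pthread_mutex_unlock(&count_mutex);

    sleep(ALU_DEF_DELAY);  /* polling interval in seconds */

    /* read the shared DONE flag under the lock (assumes count_mutex also
       guards DONE) instead of racing with the thread that sets it */
    pthread_mutex_lock(&count_mutex);
    finished = (DONE == ALU_OK);
    pthread_mutex_unlock(&count_mutex);

    if( finished )
      pthread_exit(NULL);
  }
}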
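The loop above wakes up every ALU_DEF_DELAY seconds whether or not anything has changed. One common alternative, sketched here under stated assumptions, is to block on a POSIX condition variable until an ALU reports that it is idle. All names (alu_status, alu_report_idle, finish_scheduler, scheduler, last_gid) are hypothetical and not taken from the project, and marking the chosen ALU busy stands in for "assign a command".

```c
#include <pthread.h>
#include <stdbool.h>

#define NUM_ALUS 3  /* same count as the emulator's ALU macros */

enum alu_state { ALU_BUSY, ALU_IDLE };

/* Hypothetical shared state, all guarded by state_mutex. */
static pthread_mutex_t state_mutex   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_changed = PTHREAD_COND_INITIALIZER;
static enum alu_state  alu_status[NUM_ALUS] = { ALU_BUSY, ALU_BUSY, ALU_BUSY };
static bool            all_done = false;
static int             last_gid = -1;  /* last ALU the scheduler claimed */

/* An ALU calls this when it has finished its command. */
void alu_report_idle(int id)
{
    pthread_mutex_lock(&state_mutex);
    alu_status[id] = ALU_IDLE;
    pthread_cond_signal(&state_changed);  /* wake the scheduler */
    pthread_mutex_unlock(&state_mutex);
}

/* Asks the scheduler to exit once no idle ALU is left to claim. */
void finish_scheduler(void)
{
    pthread_mutex_lock(&state_mutex);
    all_done = true;
    pthread_cond_signal(&state_changed);
    pthread_mutex_unlock(&state_mutex);
}

/* Scheduler thread: claims idle ALUs, otherwise sleeps until signalled. */
void *scheduler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_mutex);
    for (;;) {
        int gid = -1;
        for (int i = 0; i < NUM_ALUS; i++) {
            if (alu_status[i] == ALU_IDLE) {
                gid = i;
                break;
            }
        }
        if (gid >= 0) {
            alu_status[gid] = ALU_BUSY;  /* stands in for "assign a command" */
            last_gid = gid;
            continue;                    /* rescan for more idle ALUs */
        }
        if (all_done)
            break;
        /* Releases state_mutex while blocked, re-acquires it on wakeup. */
        pthread_cond_wait(&state_changed, &state_mutex);
    }
    pthread_mutex_unlock(&state_mutex);
    return NULL;
}
```

Because the state is always rechecked under the mutex before waiting, a report that arrives before the scheduler starts waiting is not lost, and spurious wakeups only cause one extra scan.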

Results

time @ Fri Oct 15 16:51:49 2010

assign ALU 0
command :     0,in_a :     3,in_b :     4

time @ Fri Oct 15 16:51:51 2010
queue :: id  0,cmd :     0,in_a :     3,in_b :     4, (pre)out_c :     7

assign ALU 1
command :     1,in_a :     3,in_b :     4

time @ Fri Oct 15 16:51:53 2010
queue :: id  1,cmd :     1,in_a :     3,in_b :     4, (pre)out_c :    -1

assign ALU 1
command :     2,in_a :     3,in_b :     4

time @ Fri Oct 15 16:51:55 2010
queue :: id  0,cmd :     0,in_a :     3,in_b :     4, (pre)out_c :     7
queue :: id  1,cmd :     1,in_a :     3,in_b :     4, (pre)out_c :    -1

assign ALU 2
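The log is consistent with command 0 being addition (3 + 4 = 7) and command 1 being subtraction (3 - 4 = -1); what command 2 computes does not appear in it. The sketch below models one queue entry with the fields printed on the queue :: lines and decodes only those two inferred commands. The struct, the function names and the print format are assumptions chosen to reproduce the log text, not the project's actual code.

```c
#include <stdio.h>

/* Mirrors the fields printed on the "queue ::" lines of the log. */
struct alu_entry {
    int id;     /* ALU macro index (0..2) */
    int cmd;    /* command code */
    int in_a;
    int in_b;
    int out_c;  /* result, printed as (pre)out_c */
};

/* Executes one command. cmd 0 = add and cmd 1 = subtract are inferred from
   the log; any other code returns -1 and leaves out_c untouched. */
int alu_execute(struct alu_entry *e)
{
    switch (e->cmd) {
    case 0: e->out_c = e->in_a + e->in_b; return 0;
    case 1: e->out_c = e->in_a - e->in_b; return 0;
    default: return -1;  /* e.g. cmd 2: its result is not shown in the log */
    }
}

/* Formats an entry in the same layout as the log's "queue ::" lines. */
int format_entry(char *buf, size_t len, const struct alu_entry *e)
{
    return snprintf(buf, len,
                    "queue :: id %2d,cmd : %5d,in_a : %5d,in_b : %5d, (pre)out_c : %5d",
                    e->id, e->cmd, e->in_a, e->in_b, e->out_c);
}
```

For example, running alu_execute on {id 1, cmd 1, in_a 3, in_b 4} sets out_c to -1, and format_entry then yields the same text as the second queue :: line in the log.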

project download here