learning plus: February 2011

Saturday, February 26, 2011

synthesis flow @ LLVM 2.8 pt1

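I recently spent some time studying how LLVM can be combined with a back-end hardware synthesis flow. There seems to be plenty to work on: pipelined DFGs, parallel DFGs, memory partitioning, bus interfaces, and so on. Below I use the simplest possible example to show how LLVM, acting as the front-end input, connects to the back-end hardware synthesis.

Step1. LLVM environment set
I assume the LLVM environment is already set up. If it is not, see LLVM 2.8 env set && pass manager set.

step2. sample hardware code @ c
This is the hardware sample IP most commonly used in image scaling: o_c = (in_a + in_b)>>1. The (*) pointers mean that the values are loaded from and stored to external memory, on their way into and out of the local IP.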

//#include <iostream>
//using namespace std;

void test(int *i_a,int *i_b, int *o_c){
 *o_c = (*i_a+*i_b)>>1;
}
//int main(){
//int a=1, b=1;
//int c;
// test(&a,&b,&c);
// cout
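Before pushing the function through the flow, it can help to check it in software. The sketch below keeps test() exactly as in the listing and adds a small helper, runTest, which is a hypothetical name and not part of the original post.

```cpp
// Same datapath as the sample IP: one add followed by a shift right by 1.
// The pointers stand in for values that live in external memory.
void test(int *i_a, int *i_b, int *o_c) {
    *o_c = (*i_a + *i_b) >> 1;
}

// Hypothetical helper (not in the original post): feeds two values
// through test() and returns the result, so the IP can be checked
// against expected averages.
int runTest(int a, int b) {
    int c = 0;
    test(&a, &b, &c);
    return c;
}
```

For example, runTest(1, 1) gives 1 and runTest(10, 20) gives 15. Odd sums round down because of the shift.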

step3. compile && LLVM IR gen
Use clang to compile the source to LLVM bitcode, then convert it to the .ll format, which reads somewhat like machine code.

% clang -O3 -emit-llvm test.c -c -o test.bc
% llvm-dis test.bc

In test.ll you can see the load/store instructions, which matches our earlier assumption. For the IR format, see the LLVM Language Reference Manual.

define void @test(i32* nocapture %i_a, i32* nocapture %i_b, i32* nocapture %o_c) nounwind {
entry:
  %tmp1 = load i32* %i_a, align 4, !tbaa !0
  %tmp3 = load i32* %i_b, align 4, !tbaa !0
  %add = add nsw i32 %tmp3, %tmp1
  %shr = ashr i32 %add, 1
  store i32 %shr, i32* %o_c, align 4, !tbaa !0
  ret void
}

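step4. LLVM IR Get && sample Graph Gen
Using a shared library, we walk the instructions in order, collect each instruction's type and data, and build the sample graph for our back end. ps: each instruction in the graph can be given a delay and hardware resource constraints; a scheduling algorithm then decides which cycle each node is placed in.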

Assuming there are no hardware or memory constraints, running the scheduling algorithm gives the following result. A memory load takes 2 clock cycles, a memory store takes 3 clock cycles, and the internal IP needs only 1 clock cycle.
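As a rough illustration of the unconstrained case, here is a minimal ASAP scheduling sketch. It is not the scheduler used in this post. The names Node, scheduleAsap and buildSampleGraph are hypothetical, and folding the add and the shift into a single 1-cycle "avg" node is my assumption; the delays (load 2, store 3, internal IP 1) are the ones quoted above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One node of the sample graph: an operation, its delay in clock cycles,
// and the indices of the nodes whose results it needs.
struct Node {
    std::string name;
    int delay;
    std::vector<int> preds;  // indices of earlier nodes in the vector
    int start;               // start cycle, filled in by the scheduler
};

// ASAP scheduling with no resource limits: every node starts as soon as
// all of its predecessors have finished. Nodes must be listed in
// topological order. Returns the total number of cycles.
int scheduleAsap(std::vector<Node> &g) {
    int total = 0;
    for (Node &n : g) {
        n.start = 0;
        for (int p : n.preds)
            n.start = std::max(n.start, g[p].start + g[p].delay);
        total = std::max(total, n.start + n.delay);
    }
    return total;
}

// Graph for *o_c = (*i_a + *i_b) >> 1.
// Assumption: add + shift are merged into one 1-cycle "avg" node.
std::vector<Node> buildSampleGraph() {
    return {
        {"load_a", 2, {},     0},
        {"load_b", 2, {},     0},
        {"avg",    1, {0, 1}, 0},
        {"store",  3, {2},    0},
    };
}
```

Under these assumptions both loads start at cycle 0, the "avg" node starts at cycle 2, the store starts at cycle 3, and the whole operation takes 6 cycles.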

With only one memory, the loads have to be split into two separate memory loads, which increases the total cycle count.
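The effect of a single memory can be sketched with a simple greedy scheduler that lets a memory operation start only when a memory port is free. This is an illustration under my own assumptions, not the post's algorithm: Op, scheduleWithMemPorts and buildSampleOps are hypothetical names, a port is assumed to be busy for the whole delay of the access, and add + shift are again folded into one 1-cycle node.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Op {
    std::string name;
    int delay;               // clock cycles
    bool usesMemory;         // true for load/store
    std::vector<int> preds;  // indices of earlier ops this one depends on
    int start;               // start cycle, filled in by the scheduler
};

// Greedy scheduling in listed (topological) order. A memory op waits for
// both its data dependencies and the earliest free memory port; a port is
// assumed busy for the op's full delay. Requires memPorts >= 1.
int scheduleWithMemPorts(std::vector<Op> &ops, int memPorts) {
    std::vector<int> portFree(memPorts, 0);  // cycle at which each port is free
    int total = 0;
    for (Op &op : ops) {
        int ready = 0;
        for (int p : op.preds)
            ready = std::max(ready, ops[p].start + ops[p].delay);
        if (op.usesMemory) {
            auto port = std::min_element(portFree.begin(), portFree.end());
            op.start = std::max(ready, *port);
            *port = op.start + op.delay;
        } else {
            op.start = ready;
        }
        total = std::max(total, op.start + op.delay);
    }
    return total;
}

// Same sample as before: two loads, one 1-cycle datapath node, one store.
std::vector<Op> buildSampleOps() {
    return {
        {"load_a", 2, true,  {},     0},
        {"load_b", 2, true,  {},     0},
        {"avg",    1, false, {0, 1}, 0},
        {"store",  3, true,  {2},    0},
    };
}
```

With one port the second load is pushed back to cycle 2 and the total grows from 6 to 8 cycles; with two ports the schedule falls back to the unconstrained 6 cycles.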

Resource sharing reduces the hardware cost.

A for-loop case can also be split into a pipelined form.
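To get a feel for what pipelining a loop buys, the usual back-of-the-envelope estimate is latency + (iterations - 1) * II, where II is the initiation interval (the number of cycles between the starts of consecutive iterations). The sketch below uses hypothetical function names and assumes that a shared resource, such as the single memory, is the only thing limiting II.

```cpp
// Lower bound on the initiation interval imposed by one shared resource:
// each iteration keeps the resource busy for busyCyclesPerIter cycles,
// and there are `units` copies of it (units >= 1).
int minInitiationInterval(int busyCyclesPerIter, int units) {
    return (busyCyclesPerIter + units - 1) / units;  // ceiling division
}

// Total cycles when a new iteration starts every `ii` cycles and each
// iteration takes `latency` cycles from start to finish.
long pipelinedCycles(long iterations, int latency, int ii) {
    if (iterations <= 0) return 0;
    return latency + (iterations - 1) * ii;
}

// Total cycles when iterations run strictly one after another.
long sequentialCycles(long iterations, int latency) {
    return iterations * latency;
}
```

Taking the sample IP with one memory (memory busy 2 + 2 + 3 = 7 cycles per iteration, latency 8 under the same assumptions as before), 4 iterations take 8 + 3 * 7 = 29 cycles pipelined, against 32 cycles run back to back. The gain is small here because the single memory dominates; more memory ports lower the bound on II.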

ps: I should have the sample written up soon, and will post it here so we can study it together.

Loop optimization

Ref:
CFG definition http://en.wikipedia.org/wiki/Control_flow_graph
Analysis and Representation of Loops
Trident: From High-Level Language to Hardware Circuitry - LLVM

LLVM 2.8 env set && pass manager set

1. LLVM 2.8 environment build
I took the lazy route here and used git clone.

git clone http://llvm.org/git/llvm.git

2. clang: a C language family front end for LLVM

cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang

3. Build LLVM and Clang:

cd build
../llvm/configure
make

If you also want the C++ compiler, see Clang C++ support. ps: if you are feeling lazy, apt-get install can actually find the corresponding packages.

apt-get install llvm-gcc-4.2

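ps: this only seems to cover LLVM 2.7, though. llvm-gcc is stuck on GCC 4.2 because later GCC releases are licensed under GPLv3, so for LLVM 2.8 I use the external clang as the front end to parse C/C++, or you can bring in GCC 4.2 yourself.

Assuming the env path and clang are both set up, below we use LLVM 2.8 + the pass manager to build a back-end project on top of the LLVM environment.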

export PATH=$PATH:/your_llvm_loc/llvm/obj_root/Debug+Asserts/bin

4. project build
See Writing an LLVM Pass and Creating an LLVM Project, which explain this in detail. Remember to add LOADABLE_MODULE = 1 to the Makefile so that a shared library is produced.

# Make the shared library become a loadable module so the tools can
# dlopen/dlsym on the resulting library.
LOADABLE_MODULE = 1

However, the example in Writing an LLVM Pass seems to have some bugs; the patch below fixes them.

--- oldHello.cpp        2011-02-26 13:16:52.872094173 +0800
+++ PassHello.cpp       2011-02-26 11:31:27.639147310 +0800
@@ -1,6 +1,7 @@
#include "llvm/Pass.h"
#include "llvm/Function.h"
#include "llvm/Support/raw_ostream.h"
+#include "llvm/ADT/Statistic.h"

using namespace llvm;

@@ -17,6 +18,6 @@
};

char Hello::ID = 0;
-  static RegisterPass X("hello", "Hello World Pass", false, false);
+  static RegisterPass<Hello> X("hello", "Hello World Pass", false, false);
}

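For the rest, follow Writing an LLVM Pass, which explains it in detail. ps: llvm/lib/Transforms/Hello/ contains a more complete example; without it you can spend ages and still not get things working, which is really XXX. Finally, when running opt -load, remember to point the *.so path at Debug+Asserts/lib under your project. Loading the library by its bare name produces the error below.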

Error opening 'LLVMHellocc.so': LLVMHellocc.so: cannot open shared object file: No such file or directory
  -load request ignored.

It feels as if Debug+Asserts is tied to an LLVM keyword, so the path has to be given explicitly. Only the command below works.

opt -load Debug+Asserts/lib/LLVMHellocc.so -help | grep hello

Ref:
http://article.gmane.org/gmane.comp.compilers.llvm.devel/31132/match=hello+so
http://dcreager.net/2010/02/17/llvm-lto-karmic/
http://llvm.org/docs/doxygen/html/Hello_8cpp.html

Tuesday, February 22, 2011

LLVM + high level synthesis = ???

In earlier posts I covered several high-level synthesis techniques, such as ILP Scheduling with DVFS constrain @ perl, Force-Directed Scheduling with golden check @ perl, c to Verilog, and so on. All of these are related to compilers to some degree.

From a hardware designer's point of view, however, what has to be verified between front end and back end is extremely complex: Algorithm define -> architecture map -> code gen -> function check -> time check -> RTL map -> FPGA -> layout -> chip test. One pass through this flow takes at least 1-2 years. At today's chip scale, traditional IP design can no longer keep up with SoC platforms.

The ESE Technical Overview therefore describes how high-level synthesis plus platform constraints can map C directly to an FPGA. We only need to get the top-level algorithm right; the scheduling, assignment, interface, hardware platform, and so on in between are left to the tool set. This cuts the time spent on IP design and gives a top-down design flow.

Put plainly, the idea is to use the LLVM virtual machine as the front end for parsing + scheduling + merging + loop unrolling ..., so that we can concentrate on building the back-end platform and get to fast system verification. ps: although that probably means a bunch of RDs, "me" included, will be out of a job... XD

As for the LLVM front end, we do not need to care how the compiler produces the AST (abstract syntax tree), its nodes, and so on. We only need to know that LLVM produces an IR file format in the middle, and that the IR can then be converted to machine code or language code for different platforms. Like GCC, LLVM also supports plugins: you can compile your own code into a .so file and feed it to LLVM as an extension library. Looks like there are plenty more jobs to do... See Writing an LLVM Backend and Writing an LLVM Pass in the LLVM Subsystem Documentation.

Ref:
http://www.antlr.org/wiki/display/ANTLR3/LLVM
http://llvm.org/docs/ProgrammersManual.html#isa
basic block gen http://llvm.org/releases/2.6/docs/tutorial/JITTutorial2.html
ESE http://www.cecs.uci.edu/~ese/front-end.html
Build your own compiler in Ruby with LLVM http://llvmruby.org/wordpress-llvmruby/
polygen grammar for LLVM assembly language.
Writing Your Own Toy Compiler Using Flex, Bison and LLVM
http://llvm.org/docs/ReleaseNotes.html#externalproj

Sunday, February 20, 2011

Pattern Generation for Logic BIST

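An earlier post covered BIST error fault detection and repair. Beyond using the row/column of the vector to detect the fault location, the most important thing is building the most efficient set of test vectors, because on ATE test time = money cost. So here we want to find the test vectors that reach the required test coverage in the shortest test time.

ps: we drive the design with random vectors, then pick the best test vectors according to the constraints we set. Of course, when the input table is large, the range of random numbers has to be large enough to get a global effect, which in turn costs more time and memory. So we use two sets of random numbers to increase the complexity: rand_1 = LFSR architecture assign, rand_2 = test vector assign. The combination of LFSR and test vector gives us the complexity we need.

sample results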

Total_Size  7
POLY_0  101100011
SEED_0  00000010
POLY_0[p]->INPUT[p]  6->6
POLY_0[p]->INPUT[p]  7->3
...
HIT_CYCLE
@cycle  2 @hit pattern  2 @vector 0000000101
@cycle  7 @hit pattern  0 @vector 1001101000
@cycle 14 @hit pattern 14 @vector 0010110101
@cycle 20 @hit pattern  7 @vector 1011110011
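For readers who have not worked with LFSRs, here is a minimal sketch of the mechanism behind results like these: an 8-bit Galois LFSR is stepped once per cycle, and we record the first cycle at which each wanted pattern shows up. It is not the code from the project. The function names are hypothetical, and the tap mask 0xB8 (x^8 + x^6 + x^5 + x^4 + 1, a commonly tabulated maximal-length choice) is my assumption, not the POLY_0 shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One step of an 8-bit Galois LFSR: shift right by one and, if the bit
// shifted out was 1, XOR the feedback mask `taps` into the state.
uint8_t lfsrStep(uint8_t state, uint8_t taps) {
    bool lsb = (state & 1u) != 0;
    state >>= 1;
    if (lsb) state ^= taps;
    return state;
}

// Number of steps until the state returns to the seed (capped at 256).
int lfsrPeriod(uint8_t seed, uint8_t taps) {
    uint8_t s = seed;
    int steps = 0;
    do {
        s = lfsrStep(s, taps);
        ++steps;
    } while (s != seed && steps < 256);
    return steps;
}

// For every target pattern, the first cycle (0 = the seed itself) at which
// the LFSR state equals it, or -1 if it is not hit within maxCycles.
std::vector<int> hitCycles(uint8_t seed, uint8_t taps,
                           const std::vector<uint8_t> &targets,
                           int maxCycles) {
    std::vector<int> hit(targets.size(), -1);
    uint8_t s = seed;
    for (int cycle = 0; cycle < maxCycles; ++cycle) {
        for (std::size_t i = 0; i < targets.size(); ++i)
            if (hit[i] < 0 && targets[i] == s) hit[i] = cycle;
        s = lfsrStep(s, taps);
    }
    return hit;
}
```

With seed 0x02 and taps 0xB8 the state runs 0x02, 0x01, 0xB8, ... and visits all 255 non-zero values before repeating; the all-zero pattern is never produced. Searching over seeds and tap masks for the combination that hits the required patterns in the fewest cycles is then the optimisation problem described in this post.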

project: https://github.com/funningboy/BIST

Saturday, February 5, 2011

3D IC Design Partitioning with Power Consideration pt2

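Continuing from 3D IC Design Partitioning with Power Consideration pt1, this post introduces the GA part of iGA-BFS search (genetic algorithm + BFS area-constraint search).

1. Earlier, a random BFS search produced an initial partition cluster graph, as shown in the figure below. That search saves the time the GA would otherwise need to generate its initial gene sequences; generating vectors purely at random is what costs the most system time, after all. Once we have several gene sets that satisfy our design constraints, we repeat crossover -> mutation -> fitness to find the best solution.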
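The three GA operators named above can be sketched as follows. This is only an illustration under my own assumptions: a chromosome assigns each cell to layer 0 or 1 of a two-layer stack, and the fitness (lower is better) counts nets that cross layers plus a weighted area imbalance. The post's actual encoding and cost function, including the power term, are not shown here, and all names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Chromosome = std::vector<int>;  // layer (0 or 1) of each cell
using Net = std::pair<int, int>;      // two-pin net between two cells

// One-point crossover: genes [0, cut) from parent a, the rest from b.
// Both parents must have the same length and cut <= a.size().
Chromosome crossover(const Chromosome &a, const Chromosome &b,
                     std::size_t cut) {
    Chromosome child(a.begin(), a.begin() + cut);
    child.insert(child.end(), b.begin() + cut, b.end());
    return child;
}

// Mutation: move a single cell to the other layer.
Chromosome mutate(Chromosome c, std::size_t index) {
    c[index] ^= 1;
    return c;
}

// Fitness, lower is better: number of nets crossing between layers plus
// areaWeight times the absolute area difference of the two layers.
int fitness(const Chromosome &c, const std::vector<Net> &nets,
            const std::vector<int> &area, int areaWeight) {
    int crossing = 0;
    for (const Net &n : nets)
        if (c[n.first] != c[n.second]) ++crossing;
    int a0 = 0, a1 = 0;
    for (std::size_t i = 0; i < c.size(); ++i)
        (c[i] == 0 ? a0 : a1) += area[i];
    int imbalance = a0 > a1 ? a0 - a1 : a1 - a0;
    return crossing + areaWeight * imbalance;
}
```

A full run would start from the BFS-generated chromosomes, pick parents and cut points at random, and keep the children with the lowest fitness for the next generation.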