Friday, March 25, 2011

C to Verilog notes

The features I have seen in C to Verilog so far are listed below.
@ Pre-Synthesis part
1. Parallel support
Ref : parallel synthesis @ llvm
2. reduce redundant nodes
ex: original
a = 3+4;        
b = b+a;        
ex: new
// cut instruction "a = 3+4" & fold its result (7) into the next instruction
b = b+7;
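A minimal sketch of the folding step above, assuming each operand is either a known constant or a runtime value. The `Operand` type and the `make_const` / `fold_add` helpers are hypothetical names for illustration, not part of C to Verilog.

```c
#include <stdbool.h>

/* Hypothetical operand: a compile-time constant or an unknown runtime value. */
typedef struct {
    bool is_const;
    int  value;      /* meaningful only when is_const is true */
} Operand;

/* Build a constant operand. */
Operand make_const(int value)
{
    Operand op = { true, value };
    return op;
}

/* Fold "lhs + rhs" when both sides are constants; otherwise report "not constant". */
Operand fold_add(Operand lhs, Operand rhs)
{
    Operand out = { false, 0 };
    if (lhs.is_const && rhs.is_const) {
        out.is_const = true;
        out.value = lhs.value + rhs.value;
    }
    return out;
}
```

With this, `fold_add(make_const(3), make_const(4))` yields the constant 7, so the node for `a` can be dropped and `b = b+a` becomes `b = b+7`.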
3. reduce redundant bit width
ex: original
int32_t a,c;
a =3;
c = 32'b(a)+32'b(1);
ex: new
int8_t a;
int9_t c;
a =3;
c = 8'b(a)+8'b(1);
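The width choice above can be sketched as follows, assuming unsigned values for simplicity (the example uses signed types). `bits_needed` and `add_result_width` are hypothetical helper names; the second one reflects the usual rule that an add needs one extra carry bit, which matches the 8-bit operands and 9-bit result in the example.

```c
#include <stdint.h>

/* Minimum number of bits needed to hold an unsigned constant (at least 1). */
unsigned bits_needed(uint32_t value)
{
    unsigned bits = 1;
    while (value >>= 1)
        bits++;
    return bits;
}

/* Width that always holds the sum of two unsigned operands:
 * the wider operand plus one carry bit. */
unsigned add_result_width(unsigned width_a, unsigned width_b)
{
    unsigned widest = (width_a > width_b) ? width_a : width_b;
    return widest + 1;
}
```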
4. remove Allocation
ex: original

void A(){
...
B (a,b,&c);
}

void B(int a,int b, int *c){
*c = a+b;
}
ex: new
void A(){
...
c = a+b;
}
5. instruction priority
c=a+b; // instruction 1
d=c+1; // instruction 2
priority 1 > 2 (instruction 2 needs the result c of instruction 1)
... 
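One common way to derive such priorities is an ASAP (as soon as possible) level: an instruction's level is one more than the highest level among the instructions it depends on, and a lower level means a higher priority. The sketch below assumes instructions are listed in program order and have at most two dependencies; `Node` and `asap_levels` are hypothetical names, and this is not necessarily how C to Verilog computes it.

```c
#define MAX_DEPS 2

/* Hypothetical instruction node: indices of the earlier instructions it
 * depends on, or -1 for an unused slot. */
typedef struct {
    int deps[MAX_DEPS];
} Node;

/* Fill level[i] with the earliest step at which instruction i can run.
 * Assumes every dependency index is smaller than i (program order). */
void asap_levels(const Node *nodes, int count, int *level)
{
    for (int i = 0; i < count; i++) {
        level[i] = 0;
        for (int d = 0; d < MAX_DEPS; d++) {
            int dep = nodes[i].deps[d];
            if (dep >= 0 && level[dep] + 1 > level[i])
                level[i] = level[dep] + 1;
        }
    }
}
```

For the two instructions above, `c=a+b` gets level 0 and `d=c+1` gets level 1.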
6. Ignore PHI nodes. @ constraint: the data produced in each basic block is already computed when the block ends.
ps: the live-in values of the PHI are already computed in the predecessor basic blocks.
ex: 
%i.08 = phi i32 [ 0, %entry ], [ %inc, %popCnt.exit ]
%arrayidx = getelementptr i32* %B, i32 %i.08
...

// 0    comes from basic block entry
// %inc comes from basic block popCnt.exit & is replaced by %i.08
7. reduce redundant instructions
@ PHI part
%i.06.i = phi i32 [ 0, %for.body ], [ %inc.i, %for.body.i ]
%i.07.i = phi i32 [ 0, %for.body ], [ %inc.i, %for.body.i ]

// cut %i.07.i instruction

@ Operand   
%add.01.i = add i32 %and.i, %sum.07.i
%add.02.i = add i32 %and.i, %sum.07.i

// cut %add.02.i instruction
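A minimal sketch of how such duplicates could be detected, assuming SSA form (operands are never redefined) and side-effect-free instructions. `Instr` and `find_duplicate` are hypothetical names; the comparison is purely textual and does not account for commutativity.

```c
#include <string.h>

/* Hypothetical instruction record: destination, opcode and two operands. */
typedef struct {
    const char *dest;
    const char *op;
    const char *lhs;
    const char *rhs;
} Instr;

/* Return the index of an earlier instruction that computes the same
 * expression as list[index], or -1 when there is none. */
int find_duplicate(const Instr *list, int index)
{
    for (int i = 0; i < index; i++) {
        if (strcmp(list[i].op,  list[index].op)  == 0 &&
            strcmp(list[i].lhs, list[index].lhs) == 0 &&
            strcmp(list[i].rhs, list[index].rhs) == 0)
            return i;
    }
    return -1;
}
```

When a duplicate is found, the later instruction can be cut and its uses redirected to the earlier destination.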
8. Loop unrolling under a hardware resource constraint of "2" adders.
ex : original
for(int i=0; i<10; i++)
   a[i] = b[i]+1;

...
ex : new
for(int i=0; i<10; i=i+2){
    a[i]   = b[i]+1;
    a[i+1] = b[i+1]+1;
}

...
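The transformation above can be checked in plain C. This is a sketch with hypothetical function names; the unrolled version adds a tail step for odd lengths, which the 10-element example does not need.

```c
/* Original loop: one add per iteration. */
void add_one(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + 1;
}

/* Unrolled by 2: two independent adds per iteration, one per adder. */
void add_one_unroll2(int *a, const int *b, int n)
{
    int i = 0;
    for (; i + 1 < n; i += 2) {
        a[i]     = b[i] + 1;
        a[i + 1] = b[i + 1] + 1;
    }
    if (i < n)                  /* leftover element when n is odd */
        a[i] = b[i] + 1;
}

/* Loop iterations needed when `adders` adds run in parallel: ceil(n / adders). */
int unrolled_iterations(int n, int adders)
{
    return (n + adders - 1) / adders;
}
```

With 10 elements and 2 adders the loop body runs 5 times instead of 10.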
@ Synthesis part
1. Global values
A Verilog module does not support local values, so every value is declared at module level.
ex : @ Verilog
 
module xx(...);
input ...
output ...
reg ...
wire ...

...

always@(posedge clk)begin
...
end

...
endmodule
2. Schedule list with external memory support
Split the memory load instruction into two basic blocks: a memory request block (address/mode) and a memory read block (wait for the requested data).
ps: each basic block should complete in one clock cycle.
ex: original
for.body:
// @ address/mode phase
%arrayidx = getelementptr i32* %B, i32 %i.08

// @ data phase
%tmp3 = load i32* %arrayidx, align 4, !tbaa !0
...

ex: new

for.body.phase1:
%arrayidx = getelementptr i32* %B, i32 %i.08
br label %for.body.phase2

for.body.phase2:
%tmp3 = load i32* %arrayidx, align 4, !tbaa !0

@ Verilog view
module xx(...);
output [31:0] mem_address;
output [0 :0] mem_mode;
output [31:0] mem_store_data;
input  [31:0] mem_load_data;

reg [3 :0] cur_state,nxt_state;
reg [31:0] a;
....

case(cur_state)
for_body_phase1 : begin 
                        mem_address = 32'h00000000; 
                        mem_mode    = read;
  
                        nxt_state = for_body_phase2; 
                  end

for_body_phase2 : begin 
                        a = mem_load_data;
                        
                        nxt_state = alu_phase;
                  end

alu_phase       : 
...

...
endcase
...

// update the state register on each clock edge (inside always@(posedge clk))
cur_state <= nxt_state;
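To see the two-phase load behave cycle by cycle, here is a small software model of the state machine in C. It is only a sketch under assumptions: the memory answers exactly one cycle after the request, the address is a word index into an array standing in for external memory, and `Fsm` / `fsm_step` are hypothetical names.

```c
#include <stdint.h>

typedef enum { FOR_BODY_PHASE1, FOR_BODY_PHASE2, ALU_PHASE } State;

typedef struct {
    State    cur_state;
    uint32_t mem_address;   /* driven in phase 1 */
    uint32_t a;             /* captured from the memory in phase 2 */
} Fsm;

/* Advance the model by one clock cycle.
 * `memory` stands in for the external memory (assumption). */
void fsm_step(Fsm *fsm, const uint32_t *memory, uint32_t index)
{
    switch (fsm->cur_state) {
    case FOR_BODY_PHASE1:               /* request: drive address, mode = read */
        fsm->mem_address = index;
        fsm->cur_state = FOR_BODY_PHASE2;
        break;
    case FOR_BODY_PHASE2:               /* data: capture the returned word */
        fsm->a = memory[fsm->mem_address];
        fsm->cur_state = ALU_PHASE;
        break;
    case ALU_PHASE:                     /* later stages are not modelled */
        break;
    }
}
```

Calling `fsm_step` twice from `FOR_BODY_PHASE1` takes the model through the request and data phases, so the loaded value is only available in `a` after the second cycle.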
3. Pipeline support
Unroll the loop, then insert the pipeline. @ example case
ex: original

%cat test.c

static inline unsigned int popCnt(unsigned int input) { 
    unsigned int sum = 0; 
    for (int i = 0; i < 32; i++) {
        sum += (input) & 1; 
        input = input/2; 
    } 
    return sum; 
} 
How to unroll the loop?
// step1. generate LLVM IR bitcode
%clang -O3 -emit-llvm test.c -c -o test.bc

// step2. unroll the loop in the bitcode
// you can list the loop-related opt passes with "opt -help | grep loop"
%opt -loop-unroll -unroll-count 20 test.bc -debug | llvm-dis

// step3. if step2 passes, write out the new bitcode
%opt -loop-unroll -unroll-count 20 test.bc > opt_test.bc 
Non-pipelined IR & view

%llvm-dis opt_test.bc > opt_test.ll

%cat opt_test.ll

for.body.i.1: ...Instruction A   br label %for.body.i.2
for.body.i.2: ...Instruction B   br label %for.body.i.3
for.body.i.3: ...Instruction C   br label %for.body.i.4
for.body.i.4: ...Instruction D   br label %for.body.i.5
...
Pipelined version
for.body.i.1 : ...Instruction A         br label %for.body.i.2
for.body.i.2 : ...Instruction B,A       br label %for.body.i.3
for.body.i.3 : ...Instruction C,B,A     br label %for.body.i.4
for.body.i.4 : ...Instruction D,C,B,A   br label %for.body.i.5
for.body.i.5 : ...Instruction   D,C,B   br label %for.body.i.6
for.body.i.6 : ...Instruction     D,C   br label %for.body.i.7
for.body.i.7 : ...Instruction       D   br label %for.exit.i.0

for.exit.i.0 :
...
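The fill and drain pattern above follows the usual pipeline arithmetic: with S stages and N iterations, the pipeline needs S + N - 1 steps, and stage s works on iteration (step - s) whenever that index is valid. A minimal C sketch, with hypothetical function names and 0-based stage, step, and iteration indices:

```c
/* Total steps needed to push `iterations` loop iterations through a
 * pipeline with `stages` stages (fill + steady state + drain). */
int pipeline_cycles(int stages, int iterations)
{
    if (stages <= 0 || iterations <= 0)
        return 0;
    return stages + iterations - 1;
}

/* Iteration index handled by `stage` during `cycle`, or -1 when the
 * stage is idle (pipeline still filling or already draining). */
int iteration_in_stage(int stage, int cycle, int iterations)
{
    int iteration = cycle - stage;
    return (iteration >= 0 && iteration < iterations) ? iteration : -1;
}
```

For the four instructions A to D (4 stages) and 4 iterations, `pipeline_cycles(4, 4)` gives 7, matching the seven blocks listed above. In the fifth block (cycle index 4), stage A is idle while D, C, and B are still busy.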
