Thursday, February 7, 2013

Python with High Performance


I've been reading High Performance Python lately, so here are some quick notes... XD
  • test env
    • MacBook 2.0GHz with 4GB RAM. The GPU is a 9400M

    • speedups (">" means "faster than")
      • pypy (JIT) > pure Python (~6 times)
      • cython (move the math (complex) part to C) > pure Python (1.5~30 times)
      • numpy (matrix/vector operations) > pure Python (30 times)
      • shedskin (Python-to-C++ compiler) > pure Python (30 times)
      • PyCUDA (runs on the real hardware's many GPU cores) > numpy on the CPU (50 times)


  • use cProfile and dis to track down where the performance bottleneck is
    • python -m cProfile your_script.py
      • put an @profile decorator on each function you want measured (line_profiler style)
    • dis.dis(function)
      • read the bytecode line by line and block by block to see where to improve
    • sequential data access
      • data served from the CPU cache > data fetched from main memory
    • RunSnakeRun
      • GUI viewer for cProfile results
  • multiprocessing: run work in parallel processes across multiple CPU cores
  • pypy: JIT (just-in-time) compiler; keywords: LLVM, bytecode optimization, RPython (.NET), cython
  • numpy: an N-dimensional array is stored as one contiguous one-dimensional block (sequential memory access)
  • cython: precompiles Python code with declared C types (int, char, ...)
  • PyCUDA: hardware speedup on the GPU
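
A rough sketch of the cProfile and dis steps from the list above. It is written for Python 3, while these notes date from the Python 2 days. The names slow_sum, profile_call and bytecode_of are made up for illustration and do not come from the book or the talk.

```python
import cProfile
import dis
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


def bytecode_of(func):
    """Return the disassembled bytecode of func as text."""
    buffer = io.StringIO()
    dis.dis(func, file=buffer)
    return buffer.getvalue()


result, report = profile_call(slow_sum, 100000)
listing = bytecode_of(slow_sum)
print(report)   # which calls took the time
print(listing)  # what the interpreter actually executes per line
```

The same report can be produced for a whole script with python -m cProfile, and the output file of python -m cProfile -o can be opened in a viewer such as RunSnakeRun.
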
tips
  • a = {} >  a = dict()
  • "".join(st) > st = "a" + "b" + "c"
  • [i.upper() for i in test] > for i in test: arr.append(i.upper())
  • local variables inside a function (def test()) > global variables
  • initializing dict entries up front > using a dict without initializing it
  • preloading (import modules at the top of the file) > loading on the spot (import modules in the middle of the code)
  • reduce the number of function calls
  • xrange > range (Python 2)
  • rewrite recursive functions as plain loops
  • hash map lookup (dict) > searching with a loop
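
A small timeit sketch for trying a few of the tips above on your own machine. All function names and data sizes here are made up for illustration. Timings vary by interpreter and hardware, so treat the printed numbers as something to measure, not as a guarantee.

```python
import timeit

# Hypothetical test data, sized only so the timings are measurable.
words = ["alpha", "beta", "gamma"] * 1000
haystack_list = list(range(10000))
haystack_set = set(haystack_list)


def upper_with_append(items):
    out = []
    for item in items:
        out.append(item.upper())
    return out


def upper_with_comprehension(items):
    return [item.upper() for item in items]


def concat_with_plus(items):
    text = ""
    for item in items:
        text = text + item
    return text


def concat_with_join(items):
    return "".join(items)


def found_by_scan(value):
    # Linear search: walks the list until it finds a match.
    for candidate in haystack_list:
        if candidate == value:
            return True
    return False


def found_by_hash(value):
    # Hash lookup: membership test on a set.
    return value in haystack_set


def seconds(func, *args, number=100):
    """Total time in seconds for `number` calls of func(*args)."""
    return timeit.timeit(lambda: func(*args), number=number)


pairs = [
    ("append loop", upper_with_append, "comprehension", upper_with_comprehension, words),
    ("+ concat", concat_with_plus, "join", concat_with_join, words),
    ("loop search", found_by_scan, "hash lookup", found_by_hash, 9999),
]
for slow_name, slow, fast_name, fast, arg in pairs:
    print("%s: %.4fs, %s: %.4fs" % (slow_name, seconds(slow, arg),
                                    fast_name, seconds(fast, arg)))
```

Each pair returns the same result, so only the timing differs. The gap between the two string concatenation variants in particular depends on the interpreter.
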

ref:
EuroPython2011_HighPerformanceComputing
performance tips
timecomplex
