I've been reading High Performance Python lately, so here are some quick notes... XD
- test env
- MacBook 2.0GHz with 4GB RAM. The GPU is a 9400M
- PyPy (JIT) > pure Python (~6x faster)
- Cython (math/complex-number part moved to C) > pure Python (1.5~30x faster)
- NumPy (matrix/vector operations) > pure Python (30x faster)
- ShedSkin (Python-to-C++ compiler) > pure Python (30x faster)
- PyCUDA (runs on the GPU's many hardware cores) > NumPy on the CPU (50x faster)
- use cProfile and dis to trace where the performance bottleneck is
- python -m cProfile <your script>
- @profile decorator on each function you want to profile (e.g. with line_profiler)
- dis.dis(function) from the dis module
- trace the bytecode lines and blocks to see where to improve
- serialize code
- data that stays in the CPU cache > data fetched from main memory
- RunSnakeRun
- GUI tool for viewing cProfile results
- multiprocessing: run processes in parallel across multiple CPU cores
- PyPy: JIT (just-in-time) compiler, LLVM, bytecode optimization, RPython (.NET), Cython
- NumPy: stores an N-dimensional array as one flat one-dimensional block (contiguous, sequential memory access)
- Cython: compiles Python ahead of time using declared C types (int, char, ...)
- PyCUDA: hardware speed-up on the GPU
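The cProfile and dis steps noted above can be tried out with something like the sketch below. It is only a minimal example under my own assumptions: `slow_sum`, `profile_call` and `bytecode_listing` are made-up names for illustration, not anything from the book, and only standard-library calls are used.

```python
import cProfile
import dis
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: sum of squares with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()


def bytecode_listing(func):
    """Return the disassembled bytecode of func as a string."""
    buf = io.StringIO()
    dis.dis(func, file=buf)
    return buf.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 100_000)
    print(report)                      # which calls take the time
    print(bytecode_listing(slow_sum))  # what the loop compiles to
```

The profile report tells you which function to look at, and the bytecode listing gives a rough idea of how much work each line of that function does.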
tips
- a = {} > a = dict()
- "".join(st) > st = "a" + "b" + "c"
- [i.upper() for i in test] > for i in test: arr.append(i.upper())
- local variables inside a function (def test(): ...) > global variables
- init dict > without init dict
- preload (import modules at the top of the file) > load on demand (import modules in the middle of the code)
- reduce the number of function calls
- xrange > range (Python 2; in Python 3 range is already lazy)
- remap func without recursive loops
- hash lookup (dict/set) > searching with a loop
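A few of the tips above are easy to check yourself with timeit. The sketch below is just my own rough harness: the function names, the `WORDS` test data and the repeat count are all made up for illustration, and the actual timings will depend on your machine and Python version, so no numbers are shown.

```python
import timeit

WORDS = ["alpha", "beta", "gamma"] * 1000  # hypothetical test data


def join_strings(words):
    return "".join(words)


def concat_strings(words):
    out = ""
    for w in words:
        out = out + w
    return out


def upper_comprehension(words):
    return [w.upper() for w in words]


def upper_append(words):
    out = []
    for w in words:
        out.append(w.upper())
    return out


def find_in_set(items, target):
    # Hash lookup: items is expected to be a set.
    return target in items


def find_in_list(items, target):
    # Loop search: walk the list until the target is found.
    for item in items:
        if item == target:
            return True
    return False


def time_it(func, *args, number=200):
    """Total seconds for `number` calls of func(*args)."""
    return timeit.timeit(lambda: func(*args), number=number)


if __name__ == "__main__":
    numbers = list(range(10_000))
    number_set = set(numbers)
    cases = [
        ("join", join_strings, (WORDS,)),
        ("concat", concat_strings, (WORDS,)),
        ("comprehension", upper_comprehension, (WORDS,)),
        ("append loop", upper_append, (WORDS,)),
        ("set lookup", find_in_set, (number_set, 9_999)),
        ("list search", find_in_list, (numbers, 9_999)),
    ]
    for name, func, args in cases:
        print(f"{name:15s} {time_it(func, *args):.4f} s")
```

Each pair produces the same result, so any difference in the printed times comes from how the work is done, not from what is computed.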
ref:
EuroPython2011_HighPerformanceComputing
performance tips
timecomplex