learning plus: Python with HighPerformance

Thursday, February 7, 2013

Python with HighPerformance

I've been reading High Performance Python lately, so here are some quick notes... XD

test env

MacBook 2.0GHz with 4GB RAM. The GPU is a 9400M

PyPy (JIT) > pure Python (~6x faster)
Cython (moving the math (complex) part to C) > pure Python (1.5x to 30x faster)
NumPy (matrix/vector operations) > pure Python (30x faster)
ShedSkin (compiles Python to C++) > pure Python (30x faster)

PyCUDA (many real hardware cores on the GPU (DSP)) > NumPy on the CPU (50x faster)

Use cProfile and dis to trace where the performance bottleneck is

python -m cProfile

@profile decorator on each function call to be profiled

dis.dis(function)

trace the bytecode lines and macro blocks to see where to improve
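As a rough illustration of the two tools above, here is a minimal sketch that profiles a function with cProfile and then dumps its bytecode with dis. The function slow_sum and the helper profile_call are made up for this note, and the code is written for Python 3.

```python
import cProfile
import dis
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 10000)
print(report)

# Disassemble the function to see which bytecode the loop body runs.
bytecode = io.StringIO()
dis.dis(slow_sum, file=bytecode)
listing = bytecode.getvalue()
print(listing)
```

The report shows where the time goes per function, and the listing shows the instructions repeated on every loop iteration, which is usually where to look first.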

serialize code

data kept in cache > data kept in main memory

RunSnakeRun

GUI tool for viewing cProfile results

multiprocessing: run processes in parallel across multiple cores (CPUs)
PyPy: JIT (just-in-time) compiler, LLVM, bytecode optimization, RPython (.NET), Cython
NumPy: N-dimensional arrays stored as one flat one-dimensional array (sequential memory access)
Cython: precompiles Python with declared C types (int, char, ...)
PyCUDA: hardware speed-up (DSPs)

tips

a = {} > a = dict()
"".join(st) > st = "a" + "b" + "c"
[i.upper() for i in test] > for i in test: arr.append(i.upper())
local variables inside def test() > global variables
initializing the dict up front > not initializing it
imports at the top of the module > imports in the middle of the code
reduce the number of function calls (callbacks)
xrange > range (Python 2)
rewrite recursive functions as loops
hash lookup (hash map) > searching in a loop
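A few of the tips above can be checked with timeit. This is only a sketch for Python 3 (where xrange no longer exists); the helper names and data sizes are made up for this note, and the actual numbers will vary by machine and interpreter, so measure before trusting any of them.

```python
import timeit

words = ["alpha", "beta", "gamma"] * 1000


def concat_plus(items):
    """Build a string with repeated + concatenation."""
    out = ""
    for s in items:
        out = out + s
    return out


def concat_join(items):
    """Build the same string with a single join call."""
    return "".join(items)


def upper_loop(items):
    """Collect results with an explicit loop and append."""
    arr = []
    for s in items:
        arr.append(s.upper())
    return arr


def upper_comprehension(items):
    """Collect the same results with a list comprehension."""
    return [s.upper() for s in items]


haystack_list = list(range(5000))
haystack_set = set(haystack_list)


def find_in_list(x):
    """Linear search: scans the list element by element."""
    return x in haystack_list


def find_in_set(x):
    """Hash lookup: jumps straight to the matching bucket."""
    return x in haystack_set


def time_it(func, *args, number=50):
    """Return the total seconds for `number` calls of func(*args)."""
    return timeit.timeit(lambda: func(*args), number=number)


timings = {
    "concat_plus": time_it(concat_plus, words),
    "concat_join": time_it(concat_join, words),
    "upper_loop": time_it(upper_loop, words),
    "upper_comprehension": time_it(upper_comprehension, words),
    "find_in_list": time_it(find_in_list, 4999),
    "find_in_set": time_it(find_in_set, 4999),
}

for name, seconds in timings.items():
    print("%-20s %.6f s" % (name, seconds))
```

Each pair computes the same result, so any difference in the printed times comes from how the work is done.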

ref:
EuroPython2011_HighPerformanceComputing
performance tips
timecomplex
