PythonJS, using direct SIMD through its Dart backend and running in the Dart VM, is about 6X faster than CPython with NumPy in the following micro benchmark, which tests float32x4 multiplication. SIMD stands for Single Instruction, Multiple Data: it lets you instruct the CPU to perform the same math operation on a whole vector of data at once, which increases performance. Read more about SIMD on my old research blog, here.
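To make "float32x4 multiplication" concrete, here is a minimal plain-Python sketch of what one such operation computes: four float32 lanes multiplied lane by lane. The function name `mul_float32x4` is hypothetical, and this is not SIMD. It loops over the lanes one at a time, whereas a real SIMD instruction (such as the ones behind Dart's Float32x4) computes all four lanes in a single step.

```python
from array import array


def mul_float32x4(a, b):
    """Multiply two 4-lane float32 vectors lane by lane (illustration only).

    A SIMD instruction computes all four lanes at once; this plain-Python
    version computes them one after another.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("float32x4 vectors must have exactly 4 lanes")
    # array('f') stores C floats (32-bit on common platforms), so each
    # product is rounded to float32 when stored, like a Float32x4 lane.
    return array('f', (x * y for x, y in zip(a, b)))


a = array('f', [1.0, 2.0, 3.0, 4.0])
b = array('f', [2.0, 2.0, 2.0, 2.0])
print(list(mul_float32x4(a, b)))  # [2.0, 4.0, 6.0, 8.0]
```

The float32 rounding matters: a value such as 1.0001 cannot be stored exactly in 32 bits, so results differ slightly from the same math done in Python's 64-bit floats.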
I expected NumPy to specialize the case of an array with four float32 elements to use SIMD. Searching for why it does not, I could not find any clear answers: [1], [2]. More confused and curious, I jumped into the PyPy IRC channel, and Matti Picus gave me the answer: NumPy has no direct support for SIMD; instead it relies on helper libraries such as MKL, BLAS, and LAPACK.
The Dart VM includes SIMD support, with Float32x4 and Int32x4 primitives, as part of its core libraries; you simply import dart:typed_data. Google Chrome and Firefox are also in the process of supporting SIMD.
SIMD multiply micro benchmark
The PythonJS translator has been updated to pass type information to the Dart backend, and to translate a numpy.array into a Float32x4 vector when it has been typed as float32x4. See my previous blog post about optional static typing, here.
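```python
def main():
    start = time()
    # `float32x4` is a PythonJS static type declaration; with it, the Dart
    # backend translates these numpy.array calls into Float32x4 vectors
    float32x4 a = numpy.array( [1.0001, 1.0002, 1.0003, 1.0004], dtype=numpy.float32 )
    float32x4 b = numpy.array( [1.00009, 1.00008, 1.00007, 1.00006], dtype=numpy.float32 )
    float32x4 c = numpy.array( [1.00005, 1.00004, 1.00003, 1.00002], dtype=numpy.float32 )
    arr = []
    for i in range(20000):
        c *= a*b              # element-wise multiply, result stored back in c
        arr.append( a*b*c )   # store a new 4-wide vector each iteration
    print(time()-start)
```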
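For the CPython side of the comparison, the typed declarations have to go, since `float32x4 a = ...` is PythonJS syntax rather than valid Python. The sketch below is my assumption of an equivalent plain CPython plus NumPy version: the `float32x4` annotations are dropped and the imports (`time`, `numpy`) are added. It is not necessarily the exact script used for the measurement, and no timing numbers are implied. Returning `arr` and the elapsed time is an addition to make the result easy to inspect.

```python
from time import time

import numpy


def main():
    start = time()
    # Plain 4-element float32 NumPy arrays (no PythonJS type declarations)
    a = numpy.array([1.0001, 1.0002, 1.0003, 1.0004], dtype=numpy.float32)
    b = numpy.array([1.00009, 1.00008, 1.00007, 1.00006], dtype=numpy.float32)
    c = numpy.array([1.00005, 1.00004, 1.00003, 1.00002], dtype=numpy.float32)
    arr = []
    for i in range(20000):
        c *= a * b             # in-place element-wise multiply on c
        arr.append(a * b * c)  # allocate a new float32 array each iteration
    elapsed = time() - start
    print(elapsed)
    return arr, elapsed


if __name__ == "__main__":
    main()
```

With only four elements per array, each iteration is dominated by interpreter and per-call overhead rather than by the arithmetic itself.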