Compares matrix multiplication implementations with timeit
numpy has a very fast implementation of matrix multiplication,
and there are many ways to be slower.
This example uses timeit to measure how several other
implementations compare against it.
Compared implementations:
multiply_matrix
c_multiply_matrix
c_multiply_matrix_parallel
c_multiply_matrix_parallel_transposed
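The source of multiply_matrix is not reproduced on this page. To give a rough idea of what a pure Python baseline looks like, here is a minimal sketch of the classic triple loop over nested lists. The name naive_matmul is hypothetical and is not part of td3a_cpp; the real function may differ.

```python
def naive_matmul(a, b):
    """Multiply two matrices stored as lists of rows (hypothetical helper)."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            # Dot product of row i of a with column j of b.
            for k in range(n_inner):
                total += a[i][k] * b[k][j]
            result[i][j] = total
    return result


product = naive_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
```

Every multiplication and addition here goes through the interpreter, which is why this kind of loop is usually far slower than a compiled implementation.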
Preparation
import timeit
import numpy
from td3a_cpp.tutorial.td_mul_cython import (
    multiply_matrix, c_multiply_matrix,
    c_multiply_matrix_parallel,
    c_multiply_matrix_parallel_transposed as cmulparamtr)

# Two random float64 matrices: va is 150 x 100, vb is 100 x 100.
va = numpy.random.randn(150, 100).astype(numpy.float64)
vb = numpy.random.randn(100, 100).astype(numpy.float64)

# Namespace in which timeit evaluates the timed statements.
ctx = {
    'va': va, 'vb': vb, 'c_multiply_matrix': c_multiply_matrix,
    'multiply_matrix': multiply_matrix,
    'c_multiply_matrix_parallel': c_multiply_matrix_parallel,
    'c_multiply_matrix_parallel_transposed': cmulparamtr}
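The ctx dictionary is what lets timeit find the names used in the timed statements. The following standalone sketch shows the same pattern with a toy workload, using only the standard library. The names square_sum and demo_ctx are made up for the illustration.

```python
import timeit


def square_sum(values):
    """Toy workload standing in for a matrix product (hypothetical)."""
    return sum(v * v for v in values)


# Names used inside the timed statement are looked up in this dictionary.
demo_ctx = {"square_sum": square_sum, "data": list(range(1000))}

# timeit.timeit returns the total time in seconds for `number` executions.
number = 50
total = timeit.timeit("square_sum(data)", number=number, globals=demo_ctx)
print(f"total: {total:.6f} s, per call: {total / number:.8f} s")
```

Because the returned value is a total over number executions, two measurements are only directly comparable when they use the same number, or after dividing each by its own number.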
Measures
numpy
res0 = timeit.timeit('va @ vb', number=100, globals=ctx)
print("numpy time", res0)
numpy time 0.029423824977129698
python implementation
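# Only 10 repetitions here (the other measures use 100)
# because the pure python version is much slower.
res1 = timeit.timeit(
    'multiply_matrix(va, vb)', number=10, globals=ctx)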
print('python implementation', res1)
python implementation 36.37305003963411
cython implementation
res2 = timeit.timeit(
    'c_multiply_matrix(va, vb)', number=100, globals=ctx)
print('cython implementation', res2)
cython implementation 0.73594726389274
cython implementation parallelized
res3 = timeit.timeit(
    'c_multiply_matrix_parallel(va, vb)', number=100, globals=ctx)
print('cython implementation parallelized', res3)
cython implementation parallelized 0.10048561217263341
cython implementation parallelized, AVX + transposed
res4 = timeit.timeit(
    'c_multiply_matrix_parallel_transposed(va, vb)', number=100, globals=ctx)
print('cython implementation parallelized avx', res4)
cython implementation parallelized avx 0.042188897263258696
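The page does not show how the transposed variant is written. A likely reason for transposing is that, with row-major storage, reading a column of the right-hand matrix jumps through memory, whereas reading a row of its transpose walks contiguous values, which suits caches and vector instructions better. The sketch below only illustrates that access pattern in pure Python; matmul_transposed is a hypothetical name, and nested Python lists do not reproduce the memory-layout benefit itself.

```python
def matmul_transposed(a, b):
    """Matrix product that first transposes b (illustrative sketch only)."""
    # Rows of bt are the columns of b.
    bt = [list(col) for col in zip(*b)]
    # Each output cell is now a dot product of two rows:
    # row i of a and row j of bt, both traversed in order.
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_transposed(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

The transpose costs one extra pass over b, which is small compared with the product itself when the matrices are reasonably large.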
Speed up…
# Note: res1 was measured with number=10 while the other timings used
# number=100, so the first ratio understates the per-call gap by a factor of 10.
print(f"numpy is {res1 / res0:f} times faster than pure python.")
print(f"numpy is {res2 / res0:f} times faster than cython.")
print(f"numpy is {res3 / res0:f} times faster than parallelized cython.")
print(f"numpy is {res4 / res0:f} times faster than avx parallelized cython.")
numpy is 1236.176808 times faster than pure python.
numpy is 25.011951 times faster than cython.
numpy is 3.415110 times faster than parallelized cython.
numpy is 1.433835 times faster than avx parallelized cython.
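The pure python measure ran 10 repetitions while the others ran 100, so the ratios above compare totals over different numbers of calls. One way to put them on the same footing is to divide each total by its own number before comparing. The sketch below does that with the timings printed on this page; the dictionary layout and names are only for illustration.

```python
# (total seconds reported above, number of repetitions used)
measurements = {
    "numpy": (0.029423824977129698, 100),
    "pure python": (36.37305003963411, 10),
    "cython": (0.73594726389274, 100),
    "parallelized cython": (0.10048561217263341, 100),
    "avx parallelized cython": (0.042188897263258696, 100),
}

# Average time of a single call for each implementation.
per_call = {name: total / number
            for name, (total, number) in measurements.items()}

baseline = per_call["numpy"]
for name, seconds in per_call.items():
    if name != "numpy":
        print(f"numpy is {seconds / baseline:.1f} times faster "
              f"than {name} per call.")
```

With this normalization the pure python ratio becomes ten times larger than the one printed above, while the other ratios are unchanged because they already share number=100 with numpy.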
Total running time of the script: (0 minutes 37.302 seconds)