Profiling and Optimising#
Optimising is the process of making your code ‘better’ by some metric. Most often, this is total runtime (how long it takes), but can also mean things like reducing memory usage or any other way to measure the ‘cost’ of your code.
Profiling is a way of measuring your code, to figure out what part of it you may want to to optimize.
IPython provides some tools that can make it it a bit easier to profile and optimise your code sometimes.
For more information on IPython, see our IPython notes
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
%timeit
#
The main IPython tool we are going to use here is %timeit
,
a magic that automates measuring how long it takes to run a snippet of code.
for N in (100, 500, 1000, 2000):
print(f"Size: {N} x {N}")
A = np.random.random((N, N))
%timeit A.dot(A)
Size: 100 x 100
The slowest run took 28.70 times longer than the fastest. This could mean that an intermediate result is being cached.
2.08 ms ± 2.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Size: 500 x 500
2.42 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Size: 1000 x 1000
19.5 ms ± 6.36 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Size: 2000 x 2000
96.8 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Let’s look at what options %timeit
can take.
%timeit?
Docstring:
Time execution of a Python statement or expression
Usage, in line mode:
%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] statement
or in cell mode:
%%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] setup_code
code
code...
Time execution of a Python statement or expression using the timeit
module. This function can be used both as a line and cell magic:
- In line mode you can time a single-line statement (though multiple
ones can be chained with using semicolons).
- In cell mode, the statement in the first line is used as setup code
(executed but not timed) and the body of the cell is timed. The cell
body has access to any variables created in the setup code.
Options:
-n<N>: execute the given statement <N> times in a loop. If <N> is not
provided, <N> is determined so as to get sufficient accuracy.
-r<R>: number of repeats <R>, each consisting of <N> loops, and take the
best result.
Default: 7
-t: use time.time to measure the time, which is the default on Unix.
This function measures wall time.
-c: use time.clock to measure the time, which is the default on
Windows and measures wall time. On Unix, resource.getrusage is used
instead and returns the CPU user time.
-p<P>: use a precision of <P> digits to display the timing result.
Default: 3
-q: Quiet, do not print result.
-o: return a TimeitResult that can be stored in a variable to inspect
the result in more details.
.. versionchanged:: 7.3
User variables are no longer expanded,
the magic line is always left unmodified.
Examples
--------
::
In [1]: %timeit pass
8.26 ns ± 0.12 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
In [2]: u = None
In [3]: %timeit u is None
29.9 ns ± 0.643 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit -r 4 u == None
In [5]: import time
In [6]: %timeit -n1 time.sleep(2)
The times reported by %timeit will be slightly higher than those
reported by the timeit.py script when variables are accessed. This is
due to the fact that %timeit executes the statement in the namespace
of the shell, compared with timeit.py, which uses a single setup
statement to import function or create variables. Generally, the bias
does not matter as long as results from timeit.py are not mixed with
those from %timeit.
File: ~/conda/lib/python3.11/site-packages/IPython/core/magics/execution.py
We can save the result in an object with %timeit -o
,
and specify to only run one group of 100 iterations.
A = np.random.random((100, 100))
tr = %timeit -o -n 1 -r 100 A.dot(A)
The slowest run took 101.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1.49 ms ± 2.4 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)
tr
<TimeitResult : 1.49 ms ± 2.4 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)>
tr.best
0.00012358397361822426
tr.best, tr.worst
(0.00012358397361822426, 0.012586958007887006)
print(len(tr.all_runs))
tr.all_runs[:10]
100
[0.0032582079875282943,
0.0023532090126536787,
0.00310250002075918,
0.00602666600025259,
0.003167124988976866,
0.0003759169776458293,
0.012586958007887006,
0.008417666977038607,
0.007104332995368168,
0.0002472500200383365]
plt.hist(np.array(tr.all_runs) * 1e6)
plt.xlabel("t (µs)");
Diffusing a wave#
Our task is to optimise a 1-D diffusion algorithm, using numpy and Cython.
Our input signal is a sawtooth wave:
from scipy.signal import sawtooth
T = 8 * np.pi
t = np.linspace(0, T, 1024)
x = sawtooth(t)
plt.plot(t, x)
steps = 10_000
Exercise: turning math into code#
Hints:
when you see a sum over a variable, that should make you think of a
for
loopx(t)
can be an array, wheret
is an array of values, andx
is the result of calling an
For example, to implement
x_t = np.sin(t)
plt.plot(t, x_t)
[<matplotlib.lines.Line2D at 0x147119050>]
def my_sawtooth(t, A=1, n_freqs=100):
# x = A/2 - ...
x = np.ones(t.shape) * 0.5
# A/π * sum(k=1, k=n_freqs) { sin(k t) / k }
for k in range(1, n_freqs + 1):
x -= np.sin(k * t) / (k * np.pi)
return x * A
# test your sawtooth output
# it should look like similar to the above
n_list = [5, 10, 100, 1000]
fig, axes = plt.subplots(len(n_list), 1)
for ax, n_freqs in zip(axes, n_list):
ax.plot(t, my_sawtooth(t, n_freqs=n_freqs))
ax.set_title(f"n={n_freqs}")
We are going to diffuse the wave by evolving the heat equation:
Which we can discretize for our arrays:
points = [10, 100, 10, 100, 10]
plt.bar(range(len(points)), points)
plt.text(1, 101, "1/2", size=16, ha="center")
plt.text(0, 11, "1/4", size=16, ha="center")
plt.text(2, 11, "1/4", size=16, ha="center")
Text(2, 11, '1/4')
import time
from IPython.display import clear_output
i_list = range(len(points))
for k in range(12):
plt.bar(range(len(points)), points)
plt.ylim(0, 100)
plt.title(f"k = {k}")
plt.show()
next_points = points.copy()
for i in range(1, len(points) - 1):
next_points[i] = 0.25 * points[i - 1] + 0.5 * points[i] + 0.25 * points[i + 1]
points = next_points
time.sleep(1)
clear_output(wait=True)
Pure Python#
We’ll start with a pure Python implementation, to use as a reference.
Let’s implement our algorithm:
Advancing our blur one step looks like:
and we keep repeating that to get
What variables do we need?
How many loops?
def blur_py(x0, steps=1024):
"""Advance a Gaussian blur `steps` steps"""
x = 1 * x0 # copy
x_k = np.empty_like(x)
x_k[0] = x[0]
x_k[-1] = x[-1]
for k in range(steps):
for i in range(1, len(x) - 1):
x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
x, x_k = x_k, x # swap for next step
return x
y = blur_py(x, steps)
plt.plot(t, x, "--")
plt.plot(t, y);
Now we can measure how long it takes to run evolve this system:
ref_run = %timeit -o y = blur_py(x, steps)
t_ref = ref_run.best
times = {"python": t_ref}
2.98 s ± 22.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So it takes a few seconds. We can also see how it changes with different times and resolutions.
Vectorizing with numpy#
We can vectorize the inner loop with a single numpy operation:
import numpy as np
def blur_np(x, steps=1024):
x = 1 * x
x_k = np.empty_like(x)
x_k[0] = x[0]
x_k[-1] = x[-1]
for _ in range(steps):
x_k[1:-1] = 0.25 * (x[0:-2] + 2 * x[1:-1] + x[2:])
x, x_k = x_k, x
return x
y = blur_np(x, steps)
plt.plot(t, x, "--")
plt.plot(t, y)
[<matplotlib.lines.Line2D at 0x1473ee1d0>]
np_r = %timeit -o blur_np(x, steps)
t_np = np_r.best
33.5 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
times["numpy"] = t_np
def plot_times():
ind = np.arange(len(times))
plt.bar(ind, times.values(), log=True)
plt.xticks(ind, times.keys(), rotation=30)
plt.ylim(0.1 * min(times.values()), max(times.values()))
plot_times()
So vectorizing the inner loop brings us from O(1 second) to O(10 milliseconds), an improvement of ~100x:
t_ref / t_np
89.18848313821238
Cython#
Cython provides an IPython extension, which defines a magic we can use to inline bits of Cython code in the notebook:
%load_ext Cython
# small fix to improve embedded Cython annotations in our lecture notes
%load_ext _cython_annotate_fix
%%cython
def csum(n):
cs = 0
for i in range(n):
cs += i
return cs
%timeit csum(5)
147 ns ± 0.831 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%%cython?
Docstring:
::
%cython [-a] [--annotate-fullc] [-+] [-3] [-2] [-f] [-c COMPILE_ARGS]
[--link-args LINK_ARGS] [-l LIB] [-n NAME] [-L dir] [-I INCLUDE]
[-S SRC] [--pgo] [--verbose]
Compile and import everything from a Cython code cell.
The contents of the cell are written to a `.pyx` file in the
directory `IPYTHONDIR/cython` using a filename with the hash of the
code. This file is then cythonized and compiled. The resulting module
is imported and all of its symbols are injected into the user's
namespace. The usage is similar to that of `%%cython_pyximport` but
you don't have to pass a module name::
%%cython
def f(x):
return 2.0*x
To compile OpenMP codes, pass the required `--compile-args`
and `--link-args`. For example with gcc::
%%cython --compile-args=-fopenmp --link-args=-fopenmp
...
To enable profile guided optimisation, pass the ``--pgo`` option.
Note that the cell itself needs to take care of establishing a suitable
profile when executed. This can be done by implementing the functions to
optimise, and then calling them directly in the same cell on some realistic
training data like this::
%%cython --pgo
def critical_function(data):
for item in data:
...
# execute function several times to build profile
from somewhere import some_typical_data
for _ in range(100):
critical_function(some_typical_data)
In Python 3.5 and later, you can distinguish between the profile and
non-profile runs as follows::
if "_pgo_" in __name__:
... # execute critical code here
options:
-a, --annotate Produce a colorized HTML version of the source.
--annotate-fullc Produce a colorized HTML version of the source which
includes entire generated C/C++-code.
-+, --cplus Output a C++ rather than C file.
-3 Select Python 3 syntax.
-2 Select Python 2 syntax.
-f, --force Force the compilation of a new module, even if the
source has been previously compiled.
-c COMPILE_ARGS, --compile-args COMPILE_ARGS
Extra flags to pass to compiler via the
`extra_compile_args` Extension flag (can be specified
multiple times).
--link-args LINK_ARGS
Extra flags to pass to linker via the
`extra_link_args` Extension flag (can be specified
multiple times).
-l LIB, --lib LIB Add a library to link the extension against (can be
specified multiple times).
-n NAME, --name NAME Specify a name for the Cython module.
-L dir Add a path to the list of library directories (can be
specified multiple times).
-I INCLUDE, --include INCLUDE
Add a path to the list of include directories (can be
specified multiple times).
-S SRC, --src SRC Add a path to the list of src files (can be specified
multiple times).
--pgo Enable profile guided optimisation in the C compiler.
Compiles the cell twice and executes it in between to
generate a runtime profile.
--verbose Print debug information like generated .c/.cpp file
location and exact gcc/g++ command invoked.
File: ~/conda/lib/python3.11/site-packages/Cython/Build/IpythonMagic.py
%%cython --annotate
shows you annotations about the generated sourcecode.
The key to writing Cython is to minimize the amount of Python calls in the generated code. In general: yellow = slow.
def psum(n):
cs = 0
for i in range(n):
cs += i
return cs
%%cython --annotate
def csum(n):
cs = 0
for i in range(n):
cs += i
return cs
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
1:
+2: def csum(n):
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_1csum(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_1csum = {"csum", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_1csum, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_1csum(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { PyObject *__pyx_v_n = 0; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("csum (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 2, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_n,0}; PyObject* values[1] = {0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_n)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) else goto __pyx_L5_argtuple_error; } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "csum") < 0)) __PYX_ERR(0, 2, __pyx_L3_error) } } else if (unlikely(__pyx_nargs != 1)) { goto __pyx_L5_argtuple_error; } else { values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); } __pyx_v_n = values[0]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("csum", 1, 1, 1, __pyx_nargs); __PYX_ERR(0, 2, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_AddTraceback("_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134.csum", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_csum(__pyx_self, __pyx_v_n); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_csum(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_n) { PyObject *__pyx_v_cs = NULL; PyObject *__pyx_v_i = NULL; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("csum", 0); /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_AddTraceback("_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134.csum", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_cs); __Pyx_XDECREF(__pyx_v_i); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple_ = PyTuple_Pack(3, __pyx_n_s_n, __pyx_n_s_cs, __pyx_n_s_i); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple_); __Pyx_GIVEREF(__pyx_tuple_); /* … */ __pyx_t_2 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_ffa75fecb2c2b5d887239288c8b3c0369a07c134_1csum, 0, __pyx_n_s_csum, NULL, __pyx_n_s_cython_magic_ffa75fecb2c2b5d887, __pyx_d, ((PyObject *)__pyx_codeobj__2)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (PyDict_SetItem(__pyx_d, __pyx_n_s_csum, __pyx_t_2) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_2) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+3: cs = 0
__Pyx_INCREF(__pyx_int_0);
__pyx_v_cs = __pyx_int_0;
+4: for i in range(n):
__pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_builtin_range, __pyx_v_n); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) { __pyx_t_2 = __pyx_t_1; __Pyx_INCREF(__pyx_t_2); __pyx_t_3 = 0; __pyx_t_4 = NULL; } else { __pyx_t_3 = -1; __pyx_t_2 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __pyx_t_4 = __Pyx_PyObject_GetIterNextFunc(__pyx_t_2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 4, __pyx_L1_error) } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; for (;;) { if (likely(!__pyx_t_4)) { if (likely(PyList_CheckExact(__pyx_t_2))) { if (__pyx_t_3 >= PyList_GET_SIZE(__pyx_t_2)) break; #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS __pyx_t_1 = PyList_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_1); __pyx_t_3++; if (unlikely((0 < 0))) __PYX_ERR(0, 4, __pyx_L1_error) #else __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); #endif } else { if (__pyx_t_3 >= PyTuple_GET_SIZE(__pyx_t_2)) break; #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_1); __pyx_t_3++; if (unlikely((0 < 0))) __PYX_ERR(0, 4, __pyx_L1_error) #else __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); #endif } } else { __pyx_t_1 = __pyx_t_4(__pyx_t_2); if (unlikely(!__pyx_t_1)) { PyObject* exc_type = PyErr_Occurred(); if (exc_type) { if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear(); else __PYX_ERR(0, 4, __pyx_L1_error) } break; } __Pyx_GOTREF(__pyx_t_1); } __Pyx_XDECREF_SET(__pyx_v_i, __pyx_t_1); __pyx_t_1 = 0; /* … */ } __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+5: cs += i
__pyx_t_1 = PyNumber_InPlaceAdd(__pyx_v_cs, __pyx_v_i); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF_SET(__pyx_v_cs, __pyx_t_1); __pyx_t_1 = 0;
+6: return cs
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(__pyx_v_cs); __pyx_r = __pyx_v_cs; goto __pyx_L0;
Uh oh, that looks like a lot of yellow. We can reduce it by adding some type annotations:
%%cython --annotate
from cython import int
def csum2(n: int):
i: int
cs = 0
for i in range(n):
cs += i
return cs
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
1:
+2: from cython import int
__pyx_t_3 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_3) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
3:
+4: def csum2(n: int):
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_1csum2(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_1csum2 = {"csum2", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_1csum2, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_1csum2(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { int __pyx_v_n; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("csum2 (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 4, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_n,0}; PyObject* values[1] = {0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_n)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 4, __pyx_L3_error) else goto __pyx_L5_argtuple_error; } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "csum2") < 0)) __PYX_ERR(0, 4, __pyx_L3_error) } } else if (unlikely(__pyx_nargs != 1)) { goto __pyx_L5_argtuple_error; } else { values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); } __pyx_v_n = __Pyx_PyInt_As_int(values[0]); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 4, __pyx_L3_error) } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("csum2", 1, 1, 1, __pyx_nargs); __PYX_ERR(0, 4, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_AddTraceback("_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16.csum2", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_csum2(__pyx_self, __pyx_v_n); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_csum2(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_n) { int __pyx_v_i; PyObject *__pyx_v_cs = NULL; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("csum2", 0); /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_AddTraceback("_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16.csum2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_cs); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple_ = PyTuple_Pack(3, __pyx_n_s_n, __pyx_n_s_i, __pyx_n_s_cs); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple_); __Pyx_GIVEREF(__pyx_tuple_); /* … */ __pyx_t_2 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (PyDict_SetItem(__pyx_t_2, __pyx_n_s_n, __pyx_n_s_int) < 0) __PYX_ERR(0, 4, __pyx_L1_error) __pyx_t_3 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_3b29ad3d8007e919ac9149407675ad03f92e5d16_1csum2, 0, __pyx_n_s_csum2, NULL, __pyx_n_s_cython_magic_3b29ad3d8007e919ac, __pyx_d, ((PyObject *)__pyx_codeobj__2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_CyFunction_SetAnnotationsDict(__pyx_t_3, __pyx_t_2); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; if (PyDict_SetItem(__pyx_d, __pyx_n_s_csum2, __pyx_t_3) < 0) __PYX_ERR(0, 4, __pyx_L1_error) __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
5: i: int
+6: cs = 0
__Pyx_INCREF(__pyx_int_0);
__pyx_v_cs = __pyx_int_0;
+7: for i in range(n):
__pyx_t_1 = __pyx_v_n; __pyx_t_2 = __pyx_t_1; for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) { __pyx_v_i = __pyx_t_3;
+8: cs += i
__pyx_t_4 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = PyNumber_InPlaceAdd(__pyx_v_cs, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF_SET(__pyx_v_cs, __pyx_t_5); __pyx_t_5 = 0; }
+9: return cs
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(__pyx_v_cs); __pyx_r = __pyx_v_cs; goto __pyx_L0;
Almost there, but I still see yellow on the lines with cs
:
%%cython --annotate
from cython import int
def csum3(n: int) -> int:
i: int
cs: int = 0
for i in range(n):
cs += i
return cs
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+1: from cython import int
__pyx_t_3 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_3) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
2:
+3: def csum3(n: int) -> int:
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_1csum3(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_1csum3 = {"csum3", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_1csum3, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_1csum3(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { int __pyx_v_n; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("csum3 (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 3, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_n,0}; PyObject* values[1] = {0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_n)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 3, __pyx_L3_error) else goto __pyx_L5_argtuple_error; } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "csum3") < 0)) __PYX_ERR(0, 3, __pyx_L3_error) } } else if (unlikely(__pyx_nargs != 1)) { goto __pyx_L5_argtuple_error; } else { values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); } __pyx_v_n = __Pyx_PyInt_As_int(values[0]); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 3, __pyx_L3_error) } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("csum3", 1, 1, 1, __pyx_nargs); __PYX_ERR(0, 3, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_AddTraceback("_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543.csum3", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_csum3(__pyx_self, __pyx_v_n); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_csum3(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_n) { int __pyx_v_i; int __pyx_v_cs; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("csum3", 0); /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_4); __Pyx_AddTraceback("_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543.csum3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple_ = PyTuple_Pack(3, __pyx_n_s_n, __pyx_n_s_i, __pyx_n_s_cs); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple_); __Pyx_GIVEREF(__pyx_tuple_); /* … */ __pyx_t_2 = __Pyx_PyDict_NewPresized(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (PyDict_SetItem(__pyx_t_2, __pyx_n_s_n, __pyx_n_s_int) < 0) __PYX_ERR(0, 3, __pyx_L1_error) if (PyDict_SetItem(__pyx_t_2, __pyx_n_s_return, __pyx_n_s_int) < 0) __PYX_ERR(0, 3, __pyx_L1_error) __pyx_t_3 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_05d58375b0f56a22bc6d33d2931bd7a4ec562543_1csum3, 0, __pyx_n_s_csum3, NULL, __pyx_n_s_cython_magic_05d58375b0f56a22bc, __pyx_d, ((PyObject *)__pyx_codeobj__2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_CyFunction_SetAnnotationsDict(__pyx_t_3, __pyx_t_2); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; if (PyDict_SetItem(__pyx_d, __pyx_n_s_csum3, __pyx_t_3) < 0) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
4: i: int
+5: cs: int = 0
__pyx_v_cs = 0;
+6: for i in range(n):
__pyx_t_1 = __pyx_v_n; __pyx_t_2 = __pyx_t_1; for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) { __pyx_v_i = __pyx_t_3;
+7: cs += i
__pyx_v_cs = (__pyx_v_cs + __pyx_v_i); }
+8: return cs
__Pyx_XDECREF(__pyx_r); __pyx_t_4 = __Pyx_PyInt_From_int(__pyx_v_cs); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_r = __pyx_t_4; __pyx_t_4 = 0; goto __pyx_L0;
Much better! Now there’s only Python when entering the function, which is about as good as we can do.
N = 1_000_000
print("psum ", end=" ")
tr_psum = %timeit -o psum(N)
print("csum ", end=" ")
tr_csum = %timeit -o csum(N)
print("csum2", end=" ")
tr_csum2 = %timeit -o csum2(N)
print("csum3", end=" ")
tr_csum3 = %timeit -o csum3(N)
psum 23.5 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
csum 20.8 ms ± 681 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
csum2 20 ms ± 901 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
csum3 24.8 ns ± 0.195 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
from timing_utils import timing_table
timing_table(
{
"psum": tr_psum,
"csum": tr_csum,
"csum2": tr_csum2,
"csum3": tr_csum3,
}
)
implementation | speed |
---|---|
psum | 1.0 (normalized) |
csum | 1.13x |
csum2 | 1.18x |
csum3 | 946,558x |
Blurring with Cython#
Now we can apply the same principles to writing a blur in Cython.
%%cython --annotate
import numpy as np
def blur_cython(x, steps=1024):
x = 1 * x # copy
x_k = np.empty_like(x)
x_k[0] = x[0]
x_k[-1] = x[-1]
for _ in range(steps):
for i in range(1, len(x) - 1):
x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
x, x_k = x_k, x # swap for next step
return x
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
01:
+02: import numpy as np
__pyx_t_2 = __Pyx_ImportDottedModule(__pyx_n_s_numpy, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_2) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; /* … */ __pyx_t_2 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_2) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
03:
04:
+05: def blur_cython(x, steps=1024):
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_1blur_cython(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_1blur_cython = {"blur_cython", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_1blur_cython, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_1blur_cython(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { PyObject *__pyx_v_x = 0; PyObject *__pyx_v_steps = 0; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 5, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_x,&__pyx_n_s_steps,0}; PyObject* values[2] = {0,0}; values[1] = __Pyx_Arg_NewRef_FASTCALL(((PyObject *)((PyObject *)__pyx_int_1024))); if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_x)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error) else goto __pyx_L5_argtuple_error; CYTHON_FALLTHROUGH; case 1: if (kw_args > 0) { PyObject* value = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_steps); if (value) { values[1] = __Pyx_Arg_NewRef_FASTCALL(value); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "blur_cython") < 0)) __PYX_ERR(0, 5, __pyx_L3_error) } } else { switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_x = values[0]; __pyx_v_steps = values[1]; } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("blur_cython", 0, 1, 2, __pyx_nargs); __PYX_ERR(0, 5, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_AddTraceback("_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47.blur_cython", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_blur_cython(__pyx_self, __pyx_v_x, __pyx_v_steps); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_blur_cython(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_x, PyObject *__pyx_v_steps) { PyObject *__pyx_v_x_k = NULL; CYTHON_UNUSED PyObject *__pyx_v__ = NULL; PyObject *__pyx_v_i = NULL; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython", 0); __Pyx_INCREF(__pyx_v_x); /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_9); __Pyx_XDECREF(__pyx_t_10); __Pyx_AddTraceback("_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47.blur_cython", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_x_k); __Pyx_XDECREF(__pyx_v__); __Pyx_XDECREF(__pyx_v_i); __Pyx_XDECREF(__pyx_v_x); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__3 = PyTuple_Pack(5, __pyx_n_s_x, __pyx_n_s_steps, __pyx_n_s_x_k, __pyx_n_s__2, __pyx_n_s_i); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__3); __Pyx_GIVEREF(__pyx_tuple__3); __pyx_codeobj__4 = (PyObject*)__Pyx_PyCode_New(2, 0, 0, 5, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__3, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_minrk_Library_Caches_ipyt, __pyx_n_s_blur_cython, 5, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__4)) __PYX_ERR(0, 5, __pyx_L1_error) /* … */ __pyx_t_2 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_5e3a7c0b9a77ebd65e99e24b10f8f056789e4a47_1blur_cython, 0, __pyx_n_s_blur_cython, NULL, __pyx_n_s_cython_magic_5e3a7c0b9a77ebd65e, __pyx_d, ((PyObject *)__pyx_codeobj__4)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_CyFunction_SetDefaultsTuple(__pyx_t_2, __pyx_tuple__5); if (PyDict_SetItem(__pyx_d, __pyx_n_s_blur_cython, __pyx_t_2) < 0) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+06: x = 1 * x # copy
__pyx_t_1 = __Pyx_PyInt_MultiplyCObj(__pyx_int_1, __pyx_v_x, 1, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF_SET(__pyx_v_x, __pyx_t_1); __pyx_t_1 = 0;
+07: x_k = np.empty_like(x)
__Pyx_GetModuleGlobalName(__pyx_t_2, __pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_t_2 = NULL; __pyx_t_4 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_3))) { __pyx_t_2 = PyMethod_GET_SELF(__pyx_t_3); if (likely(__pyx_t_2)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3); __Pyx_INCREF(__pyx_t_2); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_3, function); __pyx_t_4 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_2, __pyx_v_x}; __pyx_t_1 = __Pyx_PyObject_FastCall(__pyx_t_3, __pyx_callargs+1-__pyx_t_4, 1+__pyx_t_4); __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; } __pyx_v_x_k = __pyx_t_1; __pyx_t_1 = 0;
+08: x_k[0] = x[0]
__pyx_t_1 = __Pyx_GetItemInt(__pyx_v_x, 0, long, 1, __Pyx_PyInt_From_long, 0, 0, 1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (unlikely((__Pyx_SetItemInt(__pyx_v_x_k, 0, __pyx_t_1, long, 1, __Pyx_PyInt_From_long, 0, 0, 1) < 0))) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+09: x_k[-1] = x[-1]
__pyx_t_1 = __Pyx_GetItemInt(__pyx_v_x, -1L, long, 1, __Pyx_PyInt_From_long, 0, 1, 1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (unlikely((__Pyx_SetItemInt(__pyx_v_x_k, -1L, __pyx_t_1, long, 1, __Pyx_PyInt_From_long, 0, 1, 1) < 0))) __PYX_ERR(0, 9, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+10: for _ in range(steps):
__pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_builtin_range, __pyx_v_steps); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) { __pyx_t_3 = __pyx_t_1; __Pyx_INCREF(__pyx_t_3); __pyx_t_5 = 0; __pyx_t_6 = NULL; } else { __pyx_t_5 = -1; __pyx_t_3 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_6 = __Pyx_PyObject_GetIterNextFunc(__pyx_t_3); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 10, __pyx_L1_error) } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; for (;;) { if (likely(!__pyx_t_6)) { if (likely(PyList_CheckExact(__pyx_t_3))) { if (__pyx_t_5 >= PyList_GET_SIZE(__pyx_t_3)) break; #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS __pyx_t_1 = PyList_GET_ITEM(__pyx_t_3, __pyx_t_5); __Pyx_INCREF(__pyx_t_1); __pyx_t_5++; if (unlikely((0 < 0))) __PYX_ERR(0, 10, __pyx_L1_error) #else __pyx_t_1 = PySequence_ITEM(__pyx_t_3, __pyx_t_5); __pyx_t_5++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); #endif } else { if (__pyx_t_5 >= PyTuple_GET_SIZE(__pyx_t_3)) break; #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_3, __pyx_t_5); __Pyx_INCREF(__pyx_t_1); __pyx_t_5++; if (unlikely((0 < 0))) __PYX_ERR(0, 10, __pyx_L1_error) #else __pyx_t_1 = PySequence_ITEM(__pyx_t_3, __pyx_t_5); __pyx_t_5++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); #endif } } else { __pyx_t_1 = __pyx_t_6(__pyx_t_3); if (unlikely(!__pyx_t_1)) { PyObject* exc_type = PyErr_Occurred(); if (exc_type) { if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear(); else __PYX_ERR(0, 10, __pyx_L1_error) } break; } __Pyx_GOTREF(__pyx_t_1); } __Pyx_XDECREF_SET(__pyx_v__, __pyx_t_1); __pyx_t_1 = 0; /* … */ } __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
+11: for i in range(1, len(x) - 1):
__pyx_t_7 = PyObject_Length(__pyx_v_x); if (unlikely(__pyx_t_7 == ((Py_ssize_t)-1))) __PYX_ERR(0, 11, __pyx_L1_error) __pyx_t_1 = PyInt_FromSsize_t((__pyx_t_7 - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_INCREF(__pyx_int_1); __Pyx_GIVEREF(__pyx_int_1); if (__Pyx_PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_int_1)) __PYX_ERR(0, 11, __pyx_L1_error); __Pyx_GIVEREF(__pyx_t_1); if (__Pyx_PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error); __pyx_t_1 = 0; __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; if (likely(PyList_CheckExact(__pyx_t_1)) || PyTuple_CheckExact(__pyx_t_1)) { __pyx_t_2 = __pyx_t_1; __Pyx_INCREF(__pyx_t_2); __pyx_t_7 = 0; __pyx_t_8 = NULL; } else { __pyx_t_7 = -1; __pyx_t_2 = PyObject_GetIter(__pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __pyx_t_8 = __Pyx_PyObject_GetIterNextFunc(__pyx_t_2); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 11, __pyx_L1_error) } __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; for (;;) { if (likely(!__pyx_t_8)) { if (likely(PyList_CheckExact(__pyx_t_2))) { if (__pyx_t_7 >= PyList_GET_SIZE(__pyx_t_2)) break; #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS __pyx_t_1 = PyList_GET_ITEM(__pyx_t_2, __pyx_t_7); __Pyx_INCREF(__pyx_t_1); __pyx_t_7++; if (unlikely((0 < 0))) __PYX_ERR(0, 11, __pyx_L1_error) #else __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); #endif } else { if (__pyx_t_7 >= PyTuple_GET_SIZE(__pyx_t_2)) break; #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS __pyx_t_1 = PyTuple_GET_ITEM(__pyx_t_2, __pyx_t_7); __Pyx_INCREF(__pyx_t_1); __pyx_t_7++; if (unlikely((0 < 0))) __PYX_ERR(0, 11, __pyx_L1_error) #else __pyx_t_1 = PySequence_ITEM(__pyx_t_2, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); #endif } } else { __pyx_t_1 = __pyx_t_8(__pyx_t_2); if (unlikely(!__pyx_t_1)) { PyObject* exc_type = PyErr_Occurred(); if (exc_type) { if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear(); else __PYX_ERR(0, 11, __pyx_L1_error) } break; } __Pyx_GOTREF(__pyx_t_1); } __Pyx_XDECREF_SET(__pyx_v_i, __pyx_t_1); __pyx_t_1 = 0; /* … */ } __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+12: x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
__pyx_t_1 = __Pyx_PyInt_SubtractObjC(__pyx_v_i, __pyx_int_1, 1, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_9 = __Pyx_PyObject_GetItem(__pyx_v_x, __pyx_t_1); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_9); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = __Pyx_PyObject_GetItem(__pyx_v_x, __pyx_v_i); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_10 = __Pyx_PyInt_MultiplyCObj(__pyx_int_2, __pyx_t_1, 2, 0, 0); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_10); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyNumber_Add(__pyx_t_9, __pyx_t_10); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0; __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0; __pyx_t_10 = __Pyx_PyInt_AddObjC(__pyx_v_i, __pyx_int_1, 1, 0, 0); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_10); __pyx_t_9 = __Pyx_PyObject_GetItem(__pyx_v_x, __pyx_t_10); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_9); __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0; __pyx_t_10 = PyNumber_Add(__pyx_t_1, __pyx_t_9); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_10); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0; __pyx_t_9 = PyNumber_Multiply(__pyx_float_0_25, __pyx_t_10); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_9); __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0; if (unlikely((PyObject_SetItem(__pyx_v_x_k, __pyx_v_i, __pyx_t_9) < 0))) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
+13: x, x_k = x_k, x # swap for next step
__pyx_t_11 = __pyx_v_x_k; __pyx_t_12 = __pyx_v_x; __pyx_v_x = __pyx_t_11; __pyx_t_11 = 0; __pyx_v_x_k = __pyx_t_12; __pyx_t_12 = 0;
+14: return x
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(__pyx_v_x); __pyx_r = __pyx_v_x; goto __pyx_L0;
c1 = %timeit -o y = blur_cython(x, steps)
t_c1 = c1.best
times["cython (no hints)"] = t_c1
2.85 s ± 29.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
plot_times()
Without annotations, we don’t get any real improvement over the pure Python version. We can note the types of the input arguments, to get some improvements:
%%cython --annotate
import cython as C
import numpy as np
def blur_cython2(x: C.double[:], steps: C.int=1024):
x = x.copy()
x_k = np.empty_like(x)
x_k[0] = x[0]
x_k[-1] = x[-1]
i: C.size_t
N: C.size_t = len(x)
for _ in range(steps):
for i in range(1, N-1):
x_k[i] = .25 * ( x[i-1] + 2 * x[i] + x[i+1] )
x, x_k = x_k, x # swap for next step
return np.asarray(x)
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
01:
+02: import cython as C
__pyx_t_5 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_5) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
+03: import numpy as np
__pyx_t_7 = __Pyx_ImportDottedModule(__pyx_n_s_numpy, NULL); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_7) < 0) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
04:
+05: def blur_cython2(x: C.double[:], steps: C.int=1024):
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_1blur_cython2(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_1blur_cython2 = {"blur_cython2", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_1blur_cython2, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_1blur_cython2(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { __Pyx_memviewslice __pyx_v_x = { 0, 0, { 0 }, { 0 }, { 0 } }; int __pyx_v_steps; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython2 (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 5, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_x,&__pyx_n_s_steps,0}; PyObject* values[2] = {0,0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_x)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error) else goto __pyx_L5_argtuple_error; CYTHON_FALLTHROUGH; case 1: if (kw_args > 0) { PyObject* value = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_steps); if (value) { values[1] = __Pyx_Arg_NewRef_FASTCALL(value); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "blur_cython2") < 0)) __PYX_ERR(0, 5, __pyx_L3_error) } } else { switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_x = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_x.memview)) __PYX_ERR(0, 5, __pyx_L3_error) if (values[1]) { __pyx_v_steps = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_steps == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error) } else { __pyx_v_steps = ((int)((int)0x400)); } } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("blur_cython2", 0, 1, 2, __pyx_nargs); __PYX_ERR(0, 5, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __Pyx_AddTraceback("_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db.blur_cython2", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(((PyObject *)__pyx_v_x.memview) == Py_None)) { PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "x"); __PYX_ERR(0, 5, __pyx_L1_error) } __pyx_r = __pyx_pf_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_blur_cython2(__pyx_self, __pyx_v_x, __pyx_v_steps); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_blur_cython2(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_x, int __pyx_v_steps) { PyObject *__pyx_v_x_k = NULL; size_t __pyx_v_i; size_t __pyx_v_N; CYTHON_UNUSED int __pyx_v__; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython2", 0); __PYX_INC_MEMVIEW(&__pyx_v_x, 1); /* … */ /* function exit code */ __pyx_L1_error:; __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __PYX_XCLEAR_MEMVIEW(&__pyx_t_18, 1); __Pyx_AddTraceback("_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db.blur_cython2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XDECREF(__pyx_v_x_k); __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__21 = PyTuple_Pack(6, __pyx_n_s_x, __pyx_n_s_steps, __pyx_n_s_x_k, __pyx_n_s_i, __pyx_n_s_N, __pyx_n_s__20); if (unlikely(!__pyx_tuple__21)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__21); __Pyx_GIVEREF(__pyx_tuple__21); /* … */ __pyx_t_7 = __Pyx_PyInt_From_int(((int)0x400)); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_GIVEREF(__pyx_t_7); if (__Pyx_PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_7)) __PYX_ERR(0, 5, __pyx_L1_error); __pyx_t_7 = 0; __pyx_t_7 = __Pyx_PyDict_NewPresized(2); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_x, __pyx_kp_s_C_double) < 0) __PYX_ERR(0, 5, __pyx_L1_error) if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_steps, __pyx_kp_s_C_int) < 0) __PYX_ERR(0, 5, __pyx_L1_error) __pyx_t_5 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_77a0cbe9819d4eb3494c654665b559b8f2f401db_1blur_cython2, 0, __pyx_n_s_blur_cython2, NULL, __pyx_n_s_cython_magic_77a0cbe9819d4eb349, __pyx_d, ((PyObject *)__pyx_codeobj__22)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_CyFunction_SetDefaultsTuple(__pyx_t_5, __pyx_t_4); __Pyx_CyFunction_SetAnnotationsDict(__pyx_t_5, __pyx_t_7); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0; if (PyDict_SetItem(__pyx_d, __pyx_n_s_blur_cython2, __pyx_t_5) < 0) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
+06: x = x.copy()
__pyx_t_1 = __pyx_memoryview_copy_slice_dc_double_c(__pyx_v_x); if (unlikely(!__pyx_t_1.memview)) __PYX_ERR(0, 6, __pyx_L1_error)
__PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1);
__pyx_v_x = __pyx_t_1;
__pyx_t_1.memview = NULL;
__pyx_t_1.data = NULL;
+07: x_k = np.empty_like(x)
__Pyx_GetModuleGlobalName(__pyx_t_3, __pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = __pyx_memoryview_fromslice(__pyx_v_x, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = NULL; __pyx_t_6 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_4))) { __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_4); if (likely(__pyx_t_5)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_4); __Pyx_INCREF(__pyx_t_5); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_4, function); __pyx_t_6 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_5, __pyx_t_3}; __pyx_t_2 = __Pyx_PyObject_FastCall(__pyx_t_4, __pyx_callargs+1-__pyx_t_6, 1+__pyx_t_6); __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; } __pyx_v_x_k = __pyx_t_2; __pyx_t_2 = 0;
+08: x_k[0] = x[0]
__pyx_t_7 = 0; __pyx_t_6 = -1; if (__pyx_t_7 < 0) { __pyx_t_7 += __pyx_v_x.shape[0]; if (unlikely(__pyx_t_7 < 0)) __pyx_t_6 = 0; } else if (unlikely(__pyx_t_7 >= __pyx_v_x.shape[0])) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); __PYX_ERR(0, 8, __pyx_L1_error) } __pyx_t_2 = PyFloat_FromDouble((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_7 * __pyx_v_x.strides[0]) )))); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (unlikely((__Pyx_SetItemInt(__pyx_v_x_k, 0, __pyx_t_2, long, 1, __Pyx_PyInt_From_long, 0, 0, 1) < 0))) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
+09: x_k[-1] = x[-1]
__pyx_t_7 = -1L; __pyx_t_6 = -1; if (__pyx_t_7 < 0) { __pyx_t_7 += __pyx_v_x.shape[0]; if (unlikely(__pyx_t_7 < 0)) __pyx_t_6 = 0; } else if (unlikely(__pyx_t_7 >= __pyx_v_x.shape[0])) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); __PYX_ERR(0, 9, __pyx_L1_error) } __pyx_t_2 = PyFloat_FromDouble((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_7 * __pyx_v_x.strides[0]) )))); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (unlikely((__Pyx_SetItemInt(__pyx_v_x_k, -1L, __pyx_t_2, long, 1, __Pyx_PyInt_From_long, 0, 1, 1) < 0))) __PYX_ERR(0, 9, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
10: i: C.size_t
+11: N: C.size_t = len(x)
__pyx_t_8 = __Pyx_MemoryView_Len(__pyx_v_x);
__pyx_v_N = __pyx_t_8;
+12: for _ in range(steps):
__pyx_t_6 = __pyx_v_steps; __pyx_t_9 = __pyx_t_6; for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) { __pyx_v__ = __pyx_t_10;
+13: for i in range(1, N-1):
__pyx_t_11 = (__pyx_v_N - 1); __pyx_t_12 = __pyx_t_11; for (__pyx_t_13 = 1; __pyx_t_13 < __pyx_t_12; __pyx_t_13+=1) { __pyx_v_i = __pyx_t_13;
+14: x_k[i] = .25 * ( x[i-1] + 2 * x[i] + x[i+1] )
__pyx_t_14 = (__pyx_v_i - 1); __pyx_t_15 = -1; if (unlikely(__pyx_t_14 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 14, __pyx_L1_error) } __pyx_t_16 = __pyx_v_i; __pyx_t_15 = -1; if (unlikely(__pyx_t_16 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 14, __pyx_L1_error) } __pyx_t_17 = (__pyx_v_i + 1); __pyx_t_15 = -1; if (unlikely(__pyx_t_17 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 14, __pyx_L1_error) } __pyx_t_2 = PyFloat_FromDouble((.25 * (((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_14 * __pyx_v_x.strides[0]) ))) + (2.0 * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_16 * __pyx_v_x.strides[0]) ))))) + (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_17 * __pyx_v_x.strides[0]) )))))); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); if (unlikely((__Pyx_SetItemInt(__pyx_v_x_k, __pyx_v_i, __pyx_t_2, size_t, 0, __Pyx_PyInt_FromSize_t, 0, 0, 1) < 0))) __PYX_ERR(0, 14, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; }
+15: x, x_k = x_k, x # swap for next step
__pyx_t_18 = __Pyx_PyObject_to_MemoryviewSlice_ds_double(__pyx_v_x_k, PyBUF_WRITABLE); if (unlikely(!__pyx_t_18.memview)) __PYX_ERR(0, 15, __pyx_L1_error) __pyx_t_2 = __pyx_memoryview_fromslice(__pyx_v_x, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 15, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __pyx_v_x = __pyx_t_18; __pyx_t_18.memview = NULL; __pyx_t_18.data = NULL; __Pyx_DECREF_SET(__pyx_v_x_k, __pyx_t_2); __pyx_t_2 = 0; }
+16: return np.asarray(x)
__Pyx_XDECREF(__pyx_r); __Pyx_GetModuleGlobalName(__pyx_t_4, __pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 16, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_asarray); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 16, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = __pyx_memoryview_fromslice(__pyx_v_x, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 16, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = NULL; __pyx_t_6 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_3))) { __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_3); if (likely(__pyx_t_5)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3); __Pyx_INCREF(__pyx_t_5); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_3, function); __pyx_t_6 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_5, __pyx_t_4}; __pyx_t_2 = __Pyx_PyObject_FastCall(__pyx_t_3, __pyx_callargs+1-__pyx_t_6, 1+__pyx_t_6); __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 16, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; } __pyx_r = __pyx_t_2; __pyx_t_2 = 0; goto __pyx_L0;
c2 = %timeit -o blur_cython2(x, steps)
t_c2 = c2.best
times["cython (loops)"] = t_c2
plot_times()
885 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Just by making sure the iteration variables are defined as integers, we can save about 80% of the time.
The biggest key to optimizing with Cython is getting that yellow out of your loops.
The more deeply nested a bit of code is within a loop,
the more often it is called, and the more value you can get out of making it fast.
In Cython, fast means avoiding Python (getting rid of yellow).
To get rid of Python calls, we need to tell Python about the numpy arrays x
and y
:
%%cython --annotate
import cython as C
import numpy as np
def blur_cython_typed(x_: C.double[::1], steps: C.int = 1024):
i: C.size_t
N: C.size_t = x_.shape[0]
x: C.double[::1] = x_.copy()
x_k: C.double[::1] = np.empty_like(x_)
x_k[0] = x[0]
x_k[N - 1] = x[N - 1]
for _ in range(steps):
for i in range(1, N - 1):
x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
x, x_k = x_k, x # swap for next step
return np.asarray(x)
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
01:
+02: import cython as C
__pyx_t_5 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_5) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
+03: import numpy as np
__pyx_t_7 = __Pyx_ImportDottedModule(__pyx_n_s_numpy, NULL); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_7) < 0) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
04:
05:
+06: def blur_cython_typed(x_: C.double[::1], steps: C.int = 1024):
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_1blur_cython_typed(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_1blur_cython_typed = {"blur_cython_typed", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_1blur_cython_typed, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_1blur_cython_typed(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { __Pyx_memviewslice __pyx_v_x_ = { 0, 0, { 0 }, { 0 }, { 0 } }; int __pyx_v_steps; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython_typed (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 6, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_x,&__pyx_n_s_steps,0}; PyObject* values[2] = {0,0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_x)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 6, __pyx_L3_error) else goto __pyx_L5_argtuple_error; CYTHON_FALLTHROUGH; case 1: if (kw_args > 0) { PyObject* value = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_steps); if (value) { values[1] = __Pyx_Arg_NewRef_FASTCALL(value); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 6, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "blur_cython_typed") < 0)) __PYX_ERR(0, 6, __pyx_L3_error) } } else { switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_x_ = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_x_.memview)) __PYX_ERR(0, 6, __pyx_L3_error) if (values[1]) { __pyx_v_steps = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_steps == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 6, __pyx_L3_error) } else { __pyx_v_steps = ((int)((int)0x400)); } } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("blur_cython_typed", 0, 1, 2, __pyx_nargs); __PYX_ERR(0, 6, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_, 1); __Pyx_AddTraceback("_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a.blur_cython_typed", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(((PyObject *)__pyx_v_x_.memview) == Py_None)) { PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "x_"); __PYX_ERR(0, 6, __pyx_L1_error) } __pyx_r = __pyx_pf_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_blur_cython_typed(__pyx_self, __pyx_v_x_, __pyx_v_steps); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_, 1); { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_blur_cython_typed(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_x_, int __pyx_v_steps) { size_t __pyx_v_i; size_t __pyx_v_N; __Pyx_memviewslice __pyx_v_x = { 0, 0, { 0 }, { 0 }, { 0 } }; __Pyx_memviewslice __pyx_v_x_k = { 0, 0, { 0 }, { 0 }, { 0 } }; CYTHON_UNUSED int __pyx_v__; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython_typed", 0); /* … */ /* function exit code */ __pyx_L1_error:; __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __PYX_XCLEAR_MEMVIEW(&__pyx_t_19, 1); __Pyx_AddTraceback("_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a.blur_cython_typed", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_k, 1); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__21 = PyTuple_Pack(7, __pyx_n_s_x, __pyx_n_s_steps, __pyx_n_s_i, __pyx_n_s_N, __pyx_n_s_x_2, __pyx_n_s_x_k, __pyx_n_s__20); if (unlikely(!__pyx_tuple__21)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__21); __Pyx_GIVEREF(__pyx_tuple__21); /* … */ __pyx_t_7 = __Pyx_PyInt_From_int(((int)0x400)); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_GIVEREF(__pyx_t_7); if (__Pyx_PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_7)) __PYX_ERR(0, 6, __pyx_L1_error); __pyx_t_7 = 0; __pyx_t_7 = __Pyx_PyDict_NewPresized(2); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_x, __pyx_kp_s_C_double_1) < 0) __PYX_ERR(0, 6, __pyx_L1_error) if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_steps, __pyx_kp_s_C_int) < 0) __PYX_ERR(0, 6, __pyx_L1_error) __pyx_t_5 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_5e8733fa9800c350e490ea003224a0ea9bb4060a_1blur_cython_typed, 0, __pyx_n_s_blur_cython_typed, NULL, __pyx_n_s_cython_magic_5e8733fa9800c350e4, __pyx_d, ((PyObject *)__pyx_codeobj__22)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_CyFunction_SetDefaultsTuple(__pyx_t_5, __pyx_t_4); __Pyx_CyFunction_SetAnnotationsDict(__pyx_t_5, __pyx_t_7); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0; if (PyDict_SetItem(__pyx_d, __pyx_n_s_blur_cython_typed, __pyx_t_5) < 0) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
07: i: C.size_t
+08: N: C.size_t = x_.shape[0]
__pyx_v_N = (__pyx_v_x_.shape[0]);
+09: x: C.double[::1] = x_.copy()
__pyx_t_1 = __pyx_memoryview_copy_slice_dc_double_c(__pyx_v_x_); if (unlikely(!__pyx_t_1.memview)) __PYX_ERR(0, 9, __pyx_L1_error)
__pyx_v_x = __pyx_t_1;
__pyx_t_1.memview = NULL;
__pyx_t_1.data = NULL;
+10: x_k: C.double[::1] = np.empty_like(x_)
__Pyx_GetModuleGlobalName(__pyx_t_3, __pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = __pyx_memoryview_fromslice(__pyx_v_x_, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = NULL; __pyx_t_6 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_4))) { __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_4); if (likely(__pyx_t_5)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_4); __Pyx_INCREF(__pyx_t_5); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_4, function); __pyx_t_6 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_5, __pyx_t_3}; __pyx_t_2 = __Pyx_PyObject_FastCall(__pyx_t_4, __pyx_callargs+1-__pyx_t_6, 1+__pyx_t_6); __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; } __pyx_t_1 = __Pyx_PyObject_to_MemoryviewSlice_dc_double(__pyx_t_2, PyBUF_WRITABLE); if (unlikely(!__pyx_t_1.memview)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_v_x_k = __pyx_t_1; __pyx_t_1.memview = NULL; __pyx_t_1.data = NULL;
+11: x_k[0] = x[0]
__pyx_t_7 = 0; __pyx_t_6 = -1; if (__pyx_t_7 < 0) { __pyx_t_7 += __pyx_v_x.shape[0]; if (unlikely(__pyx_t_7 < 0)) __pyx_t_6 = 0; } else if (unlikely(__pyx_t_7 >= __pyx_v_x.shape[0])) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); __PYX_ERR(0, 11, __pyx_L1_error) } __pyx_t_8 = 0; __pyx_t_6 = -1; if (__pyx_t_8 < 0) { __pyx_t_8 += __pyx_v_x_k.shape[0]; if (unlikely(__pyx_t_8 < 0)) __pyx_t_6 = 0; } else if (unlikely(__pyx_t_8 >= __pyx_v_x_k.shape[0])) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); __PYX_ERR(0, 11, __pyx_L1_error) } *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x_k.data) + __pyx_t_8)) )) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_7)) )));
+12: x_k[N - 1] = x[N - 1]
__pyx_t_9 = (__pyx_v_N - 1); __pyx_t_6 = -1; if (unlikely(__pyx_t_9 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); __PYX_ERR(0, 12, __pyx_L1_error) } __pyx_t_10 = (__pyx_v_N - 1); __pyx_t_6 = -1; if (unlikely(__pyx_t_10 >= (size_t)__pyx_v_x_k.shape[0])) __pyx_t_6 = 0; if (unlikely(__pyx_t_6 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_6); __PYX_ERR(0, 12, __pyx_L1_error) } *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x_k.data) + __pyx_t_10)) )) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_9)) )));
+13: for _ in range(steps):
__pyx_t_6 = __pyx_v_steps; __pyx_t_11 = __pyx_t_6; for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) { __pyx_v__ = __pyx_t_12;
+14: for i in range(1, N - 1):
__pyx_t_9 = (__pyx_v_N - 1); __pyx_t_10 = __pyx_t_9; for (__pyx_t_13 = 1; __pyx_t_13 < __pyx_t_10; __pyx_t_13+=1) { __pyx_v_i = __pyx_t_13;
+15: x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
__pyx_t_14 = (__pyx_v_i - 1); __pyx_t_15 = -1; if (unlikely(__pyx_t_14 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 15, __pyx_L1_error) } __pyx_t_16 = __pyx_v_i; __pyx_t_15 = -1; if (unlikely(__pyx_t_16 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 15, __pyx_L1_error) } __pyx_t_17 = (__pyx_v_i + 1); __pyx_t_15 = -1; if (unlikely(__pyx_t_17 >= (size_t)__pyx_v_x.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 15, __pyx_L1_error) } __pyx_t_18 = __pyx_v_i; __pyx_t_15 = -1; if (unlikely(__pyx_t_18 >= (size_t)__pyx_v_x_k.shape[0])) __pyx_t_15 = 0; if (unlikely(__pyx_t_15 != -1)) { __Pyx_RaiseBufferIndexError(__pyx_t_15); __PYX_ERR(0, 15, __pyx_L1_error) } *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x_k.data) + __pyx_t_18)) )) = (0.25 * (((*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_14)) ))) + (2.0 * (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_16)) ))))) + (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_17)) ))))); }
+16: x, x_k = x_k, x # swap for next step
__pyx_t_1 = __pyx_v_x_k; __PYX_INC_MEMVIEW(&__pyx_t_1, 1); __pyx_t_19 = __pyx_v_x; __PYX_INC_MEMVIEW(&__pyx_t_19, 1); __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __pyx_v_x = __pyx_t_1; __pyx_t_1.memview = NULL; __pyx_t_1.data = NULL; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_k, 1); __pyx_v_x_k = __pyx_t_19; __pyx_t_19.memview = NULL; __pyx_t_19.data = NULL; }
+17: return np.asarray(x)
__Pyx_XDECREF(__pyx_r); __Pyx_GetModuleGlobalName(__pyx_t_4, __pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_asarray); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = __pyx_memoryview_fromslice(__pyx_v_x, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = NULL; __pyx_t_6 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_3))) { __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_3); if (likely(__pyx_t_5)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3); __Pyx_INCREF(__pyx_t_5); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_3, function); __pyx_t_6 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_5, __pyx_t_4}; __pyx_t_2 = __Pyx_PyObject_FastCall(__pyx_t_3, __pyx_callargs+1-__pyx_t_6, 1+__pyx_t_6); __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; } __pyx_r = __pyx_t_2; __pyx_t_2 = 0; goto __pyx_L0;
ct = %timeit -o y = blur_cython_typed(x, steps)
t_ct = ct.best
times["cython (types)"] = t_ct
plot_times()
1.92 ms ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
We can further optimize with Cython macros, which disable bounds checking and negative indexing, and avoiding the Python variable swaping by using indices into a single array:
x[len(x)]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[41], line 1
----> 1 x[len(x)]
IndexError: index 1024 is out of bounds for axis 0 with size 1024
%%cython --annotate
import cython as C
import numpy as np
@C.boundscheck(False)
@C.wraparound(False)
def blur_cython_optimized(x_: C.double[::1], steps: C.int = 1024) -> C.double[::1]:
i: C.size_t
N: C.size_t = x_.shape[0]
x: C.double[::1] = x_.copy()
x_k: C.double[::1] = np.empty_like(x_)
x_k[0] = x[0]
x_k[N - 1] = x[N - 1]
for _ in range(steps):
for i in range(1, N - 1):
x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
x, x_k = x_k, x # swap for next step
return np.asarray(x)
Generated by Cython 3.0.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
01:
+02: import cython as C
__pyx_t_5 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_5) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
+03: import numpy as np
__pyx_t_7 = __Pyx_ImportDottedModule(__pyx_n_s_numpy, NULL); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_7) < 0) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
04:
05:
+06: @C.boundscheck(False)
/* Python wrapper */ static PyObject *__pyx_pw_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_1blur_cython_optimized(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ); /*proto*/ static PyMethodDef __pyx_mdef_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_1blur_cython_optimized = {"blur_cython_optimized", (PyCFunction)(void*)(__Pyx_PyCFunction_FastCallWithKeywords)__pyx_pw_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_1blur_cython_optimized, __Pyx_METH_FASTCALL|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_1blur_cython_optimized(PyObject *__pyx_self, #if CYTHON_METH_FASTCALL PyObject *const *__pyx_args, Py_ssize_t __pyx_nargs, PyObject *__pyx_kwds #else PyObject *__pyx_args, PyObject *__pyx_kwds #endif ) { __Pyx_memviewslice __pyx_v_x_ = { 0, 0, { 0 }, { 0 }, { 0 } }; int __pyx_v_steps; #if !CYTHON_METH_FASTCALL CYTHON_UNUSED Py_ssize_t __pyx_nargs; #endif CYTHON_UNUSED PyObject *const *__pyx_kwvalues; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython_optimized (wrapper)", 0); #if !CYTHON_METH_FASTCALL #if CYTHON_ASSUME_SAFE_MACROS __pyx_nargs = PyTuple_GET_SIZE(__pyx_args); #else __pyx_nargs = PyTuple_Size(__pyx_args); if (unlikely((__pyx_nargs < 0))) __PYX_ERR(0, 6, __pyx_L3_error) #endif #endif __pyx_kwvalues = __Pyx_KwValues_FASTCALL(__pyx_args, __pyx_nargs); { PyObject **__pyx_pyargnames[] = {&__pyx_n_s_x,&__pyx_n_s_steps,0}; PyObject* values[2] = {0,0}; if (__pyx_kwds) { Py_ssize_t kw_args; switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); CYTHON_FALLTHROUGH; case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = __Pyx_NumKwargs_FASTCALL(__pyx_kwds); switch (__pyx_nargs) { case 0: if (likely((values[0] = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_x)) != 0)) { (void)__Pyx_Arg_NewRef_FASTCALL(values[0]); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 6, __pyx_L3_error) else goto __pyx_L5_argtuple_error; CYTHON_FALLTHROUGH; case 1: if (kw_args > 0) { PyObject* value = __Pyx_GetKwValue_FASTCALL(__pyx_kwds, __pyx_kwvalues, __pyx_n_s_steps); if (value) { values[1] = __Pyx_Arg_NewRef_FASTCALL(value); kw_args--; } else if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 6, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { const Py_ssize_t kwd_pos_args = __pyx_nargs; if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_kwvalues, __pyx_pyargnames, 0, values + 0, kwd_pos_args, "blur_cython_optimized") < 0)) __PYX_ERR(0, 6, __pyx_L3_error) } } else { switch (__pyx_nargs) { case 2: values[1] = __Pyx_Arg_FASTCALL(__pyx_args, 1); CYTHON_FALLTHROUGH; case 1: values[0] = __Pyx_Arg_FASTCALL(__pyx_args, 0); break; default: goto __pyx_L5_argtuple_error; } } __pyx_v_x_ = __Pyx_PyObject_to_MemoryviewSlice_dc_double(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_x_.memview)) __PYX_ERR(0, 8, __pyx_L3_error) if (values[1]) { __pyx_v_steps = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_steps == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 8, __pyx_L3_error) } else { __pyx_v_steps = ((int)((int)0x400)); } } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("blur_cython_optimized", 0, 1, 2, __pyx_nargs); __PYX_ERR(0, 6, __pyx_L3_error) goto __pyx_L3_error; __pyx_L3_error:; { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_, 1); __Pyx_AddTraceback("_cython_magic_f838c940fd52286287b522261b5b14c55c752853.blur_cython_optimized", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(((PyObject *)__pyx_v_x_.memview) == Py_None)) { PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "x_"); __PYX_ERR(0, 8, __pyx_L1_error) } __pyx_r = __pyx_pf_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_blur_cython_optimized(__pyx_self, __pyx_v_x_, __pyx_v_steps); int __pyx_lineno = 0; const char *__pyx_filename = NULL; int __pyx_clineno = 0; /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_, 1); { Py_ssize_t __pyx_temp; for (__pyx_temp=0; __pyx_temp < (Py_ssize_t)(sizeof(values)/sizeof(values[0])); ++__pyx_temp) { __Pyx_Arg_XDECREF_FASTCALL(values[__pyx_temp]); } } __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_blur_cython_optimized(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_x_, int __pyx_v_steps) { size_t __pyx_v_i; size_t __pyx_v_N; __Pyx_memviewslice __pyx_v_x = { 0, 0, { 0 }, { 0 }, { 0 } }; __Pyx_memviewslice __pyx_v_x_k = { 0, 0, { 0 }, { 0 }, { 0 } }; CYTHON_UNUSED int __pyx_v__; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("blur_cython_optimized", 0); /* … */ /* function exit code */ __pyx_L1_error:; __PYX_XCLEAR_MEMVIEW(&__pyx_t_1, 1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __PYX_XCLEAR_MEMVIEW(&__pyx_t_18, 1); __Pyx_AddTraceback("_cython_magic_f838c940fd52286287b522261b5b14c55c752853.blur_cython_optimized", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_k, 1); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__21 = PyTuple_Pack(7, __pyx_n_s_x, __pyx_n_s_steps, __pyx_n_s_i, __pyx_n_s_N, __pyx_n_s_x_2, __pyx_n_s_x_k, __pyx_n_s__20); if (unlikely(!__pyx_tuple__21)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__21); __Pyx_GIVEREF(__pyx_tuple__21); /* … */ __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_GIVEREF(__pyx_t_7); if (__Pyx_PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_7)) __PYX_ERR(0, 6, __pyx_L1_error); __pyx_t_7 = 0; __pyx_t_7 = __Pyx_PyDict_NewPresized(3); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7); if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_x, __pyx_kp_s_C_double_1) < 0) __PYX_ERR(0, 6, __pyx_L1_error) if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_steps, __pyx_kp_s_C_int) < 0) __PYX_ERR(0, 6, __pyx_L1_error) if (PyDict_SetItem(__pyx_t_7, __pyx_n_s_return, __pyx_kp_s_C_double_1) < 0) __PYX_ERR(0, 6, __pyx_L1_error) __pyx_t_5 = __Pyx_CyFunction_New(&__pyx_mdef_54_cython_magic_f838c940fd52286287b522261b5b14c55c752853_1blur_cython_optimized, 0, __pyx_n_s_blur_cython_optimized, NULL, __pyx_n_s_cython_magic_f838c940fd52286287, __pyx_d, ((PyObject *)__pyx_codeobj__22)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_CyFunction_SetDefaultsTuple(__pyx_t_5, __pyx_t_4); __Pyx_CyFunction_SetAnnotationsDict(__pyx_t_5, __pyx_t_7); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0; if (PyDict_SetItem(__pyx_d, __pyx_n_s_blur_cython_optimized, __pyx_t_5) < 0) __PYX_ERR(0, 6, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
07: @C.wraparound(False)
+08: def blur_cython_optimized(x_: C.double[::1], steps: C.int = 1024) -> C.double[::1]:
__pyx_t_7 = __Pyx_PyInt_From_int(((int)0x400)); if (unlikely(!__pyx_t_7)) __PYX_ERR(0, 8, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_7);
09: i: C.size_t
+10: N: C.size_t = x_.shape[0]
__pyx_v_N = (__pyx_v_x_.shape[0]);
+11: x: C.double[::1] = x_.copy()
__pyx_t_1 = __pyx_memoryview_copy_slice_dc_double_c(__pyx_v_x_); if (unlikely(!__pyx_t_1.memview)) __PYX_ERR(0, 11, __pyx_L1_error)
__pyx_v_x = __pyx_t_1;
__pyx_t_1.memview = NULL;
__pyx_t_1.data = NULL;
+12: x_k: C.double[::1] = np.empty_like(x_)
__Pyx_GetModuleGlobalName(__pyx_t_3, __pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_3 = __pyx_memoryview_fromslice(__pyx_v_x_, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_5 = NULL; __pyx_t_6 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_4))) { __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_4); if (likely(__pyx_t_5)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_4); __Pyx_INCREF(__pyx_t_5); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_4, function); __pyx_t_6 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_5, __pyx_t_3}; __pyx_t_2 = __Pyx_PyObject_FastCall(__pyx_t_4, __pyx_callargs+1-__pyx_t_6, 1+__pyx_t_6); __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; } __pyx_t_1 = __Pyx_PyObject_to_MemoryviewSlice_dc_double(__pyx_t_2, PyBUF_WRITABLE); if (unlikely(!__pyx_t_1.memview)) __PYX_ERR(0, 12, __pyx_L1_error) __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_v_x_k = __pyx_t_1; __pyx_t_1.memview = NULL; __pyx_t_1.data = NULL;
+13: x_k[0] = x[0]
__pyx_t_7 = 0; __pyx_t_8 = 0; *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x_k.data) + __pyx_t_8)) )) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_7)) )));
+14: x_k[N - 1] = x[N - 1]
__pyx_t_9 = (__pyx_v_N - 1); __pyx_t_10 = (__pyx_v_N - 1); *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x_k.data) + __pyx_t_10)) )) = (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_9)) )));
+15: for _ in range(steps):
__pyx_t_6 = __pyx_v_steps; __pyx_t_11 = __pyx_t_6; for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) { __pyx_v__ = __pyx_t_12;
+16: for i in range(1, N - 1):
__pyx_t_9 = (__pyx_v_N - 1); __pyx_t_10 = __pyx_t_9; for (__pyx_t_13 = 1; __pyx_t_13 < __pyx_t_10; __pyx_t_13+=1) { __pyx_v_i = __pyx_t_13;
+17: x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
__pyx_t_14 = (__pyx_v_i - 1); __pyx_t_15 = __pyx_v_i; __pyx_t_16 = (__pyx_v_i + 1); __pyx_t_17 = __pyx_v_i; *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x_k.data) + __pyx_t_17)) )) = (0.25 * (((*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_14)) ))) + (2.0 * (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_15)) ))))) + (*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_v_x.data) + __pyx_t_16)) ))))); }
+18: x, x_k = x_k, x # swap for next step
__pyx_t_1 = __pyx_v_x_k; __PYX_INC_MEMVIEW(&__pyx_t_1, 1); __pyx_t_18 = __pyx_v_x; __PYX_INC_MEMVIEW(&__pyx_t_18, 1); __PYX_XCLEAR_MEMVIEW(&__pyx_v_x, 1); __pyx_v_x = __pyx_t_1; __pyx_t_1.memview = NULL; __pyx_t_1.data = NULL; __PYX_XCLEAR_MEMVIEW(&__pyx_v_x_k, 1); __pyx_v_x_k = __pyx_t_18; __pyx_t_18.memview = NULL; __pyx_t_18.data = NULL; }
+19: return np.asarray(x)
__Pyx_XDECREF(__pyx_r); __Pyx_GetModuleGlobalName(__pyx_t_4, __pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 19, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_asarray); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 19, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_4 = __pyx_memoryview_fromslice(__pyx_v_x, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 19, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = NULL; __pyx_t_6 = 0; #if CYTHON_UNPACK_METHODS if (unlikely(PyMethod_Check(__pyx_t_3))) { __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_3); if (likely(__pyx_t_5)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3); __Pyx_INCREF(__pyx_t_5); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_3, function); __pyx_t_6 = 1; } } #endif { PyObject *__pyx_callargs[2] = {__pyx_t_5, __pyx_t_4}; __pyx_t_2 = __Pyx_PyObject_FastCall(__pyx_t_3, __pyx_callargs+1-__pyx_t_6, 1+__pyx_t_6); __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 19, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; } __pyx_r = __pyx_t_2; __pyx_t_2 = 0; goto __pyx_L0;
Note how there is now zero yellow called in any of the loops, only in the initial copy of the input array.
copt = %timeit -o y = blur_cython_optimized(x, steps)
t_copt = copt.best
times["cython (optimized)"] = t_copt
plot_times()
1.84 ms ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
y = blur_cython_optimized(x, steps) # noqa: F821
plt.plot(t, x, "--")
plt.plot(t, y)
[<matplotlib.lines.Line2D at 0x16bb5c890>]
numba#
numba is a library that attempts to automatically do type-based optimizations like we did with Cython.
To use numba, you decorate functions with @jit
.
import numba
@numba.jit(nopython=True)
def blur_numba(x, steps=1024):
"""identical to blur_py, other than the decorator"""
x = 1 * x # copy
x_k = np.empty_like(x)
x_k[0] = x[0]
x_k[-1] = x[-1]
for _ in range(steps):
for i in range(1, len(x) - 1):
x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
x, x_k = x_k, x # swap for next step
return x
y = blur_numba(x, steps)
nb = %timeit -o blur_numba(x, steps)
t_nb = nb.best
times["numba"] = t_nb
plot_times()
2.01 ms ± 2.48 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timing_table(times)
implementation | speed |
---|---|
python | 1.0 (normalized) |
numpy | 89.2x |
cython (no hints) | 1.05x |
cython (loops) | 3.38x |
cython (types) | 1,560x |
cython (optimized) | 1,634x |
numba | 1,472x |
How did numba do that without any type info?
blur_numba.inspect_types()
blur_numba (Array(float64, 1, 'C', False, aligned=True), int64)
--------------------------------------------------------------------------------
# File: /var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/ipykernel_35659/4028818462.py
# --- LINE 4 ---
# label 0
# x = arg(0, name=x) :: array(float64, 1d, C)
# steps = arg(1, name=steps) :: int64
@numba.jit(nopython=True)
# --- LINE 5 ---
def blur_numba(x, steps=1024):
# --- LINE 6 ---
"""identical to blur_py, other than the decorator"""
# --- LINE 7 ---
# $const4.0 = const(int, 1) :: Literal[int](1)
# del $const4.0
# x.1 = arrayexpr(expr=(<built-in function mul>, [const(int, 1), Var(x, 4028818462.py:4)]), ty=array(float64, 1d, C)) :: array(float64, 1d, C)
# del x
# x.1.2 = x.1 :: array(float64, 1d, C)
x = 1 * x # copy
# --- LINE 8 ---
# $14load_global.3 = global(np: <module 'numpy' from '/Users/minrk/conda/lib/python3.11/site-packages/numpy/__init__.py'>) :: Module(<module 'numpy' from '/Users/minrk/conda/lib/python3.11/site-packages/numpy/__init__.py'>)
# $26load_method.5 = getattr(value=$14load_global.3, attr=empty_like) :: Function(<function empty_like at 0x10602fd80>)
# del $14load_global.3
# x_k = call $26load_method.5(x.1, func=$26load_method.5, args=[Var(x.1, 4028818462.py:7)], kws=(), vararg=None, varkwarg=None, target=None) :: (Array(float64, 1, 'C', False, aligned=True), omitted(default=None)) -> array(float64, 1d, C)
# del $26load_method.5
# x_k.2 = x_k :: array(float64, 1d, C)
x_k = np.empty_like(x)
# --- LINE 9 ---
# $const68.9 = const(int, 0) :: Literal[int](0)
# $70binary_subscr.10 = static_getitem(value=x.1, index=0, index_var=$const68.9, fn=<built-in function getitem>) :: float64
# del $const68.9
# $const82.12 = const(int, 0) :: Literal[int](0)
# x_k[0] = $70binary_subscr.10
# del $const82.12
# del $70binary_subscr.10
x_k[0] = x[0]
# --- LINE 10 ---
# $const90.14 = const(int, -1) :: Literal[int](-1)
# $92binary_subscr.15 = static_getitem(value=x.1, index=-1, index_var=$const90.14, fn=<built-in function getitem>) :: float64
# del x.1
# del $const90.14
# $const104.17 = const(int, -1) :: Literal[int](-1)
# x_k[-1] = $92binary_subscr.15
# del x_k
# del $const104.17
# del $92binary_subscr.15
x_k[-1] = x[-1]
# --- LINE 11 ---
# $110load_global.18 = global(range: <class 'range'>) :: Function(<class 'range'>)
# $128call.21 = call $110load_global.18(steps, func=$110load_global.18, args=[Var(steps, 4028818462.py:4)], kws=(), vararg=None, varkwarg=None, target=None) :: (int64,) -> range_state_int64
# del steps
# del $110load_global.18
# $138get_iter.22 = getiter(value=$128call.21) :: range_iter_int64
# del $128call.21
# $phi140.0 = $138get_iter.22 :: range_iter_int64
# del $138get_iter.22
# jump 140
# label 140
# $140for_iter.1 = iternext(value=$phi140.0) :: pair<int64, bool>
# $140for_iter.2 = pair_first(value=$140for_iter.1) :: int64
# $140for_iter.3 = pair_second(value=$140for_iter.1) :: bool
# del $140for_iter.1
# $phi142.1 = $140for_iter.2 :: int64
# del $140for_iter.2
# branch $140for_iter.3, 142, 306
# label 142
# del $140for_iter.3
# _ = $phi142.1 :: int64
# del _
# del $phi142.1
for _ in range(steps):
# --- LINE 12 ---
# $144load_global.2 = global(range: <class 'range'>) :: Function(<class 'range'>)
# $const156.4 = const(int, 1) :: Literal[int](1)
# $158load_global.5 = global(len: <built-in function len>) :: Function(<built-in function len>)
# $176call.8 = call $158load_global.5(x.1.2, func=$158load_global.5, args=[Var(x.1.2, 4028818462.py:11)], kws=(), vararg=None, varkwarg=None, target=None) :: (Array(float64, 1, 'C', False, aligned=True),) -> int64
# del $158load_global.5
# $const186.9 = const(int, 1) :: Literal[int](1)
# $binop_sub188.10 = $176call.8 - $const186.9 :: int64
# del $const186.9
# del $176call.8
# $196call.11 = call $144load_global.2($const156.4, $binop_sub188.10, func=$144load_global.2, args=[Var($const156.4, 4028818462.py:12), Var($binop_sub188.10, 4028818462.py:12)], kws=(), vararg=None, varkwarg=None, target=None) :: (int64, int64) -> range_state_int64
# del $const156.4
# del $binop_sub188.10
# del $144load_global.2
# $206get_iter.12 = getiter(value=$196call.11) :: range_iter_int64
# del $196call.11
# $phi208.1 = $206get_iter.12 :: range_iter_int64
# del $206get_iter.12
# jump 208
# label 208
# $208for_iter.2 = iternext(value=$phi208.1) :: pair<int64, bool>
# $208for_iter.3 = pair_first(value=$208for_iter.2) :: int64
# $208for_iter.4 = pair_second(value=$208for_iter.2) :: bool
# del $208for_iter.2
# $phi210.2 = $208for_iter.3 :: int64
# del $208for_iter.3
# branch $208for_iter.4, 210, 296
# label 210
# del $208for_iter.4
# i = $phi210.2 :: int64
# del $phi210.2
for i in range(1, len(x) - 1):
# --- LINE 13 ---
# $const212.3 = const(float, 0.25) :: float64
# $const218.6 = const(int, 1) :: Literal[int](1)
# $binop_sub220.7 = i - $const218.6 :: int64
# del $const218.6
# $224binary_subscr.8 = getitem(value=x.1.2, index=$binop_sub220.7, fn=<built-in function getitem>) :: float64
# del $binop_sub220.7
# $const234.9 = const(int, 2) :: Literal[int](2)
# $240binary_subscr.12 = getitem(value=x.1.2, index=i, fn=<built-in function getitem>) :: float64
# $binop_mul250.13 = $const234.9 * $240binary_subscr.12 :: float64
# del $const234.9
# del $240binary_subscr.12
# $binop_add254.14 = $224binary_subscr.8 + $binop_mul250.13 :: float64
# del $binop_mul250.13
# del $224binary_subscr.8
# $const262.17 = const(int, 1) :: Literal[int](1)
# $binop_add264.18 = i + $const262.17 :: int64
# del $const262.17
# $268binary_subscr.19 = getitem(value=x.1.2, index=$binop_add264.18, fn=<built-in function getitem>) :: float64
# del $binop_add264.18
# $binop_add278.20 = $binop_add254.14 + $268binary_subscr.19 :: float64
# del $binop_add254.14
# del $268binary_subscr.19
# $binop_mul282.21 = $const212.3 * $binop_add278.20 :: float64
# del $const212.3
# del $binop_add278.20
# x_k.2[i] = $binop_mul282.21 :: (Array(float64, 1, 'C', False, aligned=True), int64, float64) -> none
# del i
# del $binop_mul282.21
# jump 208
x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
# --- LINE 14 ---
# label 296
# del $phi210.2
# del $phi208.1
# del $208for_iter.4
# $x_k296.1 = x_k.2 :: array(float64, 1d, C)
# x_k.1 = x.1.2 :: array(float64, 1d, C)
# x_k.2 = x_k.1 :: array(float64, 1d, C)
# del x_k.1
# x.1.1 = $x_k296.1 :: array(float64, 1d, C)
# del $x_k296.1
# x.1.2 = x.1.1 :: array(float64, 1d, C)
# del x.1.1
# jump 140
x, x_k = x_k, x # swap for next step
# --- LINE 15 ---
# label 306
# del x_k.2
# del $phi142.1
# del $phi140.0
# del $140for_iter.3
# $308return_value.1 = cast(value=x.1.2) :: array(float64, 1d, C)
# del x.1.2
# return $308return_value.1
return x
================================================================================
What’s impressive about numba in this case is that it is able to beat all but the most optimized of our implementations without any help. Like Cython, numba can do an even better job when you provide it with more information about how a function will be called.
Profiling#
A script can be run with
python3 -m cProfile myscript.py
import sys
%prun list(os.walk(sys.prefix))
3244810 function calls (2789533 primitive calls) in 6.431 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
741078 2.552 0.000 2.552 0.000 {built-in method builtins.next}
56634 1.431 0.000 1.431 0.000 {built-in method posix.lstat}
684494 1.138 0.000 1.138 0.000 {method 'is_dir' of 'posix.DirEntry' objects}
511862/56585 0.607 0.000 6.418 0.000 <frozen os>:345(_walk)
56584 0.532 0.000 0.532 0.000 {built-in method posix.scandir}
56634 0.063 0.000 0.095 0.000 <frozen posixpath>:71(join)
684494 0.032 0.000 0.032 0.000 {method 'append' of 'list' objects}
56634 0.022 0.000 1.456 0.000 <frozen posixpath>:164(islink)
1 0.013 0.013 6.431 6.431 <string>:1(<module>)
56634 0.012 0.000 0.016 0.000 <frozen posixpath>:41(_get_sep)
56634 0.007 0.000 0.007 0.000 {method 'startswith' of 'str' objects}
56634 0.006 0.000 0.006 0.000 {method 'endswith' of 'str' objects}
56584 0.005 0.000 0.005 0.000 {method '__exit__' of 'posix.ScandirIterator' objects}
56634 0.004 0.000 0.004 0.000 {built-in method builtins.isinstance}
56634 0.004 0.000 0.004 0.000 {built-in method _stat.S_ISLNK}
56635 0.003 0.000 0.003 0.000 {built-in method posix.fspath}
1 0.000 0.000 6.431 6.431 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 <frozen os>:282(walk)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 cycler.py:239(__iter__)
1 0.000 0.000 0.000 0.000 {built-in method sys.audit}
Snakeviz instaleld with:
python3 -m pip install snakeviz
%load_ext snakeviz
%snakeviz -t blur_py(x, steps)
*** Profile stats marshalled to file '/var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/tmpzxarik_z'.
Opening SnakeViz in a new tab...
import hashlib
%%snakeviz -t
for dirpath, dirnames, filenames in os.walk('/usr/local'):
for filename in filenames:
if filename.endswith('.txt'):
full_path = os.path.join(dirpath, filename)
with open(full_path, 'rb') as f:
hashlib.md5(f.read())
*** Profile stats marshalled to file '/var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/tmpf9tbru6q'.
Opening SnakeViz in a new tab...
line_profiler installed with:
python3 -m pip install line_profiler
%load_ext line_profiler
%lprun -f blur_py blur_py(x, steps)
Timer unit: 1e-09 s
Total time: 5.55848 s
File: /var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/ipykernel_35659/2340594174.py
Function: blur_py at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def blur_py(x0, steps=1024):
2 """Advance a Gaussian blur `steps` steps"""
3 1 101000.0 101000.0 0.0 x = 1 * x0 # copy
4 1 32000.0 32000.0 0.0 x_k = np.empty_like(x)
5 1 7000.0 7000.0 0.0 x_k[0] = x[0]
6 1 1000.0 1000.0 0.0 x_k[-1] = x[-1]
7 10001 803000.0 80.3 0.0 for k in range(steps):
8 10230000 793656000.0 77.6 14.3 for i in range(1, len(x) - 1):
9 10220000 4762891000.0 466.0 85.7 x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
10 10000 988000.0 98.8 0.0 x, x_k = x_k, x # swap for next step
11 1 0.0 0.0 0.0 return x
%lprun -f blur_np blur_np(x, steps)
Timer unit: 1e-09 s
Total time: 0.037441 s
File: /var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/ipykernel_35659/109327557.py
Function: blur_np at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 def blur_np(x, steps=1024):
5 1 21000.0 21000.0 0.1 x = 1 * x
6 1 5000.0 5000.0 0.0 x_k = np.empty_like(x)
7 1 1000.0 1000.0 0.0 x_k[0] = x[0]
8 1 0.0 0.0 0.0 x_k[-1] = x[-1]
9 10001 796000.0 79.6 2.1 for _ in range(steps):
10 10000 35757000.0 3575.7 95.5 x_k[1:-1] = 0.25 * (x[0:-2] + 2 * x[1:-1] + x[2:])
11 10000 861000.0 86.1 2.3 x, x_k = x_k, x
12 1 0.0 0.0 0.0 return x
%lprun -f blur_numba -f blur_numba.__wrapped__ blur_numba(x, steps)
/Users/minrk/conda/lib/python3.11/site-packages/line_profiler/ipython_extension.py:97: UserWarning: Adding a function with a __wrapped__ attribute. You may want to profile the wrapped function by adding blur_numba.__wrapped__ instead.
profile = LineProfiler(*funcs)
Timer unit: 1e-09 s
Total time: 0 s
File: /var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/ipykernel_35659/4028818462.py
Function: blur_numba at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @numba.jit(nopython=True)
5 def blur_numba(x, steps=1024):
6 """identical to blur_py, other than the decorator"""
7 x = 1 * x # copy
8 x_k = np.empty_like(x)
9 x_k[0] = x[0]
10 x_k[-1] = x[-1]
11 for _ in range(steps):
12 for i in range(1, len(x) - 1):
13 x_k[i] = 0.25 * (x[i - 1] + 2 * x[i] + x[i + 1])
14 x, x_k = x_k, x # swap for next step
15 return x
%lprun -f blur_cython blur_cython(x, steps)
Timer unit: 1e-09 s
%snakeviz -t np.dot(x, x)
*** Profile stats marshalled to file '/var/folders/qr/3vxfnp1x2t1fw55dr288mphc0000gn/T/tmpykxzyuqd'.
Opening SnakeViz in a new tab...