10.02.2018       Issue 216 (05.02.2018 - 11.02.2018)       Articles

How Numba and Cython optimize Python code

A short article with optimization examples


Over the past years, Numba and Cython have gained a lot of attention in the data science community. Both provide a way to speed up CPU-intensive tasks, but they do so in different ways. This article describes the architectural differences between them.

Numba

Numba is a just-in-time (JIT) compiler which translates Python code to native machine instructions both for CPU and GPU. The code can be compiled at import time, runtime, or ahead of time.

Getting started with Numba is extremely easy: it is enough to apply the jit decorator to a function:

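import timeit

import numpy as np
from numba import int64, jit

a = np.arange(1, 10 ** 7, dtype=np.int64)
b = np.arange(-10 ** 7, -1, dtype=np.int64)


# Despite its name, the function computes the element-wise difference a - b
def sum_sequence(a, b):
    result = np.zeros_like(a)
    for i in range(len(a)):
        result[i] = a[i] - b[i]
    return result


# Same effect as decorating with @jit(int64[:](int64[:], int64[:]), nopython=True).
# The plain function is kept under its own name so both versions can be timed.
fast_sum_sequence = jit(int64[:](int64[:], int64[:]), nopython=True)(sum_sequence)

>>> timeit.timeit('sum_sequence(a, b)', globals=globals(), number=1)
Basic Python version: 4.227093786990736
>>> timeit.timeit('fast_sum_sequence(a, b)', globals=globals(), number=1)
Numba version: 0.05048697197344154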
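
One detail worth keeping in mind when timing JIT-compiled code: if jit is applied without an explicit signature, Numba compiles lazily, so the first call also pays for compilation. Below is a minimal sketch of a timing helper that makes a warm-up call before measuring. The name best_time and its parameters are illustrative assumptions, not part of Numba or of the original article.

```python
import timeit


def best_time(func, *args, repeat=5):
    """Return the best wall-clock time (in seconds) of a single call to func."""
    # Warm-up call: for a lazily compiled function this is where the JIT
    # compilation happens, so it is kept out of the measurement.
    func(*args)
    # Each measurement runs the function exactly once; the minimum is the
    # least noisy estimate of the steady-state cost.
    timings = timeit.repeat(lambda: func(*args), number=1, repeat=repeat)
    return min(timings)


if __name__ == "__main__":
    print(best_time(sorted, list(range(100_000, 0, -1))))
```

The same helper can be applied to both the plain and the compiled function, which keeps the comparison symmetric.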

As you may know, in Python all code blocks are compiled down to bytecode:

>>> import dis
>>> dis.dis(sum_sequence)
  2           0 LOAD_GLOBAL              0 (np)
              2 LOAD_ATTR                1 (zeros_like)
              4 LOAD_FAST                0 (a)
              6 CALL_FUNCTION            1
              8 STORE_FAST               2 (result)

  3          10 SETUP_LOOP              40 (to 52)
             12 LOAD_GLOBAL              2 (range)
             14 LOAD_GLOBAL              3 (len)
             16 LOAD_FAST                0 (a)
             18 CALL_FUNCTION            1
             20 CALL_FUNCTION            1
             22 GET_ITER
        >>   24 FOR_ITER                24 (to 50)
             26 STORE_FAST               3 (i)

  4          28 LOAD_FAST                0 (a)
             30 LOAD_FAST                3 (i)
             32 BINARY_SUBSCR
             34 LOAD_FAST                1 (b)
             36 LOAD_FAST                3 (i)
             38 BINARY_SUBSCR
             40 BINARY_SUBTRACT
             42 LOAD_FAST                2 (result)
             44 LOAD_FAST                3 (i)
             46 STORE_SUBSCR
             48 JUMP_ABSOLUTE           24
        >>   50 POP_BLOCK

  5     >>   52 LOAD_FAST                2 (result)
             54 RETURN_VALUE
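
The same information can be read programmatically with the standard dis module. The sketch below uses a list-based variant of the function (the name subtract_sequences is an assumption made here so that the example needs no third-party packages). Opcode names differ between CPython versions, so the output will not match the listing above exactly.

```python
import dis


def subtract_sequences(a, b):
    result = [0] * len(a)
    for i in range(len(a)):
        result[i] = a[i] - b[i]
    return result


# The raw bytecode is stored on the function's code object as a bytes string
raw_bytecode = subtract_sequences.__code__.co_code

# dis.get_instructions() decodes it into Instruction objects
opnames = [ins.opname for ins in dis.get_instructions(subtract_sequences)]

if __name__ == "__main__":
    print(len(raw_bytecode), "bytes of bytecode")
    print(opnames)
```

Every loop iteration walks through several of these generic instructions in the interpreter, which is the overhead a compiler such as Numba aims to remove.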

Code optimization

To optimize Python code, Numba takes the bytecode of the provided function and runs a set of analyzers on it. This process involves many stages, but in the end Numba converts the Python bytecode to LLVM intermediate representation (IR).

Note that LLVM IR is a low-level programming language that resembles assembly and is independent of Python.
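
To get a feeling for what the generated IR looks like, a compiled function can be asked for it. The sketch below relies on the inspect_llvm() method of Numba's compiled functions; the helper name llvm_ir_for and the toy function add are assumptions made for illustration. The import is guarded so the snippet still runs (and returns None) where Numba is not installed.

```python
try:
    from numba import njit
except ImportError:  # Numba is optional for this sketch
    njit = None


def add(x, y):
    return x + y


def llvm_ir_for(func, *args):
    """Compile func for the given arguments and return its LLVM IR as text."""
    if njit is None:
        return None
    compiled = njit(func)
    compiled(*args)  # the first call triggers compilation for these argument types
    # inspect_llvm() returns a dict that maps each compiled signature to its IR
    return "\n".join(compiled.inspect_llvm().values())


ir_text = llvm_ir_for(add, 1, 2)

if __name__ == "__main__" and ir_text is not None:
    print(ir_text[:500])  # the full listing is long even for a tiny function
```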

Numba modes

There are two modes in Numba: nopython and object. The former does not use the Python runtime and produces native code without Python dependencies. This native code is statically typed and runs very fast. Object mode, in contrast, works with Python objects through the Python C API, which often does not give a significant speed-up. In both cases, the Python code is compiled using LLVM.
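
Static typing in nopython mode means that one Python function may end up with several native versions, one per combination of argument types. A minimal sketch, assuming the signatures attribute of Numba's compiled functions; the helper compiled_signatures and the function scale are hypothetical names, and the import is guarded so the code also runs without Numba:

```python
try:
    from numba import njit
except ImportError:  # Numba is optional for this sketch
    njit = None


def scale(x, factor):
    return x * factor


def compiled_signatures(func, *example_calls):
    """Call the nopython-compiled func with each argument tuple, then list
    the type signatures Numba has compiled so far."""
    if njit is None:
        return None
    compiled = njit(func)  # njit is shorthand for jit(nopython=True)
    for args in example_calls:
        compiled(*args)  # an unseen type combination triggers a new compilation
    return list(compiled.signatures)


# One call with integers and one with floats
sigs = compiled_signatures(scale, (2, 3), (2.0, 3.0))

if __name__ == "__main__":
    print(sigs)
```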

What is LLVM?

LLVM is a compiler toolchain that takes a special intermediate representation (IR) of the code and compiles it down to native (machine) code. Compilation involves many additional "passes" in which the compiler optimizes the IR. LLVM is very good at these optimizations, so it not only compiles the code for Numba but also optimizes it.

The whole system roughly looks as follows:

[Figure: Python Numba architecture]

Advantages of Numba:

  • Ease of use
  • Automatic parallelization
  • Support for numpy operations and objects
  • GPU support
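
As an illustration of the parallelization point, here is a minimal sketch that uses Numba's parallel=True option together with prange. The function name sum_of_squares is an assumption, and the fallback definitions exist only so that the snippet still runs (without any speed-up) where Numba is not installed.

```python
try:
    from numba import njit, prange
except ImportError:
    # Fallback: a no-op decorator factory and a plain range
    def njit(*args, **kwargs):
        return lambda func: func

    prange = range


@njit(parallel=True)
def sum_of_squares(n):
    total = 0.0
    # prange marks the loop as safe to split across threads;
    # the += on total is treated as a reduction
    for i in prange(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(1000))
```

The loop body has to be free of cross-iteration dependencies other than such reductions, otherwise the result is not well defined.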

Disadvantages of Numba:

  • Many layers of abstraction make it very hard to debug and optimize
  • There is no way to interact with Python and its modules in nopython mode
  • Limited support for classes

Cython

Instead of analyzing bytecode and generating IR, Cython uses a superset of Python syntax, which is later translated to C code. When working with Cython, you are basically writing C code with high-level Python syntax.

You usually don't have to worry about wrappers and low-level API calls, because all interactions are automatically expanded to proper C code.

Unlike Numba, all Cython code should be kept apart from regular Python code in special files (typically with a .pyx extension). Cython parses and translates such files to C code and then compiles it using the provided C compiler (e.g. gcc).
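
The translate-then-compile step is usually driven by a small build script. The following is only a sketch under assumptions: the module name fast_sum and the file fast_sum.pyx are hypothetical, and the build is started only when the build_ext command is passed, so that loading the file on its own is harmless.

```python
import sys


def make_extensions():
    """Describe the Cython extension; fast_sum.pyx is a hypothetical file name."""
    # Imported lazily so this file can be loaded without Cython installed
    import numpy
    from Cython.Build import cythonize
    from setuptools import Extension

    extension = Extension(
        "fast_sum",
        sources=["fast_sum.pyx"],
        include_dirs=[numpy.get_include()],  # headers needed by "cimport numpy"
    )
    # cythonize() translates the .pyx source to C; setuptools then calls the C compiler
    return cythonize([extension])


if __name__ == "__main__" and "build_ext" in sys.argv:
    from setuptools import setup

    setup(name="fast_sum", ext_modules=make_extensions())
```

Saved as setup.py next to the .pyx file, it would typically be run as `python setup.py build_ext --inplace`, after which the compiled module can be imported like any other Python module.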

Most Python code is already valid Cython code, so the plain function can be compiled as is:

import numpy as np

def sum_sequence_untyped(a, b):
    result = np.zeros_like(a)
    for i in range(len(a)):
        result[i] = a[i] - b[i]
    return result

However, a typed version runs a lot faster.

import numpy as np   # run-time access to NumPy functions such as np.zeros
cimport numpy as np  # compile-time access to NumPy's C-level types

cpdef sum_sequence_cython(np.ndarray[np.int64_t, ndim=1] a, np.ndarray[np.int64_t, ndim=1] b):
    cdef Py_ssize_t i  # typed loop index, so the loop compiles to a plain C loop
    cdef Py_ssize_t N = a.shape[0]
    cdef np.ndarray[np.int64_t, ndim=1] result = np.zeros([N], dtype=np.int64)

    for i in range(N):
        result[i] = a[i] - b[i]
    return result

>>> timeit.timeit('sum_sequence_untyped(a, b)', globals=globals(), number=1)
Untyped Cython version 2.0215444170171395
>>> timeit.timeit('sum_sequence_cython(a, b)', globals=globals(), number=1)
Typed Cython version 0.046073039297712967

Writing fast Cython code requires an understanding of C and Python internals. If you know C, your Cython code can run as fast as the C code.

Advantages of Cython:

  • Control over Python API usage
  • Easy interfacing with C/C++ libraries and C/C++ code
  • Parallel execution support
  • Support for Python classes, which gives object-oriented features in C

Disadvantages of Cython:

  • Learning curve
  • Requires expertise both in C and Python internals
  • Inconvenient organization of modules

