13.03.2018       Issue 221 (12.03.2018 - 18.03.2018)       Articles

Efficient memory usage in parallel I/O operations in Python. Original


Gevent is an alternative approach to parallelisation that brings coroutines to pre-Python 3.5 code. Under the hood it relies on small, independent pseudo-threads called “greenlets”, but it also spawns some OS threads for internal needs. Its overall memory footprint is very similar to that of multithreading.
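
gevent itself is a third-party package, so the sketch below does not use it. As a rough conceptual analogue only, it shows cooperative scheduling of lightweight pseudo-threads inside a single OS thread, using plain generators and a hypothetical round-robin `run` scheduler. Each `yield` marks a point where a task voluntarily hands over control; in gevent the equivalent switch happens implicitly when a (monkey-patched) I/O call would block.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]


def worker(name: str, steps: int, log: List[str]) -> Task:
    """A pseudo-thread: does a little work, then yields control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point (gevent switches here when I/O would block)


def run(tasks: List[Task]) -> None:
    """Hypothetical round-robin scheduler: every pseudo-thread shares one OS thread."""
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)  # not finished, put it at the back of the queue


log: List[str] = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Real greenlets are stackful, so they can switch from inside nested function calls, which plain generators cannot; treat this only as an illustration of the scheduling idea.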

Pseudo-thread memory usage


Since Python 3.5 introduced the async/await syntax, coroutines can be written natively and run with the asyncio module, which is part of the standard Python library. To take advantage of asyncio, I used aiohttp instead of requests. aiohttp is an asynchronous counterpart to requests, with similar functionality and a similar API.
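
A minimal sketch of the asyncio pattern, assuming a hypothetical `fetch` coroutine that stands in for an aiohttp request (`asyncio.sleep` simulates the network wait so the example runs offline). `asyncio.gather` runs all coroutines concurrently on one thread, and an `asyncio.Semaphore` caps how many are in flight at once; the limit of 10 is an arbitrary assumption.

```python
import asyncio
import time
from typing import List


async def fetch(item_id: int, sem: asyncio.Semaphore) -> int:
    """Hypothetical I/O call; asyncio.sleep stands in for a network request.

    With aiohttp the body would instead look roughly like:
        async with session.get(url) as resp:
            return await resp.text()
    """
    async with sem:  # cap the number of requests in flight
        await asyncio.sleep(0.1)  # yields to the event loop while "waiting"
        return item_id * 2


async def main(n: int, limit: int) -> List[int]:
    sem = asyncio.Semaphore(limit)
    # gather schedules all coroutines concurrently and preserves result order
    return await asyncio.gather(*(fetch(i, sem) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(main(n=20, limit=10))
elapsed = time.perf_counter() - start
print(results[:3], f"{elapsed:.2f}s")
```

With 20 tasks and a limit of 10, the run should take roughly two latency periods (about 0.2 s) instead of the 2 s a sequential loop would need.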

In general, this is a point to consider before starting an async project: blocking libraries have to be replaced with async ones. Fortunately, most popular I/O-related packages, such as requests, redis and psycopg2, have equivalents in the async world.
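
When a dependency has no async equivalent, one common workaround (not something the article describes) is to push the blocking call onto a thread pool with `loop.run_in_executor`, so the event loop keeps running. In this sketch `blocking_io` is a hypothetical stand-in for such a library call.

```python
import asyncio
import time
from typing import List


def blocking_io(x: int) -> int:
    """Hypothetical stand-in for a call into a blocking library."""
    time.sleep(0.2)  # blocks whichever OS thread runs it
    return x + 1


async def main() -> List[int]:
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) submits each call to the loop's default
    # ThreadPoolExecutor, so the event loop thread itself never blocks
    futures = [loop.run_in_executor(None, blocking_io, i) for i in range(5)]
    return await asyncio.gather(*futures)


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

On Python 3.9+ `asyncio.to_thread(blocking_io, i)` is a shorter spelling of the same idea. The trade-off is that these calls run on OS threads again, so they give back part of asyncio's memory advantage.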

Coroutine memory usage (asyncio)

With asyncio, memory usage is significantly lower than with the previous methods and very close to that of a single-threaded version of the script without parallelisation.
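
The article does not show how memory was measured. One way to reproduce a comparison like this, assuming a Unix-like system (the `resource` module is unavailable on Windows), is to run each variant in a fresh interpreter and print its peak resident set size. The workloads (N tasks that simply wait 0.2 s) are hypothetical stand-ins for real I/O.

```python
import subprocess
import sys

# Hypothetical workloads: N tasks that each just wait 0.2 s, as an I/O call would.
# Each runs in a fresh interpreter so one variant cannot inflate the other's peak.
THREADS = """
import resource, threading, time
threads = [threading.Thread(target=time.sleep, args=(0.2,)) for _ in range({n})]
for t in threads: t.start()
for t in threads: t.join()
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""

ASYNCIO = """
import asyncio, resource
async def main():
    await asyncio.gather(*(asyncio.sleep(0.2) for _ in range({n})))
asyncio.run(main())
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def peak_rss(template: str, n: int) -> int:
    """Run a workload in a child interpreter and return the peak RSS it reports."""
    proc = subprocess.run(
        [sys.executable, "-c", template.format(n=n)],
        capture_output=True, text=True, check=True,
    )
    return int(proc.stdout.strip())


if __name__ == "__main__":
    for name, template in (("threads", THREADS), ("asyncio", ASYNCIO)):
        print(f"{name}: peak RSS {peak_rss(template, 200)}")
```

`ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS, and it includes the interpreter's baseline, so compare the two numbers against each other rather than reading them as absolute per-task costs.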

So should we start using asyncio?

Parallelism is a very efficient way of speeding up an application that performs a lot of I/O operations. In my case, it gave a ~40% speed increase compared to sequential processing. Once the code runs in parallel, however, the choice of method barely affects speed: an I/O operation depends heavily on the performance of other systems (e.g. network latency, disk speed), so the difference in execution time between the parallel methods is negligible.
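
To see why the choice of parallel method barely matters for I/O-bound work, the sketch below times the same simulated workload three ways: sequentially, with `ThreadPoolExecutor`, and with asyncio. The latency and request count are assumed values, and `time.sleep`/`asyncio.sleep` stand in for real network calls.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

LATENCY = 0.1  # assumed per-request latency in seconds
N = 10         # assumed number of requests


def blocking_request(i: int) -> int:
    time.sleep(LATENCY)  # simulated network wait that blocks the thread
    return i


async def async_request(i: int) -> int:
    await asyncio.sleep(LATENCY)  # simulated network wait that yields to the loop
    return i


def sequential() -> List[int]:
    return [blocking_request(i) for i in range(N)]


def threaded() -> List[int]:
    # The only change from sequential(): map the same function over a thread pool
    with ThreadPoolExecutor(max_workers=N) as pool:
        return list(pool.map(blocking_request, range(N)))


def with_asyncio() -> List[int]:
    async def main() -> List[int]:
        return await asyncio.gather(*(async_request(i) for i in range(N)))
    return asyncio.run(main())


def timed(fn: Callable[[], List[int]]) -> float:
    start = time.perf_counter()
    result = fn()
    if result != list(range(N)):
        raise RuntimeError(f"{fn.__name__} returned unexpected results")
    return time.perf_counter() - start


timings = {fn.__name__: timed(fn) for fn in (sequential, threaded, with_asyncio)}
for name, seconds in timings.items():
    print(f"{name:>12}: {seconds:.2f}s")
```

Both parallel variants should finish in roughly one latency period, while the sequential one takes about `N` of them. The gap is much larger than the ~40% reported above, presumably because the simulated tasks do nothing but wait, whereas real workloads also include CPU-bound processing and limits on how much the remote side can serve in parallel. Note also that `threaded()` reuses the blocking function unchanged, while the asyncio version needs an `async def` counterpart.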

ThreadPoolExecutor and Gevent are very powerful tools that can speed up an existing application, and one major advantage is that in most cases they require only minor changes to the codebase. In terms of overall performance, however, the best tool is asyncio, with its coroutines running on a single thread: its memory footprint is much lower than that of the other parallel methods, without hurting overall speed. It comes at a price, though: the codebase and its dependencies have to be specifically designed for use with asyncio, which has to be taken into account when moving a codebase to coroutines.

At Kiwi.com we use asyncio in high-performance APIs where we want to achieve speed with a low memory footprint on our infrastructure. One example of an “asyncio service” running at Kiwi.com is our public API for geographical location data. You can try the service yourself; its documentation is available here.
