04.11.2020       Issue 359 (02.11.2020 - 08.11.2020)       Articles

Caching in Python Using the LRU Cache Strategy

This is an excellent opportunity to cache the article’s contents and avoid hitting the network every five seconds. You could use the @lru_cache decorator, but what happens if the article’s content is updated?

The first time you access the article, the decorator will store its content and return the same data every time after. If the post is updated, then the monitor script will never realize it because it will be pulling the old copy stored in the cache. To solve this problem, you can set your cache entries to expire.
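The staleness problem is easy to reproduce without a network. The following minimal sketch uses a hypothetical in-memory dictionary, fake_server, in place of the real site, so the names are illustrative and not part of the monitor script:

```python
from functools import lru_cache

# Hypothetical stand-in for the remote site: URL -> current article body.
fake_server = {"/post": "first version"}

@lru_cache(maxsize=16)
def get_article(url):
    # The body only runs on a cache miss.
    return fake_server[url]

first = get_article("/post")               # miss: reads from fake_server
fake_server["/post"] = "updated version"   # the article changes upstream
second = get_article("/post")              # hit: still the old copy

print(first, "|", second)  # first version | first version
```

The second call never reaches fake_server, so the update stays invisible until the cache entry is removed.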

Evicting Cache Entries Based on Both Time and Space

The @lru_cache decorator evicts existing entries only when there’s no more space to store new ones. With sufficient space, entries in the cache will live forever and never get refreshed.
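A small sketch of this space-only eviction, using a hypothetical square() function and a deliberately tiny maxsize so the behavior shows up after three calls:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

square(1)
square(2)
square(3)   # the cache is full, so the least recently used entry (1) is evicted
info = square.cache_info()
print(info)   # CacheInfo(hits=0, misses=3, maxsize=2, currsize=2)

square(2)   # still cached: a hit
square(1)   # was evicted: a miss, so the value is recomputed
final = square.cache_info()
print(final)  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

Nothing in this cache depends on time. An entry leaves only when a newer one needs its slot.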

This presents a problem for your monitoring script because you’ll never fetch updates published for previously cached articles. To get around this problem, you can update the cache implementation so that its entries expire after a specific time.

You can implement this idea in a new decorator that builds on @lru_cache. If the caller tries to access an item that’s past its lifetime, then the cache won’t return its content, forcing the caller to fetch the article from the network.

Here’s a possible implementation of this new decorator:

 1  from functools import lru_cache, wraps
 2  from datetime import datetime, timedelta, timezone
 3
 4  def timed_lru_cache(seconds: int, maxsize: int = 128):
 5      def wrapper_cache(func):
 6          func = lru_cache(maxsize=maxsize)(func)
 7          func.lifetime = timedelta(seconds=seconds)
 8          func.expiration = datetime.now(timezone.utc) + func.lifetime
 9
10          @wraps(func)
11          def wrapped_func(*args, **kwargs):
12              if datetime.now(timezone.utc) >= func.expiration:
13                  func.cache_clear()
14                  func.expiration = datetime.now(timezone.utc) + func.lifetime
15
16              return func(*args, **kwargs)
17
18          return wrapped_func
19
20      return wrapper_cache

Here’s a breakdown of this implementation:

  • Line 4: The @timed_lru_cache decorator takes two parameters: the lifetime of the entries in the cache (in seconds) and the maximum size of the cache.
  • Line 6: The code wraps the decorated function with the lru_cache decorator. This allows you to use the cache functionality already provided by lru_cache.
  • Lines 7 and 8: These two lines instrument the decorated function with two attributes representing the lifetime of the cache and the point in time when it will expire.
  • Lines 12 to 14: Before accessing an entry in the cache, the decorator checks whether the current time has reached the expiration time. If that’s the case, then it clears the cache and computes a new expiration time from the unchanged lifetime.
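To watch the expiration check work without a network, the sketch below uses a compact copy of the decorator's logic (written with the timezone-aware datetime.now(timezone.utc) as its clock) and applies it to a hypothetical function, slow_double(), that records every time its body runs. The one-second lifetime and the sleep are arbitrary values chosen for the demonstration:

```python
import time
from datetime import datetime, timedelta, timezone
from functools import lru_cache, wraps

def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.now(timezone.utc) + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            # Clear everything once the shared lifetime has elapsed.
            if datetime.now(timezone.utc) >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.now(timezone.utc) + func.lifetime
            return func(*args, **kwargs)

        return wrapped_func
    return wrapper_cache

calls = []  # one item per real execution of slow_double()

@timed_lru_cache(seconds=1)
def slow_double(n):
    calls.append(n)
    return n * 2

slow_double(21)
slow_double(21)                # served from the cache
calls_before_expiry = len(calls)

time.sleep(1.1)                # wait past the one-second lifetime
slow_double(21)                # cache was cleared, so the body runs again
calls_after_expiry = len(calls)

print(calls_before_expiry, calls_after_expiry)  # 1 2
```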

Notice how, once the lifetime has elapsed, this decorator clears the entire cache associated with the function. The lifetime applies to the cache as a whole, not to individual articles. A more sophisticated implementation of this strategy would evict entries based on their individual lifetimes.
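One way to approach per-entry lifetimes, as a rough sketch and not a drop-in replacement: keep an OrderedDict that maps each argument tuple to its value and its own deadline. The name ttl_cache, the fetch() example, and the use of time.monotonic() are all assumptions of this example, and it only handles hashable positional arguments:

```python
import time
from collections import OrderedDict
from functools import wraps

def ttl_cache(seconds, maxsize=128):
    """Cache results per argument tuple, each with its own expiry time."""
    def decorator(func):
        entries = OrderedDict()  # args -> (deadline, value), least recent first

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in entries:
                deadline, value = entries[args]
                if now < deadline:
                    entries.move_to_end(args)  # mark as most recently used
                    return value
                del entries[args]              # only this entry has expired

            value = func(*args)
            entries[args] = (now + seconds, value)
            if len(entries) > maxsize:
                entries.popitem(last=False)    # evict the least recently used
            return value

        return wrapper
    return decorator

calls = []  # one item per real execution of fetch()

@ttl_cache(seconds=0.5)
def fetch(key):
    calls.append(key)
    return key.upper()

fetch("a")
time.sleep(0.3)
fetch("b")
time.sleep(0.3)   # "a" is now about 0.6 s old, "b" only about 0.3 s
fetch("a")        # expired: recomputed
fetch("b")        # still fresh: served from the cache
print(calls)      # ['a', 'b', 'a']
```

Here an expired entry is dropped on its own, so a recently fetched item survives even when an older one has to be refreshed.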

Caching Articles With the New Decorator

You can now use your new @timed_lru_cache decorator with the monitor script to prevent fetching the content of an article every time you access it.

Putting the code together in a single script for simplicity, you end up with the following:

 1  import feedparser
 2  import requests
 3  import ssl
 4  import time
 5
 6  from functools import lru_cache, wraps
 7  from datetime import datetime, timedelta, timezone
 8
 9  if hasattr(ssl, "_create_unverified_context"):
10      ssl._create_default_https_context = ssl._create_unverified_context
11
12  def timed_lru_cache(seconds: int, maxsize: int = 128):
13      def wrapper_cache(func):
14          func = lru_cache(maxsize=maxsize)(func)
15          func.lifetime = timedelta(seconds=seconds)
16          func.expiration = datetime.now(timezone.utc) + func.lifetime
17
18          @wraps(func)
19          def wrapped_func(*args, **kwargs):
20              if datetime.now(timezone.utc) >= func.expiration:
21                  func.cache_clear()
22                  func.expiration = datetime.now(timezone.utc) + func.lifetime
23
24              return func(*args, **kwargs)
25
26          return wrapped_func
27
28      return wrapper_cache
29
30  @timed_lru_cache(10)
31  def get_article_from_server(url):
32      print("Fetching article from server...")
33      response = requests.get(url)
34      return response.text
35
36  def monitor(url):
37      maxlen = 45
38      while True:
39          print("\nChecking feed...")
40          feed = feedparser.parse(url)
41
42          for entry in feed.entries[:5]:
43              if "python" in entry.title.lower():
44                  truncated_title = (
45                      entry.title[:maxlen] + "..."
46                      if len(entry.title) > maxlen
47                      else entry.title
48                  )
49                  print(
50                      "Match found:",
51                      truncated_title,
52                      len(get_article_from_server(entry.link)),
53                  )
54
55          time.sleep(5)
56
57  monitor("https://realpython.com/atom.xml")

Notice how line 30 decorates get_article_from_server() with @timed_lru_cache and specifies a lifetime of 10 seconds. Because that lifetime belongs to the cache as a whole, any attempt to access an article that was already fetched during the current 10-second window will return the contents from the cache without hitting the network.

Run the script and take a look at the results:

$ python monitor.py

Checking feed...
Fetching article from server...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Fetching article from server...
Match found: Python Community Interview With David Amos 54254
Fetching article from server...
Match found: Working With Linked Lists in Python 37100
Fetching article from server...
Match found: Python Practice Problems: Get Ready for Your ... 164887
Fetching article from server...
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Checking feed...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Match found: Python Community Interview With David Amos 54254
Match found: Working With Linked Lists in Python 37100
Match found: Python Practice Problems: Get Ready for Your ... 164887
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Checking feed...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Match found: Python Community Interview With David Amos 54254
Match found: Working With Linked Lists in Python 37100
Match found: Python Practice Problems: Get Ready for Your ... 164887
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Checking feed...
Fetching article from server...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Fetching article from server...
Match found: Python Community Interview With David Amos 54254
Fetching article from server...
Match found: Working With Linked Lists in Python 37099
Fetching article from server...
Match found: Python Practice Problems: Get Ready for Your ... 164888
Fetching article from server...
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Notice how the code prints the message "Fetching article from server..." the first time it accesses the matching articles. After that, depending on your network speed and computing power, the script will retrieve the articles from the cache either one or two times before hitting the server again.

The script tries to access the articles every 5 seconds, and the cache expires every 10 seconds. These times are probably too short for a real application, so you can get a significant improvement by tuning both values.





