At Mergify, we generate a pretty large amount of logs. Every time an event is received from GitHub for a particular pull request, our engine computes a new state for it. Doing so, it logs some informational statements about what it's doing — and any error that might happen.
This information is precious to us. Without proper logging, it'd be utterly impossible for us to debug any issue. As we needed to store and index our logs somewhere, we picked Datadog as our log storage provider.
Datadog offers real-time indexing of our logs. The ability to search our records that fast is compelling, as we're able to retrieve logs about a GitHub repository or a pull request with a single click.
To achieve this result, we had to inject our Python application logs into Datadog. To set up the Python logging mechanism, we rely on daiquiri, a fantastic library I have maintained for several years now. Daiquiri leverages the regular Python logging module, making it a no-brainer to set up while offering a few extra features.
We recently added native support for the Datadog agent in daiquiri, making it even more straightforward to log from your Python application.
Enabling logs on the Datadog agent
Datadog has extensive documentation on how to configure its agent. It boils down to setting logs_enabled: true in your agent configuration. Simple as that.
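Concretely, that single setting lives in the agent's main configuration file (typically named datadog.yaml; the path varies by platform):

```yaml
# datadog.yaml: main Datadog agent configuration
logs_enabled: true
```
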
You then need to create a new log source for the agent. The easiest way to connect your application to the Datadog agent is through a TCP socket: your application writes logs directly to the agent, which forwards the entries to the Datadog backend.
Create a configuration file in conf.d/python.d/conf.yaml with the following content:
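The original snippet is not reproduced here; a minimal sketch following the format Datadog documents for TCP log collection might look like this (the port matches the default used later in this post, and the service name is a placeholder to adjust to your setup):

```yaml
# conf.d/python.d/conf.yaml: a sketch of a TCP log source.
# The service name below is a placeholder.
logs:
  - type: tcp
    port: 10518
    service: my-python-app
    source: python
```
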
Once this is done, you need to configure your Python application to log to the TCP socket configured in the agent above.
The Datadog agent expects logs to be sent in JSON format, which is what daiquiri does for you. Using JSON allows embedding extra fields to leverage fast search and indexing. As daiquiri provides native handling of extra fields, you'll be able to send those extra fields without trouble.
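As an illustration of why JSON helps here, a log line carrying extra fields is just a flat JSON object whose keys are directly indexable (the field names below are hypothetical, not daiquiri's exact schema):

```python
import json

# Hypothetical sketch of a JSON log line with extra fields; the exact
# schema daiquiri emits may differ.
record = {
    "status": "info",
    "message": "User did something important",
    "user": "alice",         # extra field
    "request_id": "abc123",  # extra field
}
line = json.dumps(record)
print(line)
```

Each key becomes a searchable attribute, whereas a plain string log would have to be parsed back apart.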
First, list daiquiri in your application dependencies. Then, set up logging in your application this way:
```python
import logging

import daiquiri

daiquiri.setup(
    outputs=[
        daiquiri.output.Datadog(),
    ],
    level=logging.INFO,
)
```
This configuration logs to the default TCP destination localhost:10518, though you can pass the port argument to change that. You can customize the outputs as you wish by checking out the daiquiri documentation. For example, you could also include logging to stdout by adding daiquiri.output.Stream(sys.stdout) to the output list.
When using daiquiri, you're free to use logging.getLogger to get your regular logging object. However, by using the alternative daiquiri.getLogger function, you enable native support for extra arguments, which is quite handy. That means you can pass any arbitrary key/value pair to your log call and see it end up embedded in your log data, all the way up to Datadog.
Here's an example:
```python
import daiquiri

[…]

log = daiquiri.getLogger(__name__)
log.info("User did something important", user=user, request_id=request_id)
```
The extra keyword arguments passed to log.info will show up directly as attributes in Datadog logs:
All those attributes can then be used for searching or for building custom views. This is really powerful for monitoring and debugging any kind of service.
A log object per object
When passing extra arguments, it is easy to make a mistake and forget some, especially when your application wants to log information about a particular object.
The best pattern to avoid this is to create a custom log object per object:
```python
import daiquiri


class MyObject:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.log = daiquiri.getLogger("MyObject", x=self.x, y=self.y)

    def do_something(self):
        try:
            self.call_this()
        except Exception:
            self.log.error("Something bad happened")
```
By using the self.log object defined above, there's no way for your application to miss the extra fields for an object. All your logs will follow the same style and end up correctly indexed in Datadog.
The extra arguments of Python loggers are often dismissed, and many developers stick to logging strings with various pieces of information embedded inside. Having a proper explanation string, plus a few extra key/value pairs that are parsable by both machines and humans, is a better way to do logging. Leveraging engines such as Datadog allows you to store and query those logs in a snap.
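For reference, this mechanism exists in the standard library too: plain logging accepts key/value pairs through the extra parameter, and the keys become attributes on the LogRecord. The list-collecting handler below is just a minimal sketch to show where the values end up:

```python
import logging

# Collect records in a list so we can inspect their attributes.
records = []


class ListHandler(logging.Handler):
    def emit(self, record):
        records.append(record)


log = logging.getLogger("demo")
log.addHandler(ListHandler())
log.setLevel(logging.INFO)

# Keys passed through `extra` become attributes on the LogRecord.
log.info("User did something important", extra={"user": "alice"})
print(records[0].user)  # → alice
```

Daiquiri's keyword arguments build on this same idea while sparing you the extra={} boilerplate.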
This is way more efficient than trying to parse and grep strings yourself!