Both uWSGI and especially nginx logs are indeed more important than application logs for figuring out how the microservice-oriented infrastructure breathes in general. However, when problems occur, collecting informative application logs is a must. In this part of the JSON logging series, we discuss our solution to JSON logging in Python.
JSON logging everywhere
If you haven’t already, be sure to check out our previous blog post where we discussed
how we configured uWSGI to use JSON logging. There was one important detail we
practically skimmed over. We assumed application logs would already be JSON formatted.
This is, however, not the case for Python, and now we will discuss the exact way our team
tackled that specific issue.
Custom-built JSON formatter
Before starting, we decided that we wanted a solution filling out these requirements
log values provided as argument extra in logging methods
The first thing we did was, of course, a search for other open-source solutions for JSON
logging. There are existing solutions out there, however, none of them seemed a good fit
for our use case. A good example is python-json-logging which offers a wide variety of
features we do not intend to use, and more importantly, fails at our first and second requirement.
Finally, we built a custom JSONFormatter allowing us simple configuration by passing the
argument fields to replace Python’s LogRecord keys with our own, for example:
where we evaluate values as LogRecord attributes and log them under their keys. Similarly,
we can also pass always_extra key during formatter initialization to log additional static
values. Just like python-json-logging does as well, we also log values provided as argument
extra in logging methods.
Since we want to use the same logging for all of our Python microservices, we also need
a simple way to set up our log formatting. This is where the function useful.logs.setup with
sane defaults comes in. This makes JSON logging a breeze.
It is good to have JSON logging in cases you explicitly call logging methods. There are,
however, situations where you simply do not catch all possible exceptions and Python writes
the exception traceback to stderr. Python provides a mechanism for solving this issue. We do
it by overriding a function sys.excepthook which will serve as an alternative “logging”
function called in case of uncaught exceptions. Here we can override writing to stderr and
instead use our standard JSON logging.
This is better, but what happens when it is impossible to catch the exception? Yes, such
cases do exist. One example would be an exception during garbage collection. Again,
the Python sys module gives us the solution! Simply set the hook sys.unraisablehook in
a similar manner as sys.excepthook.
The final issue is with the Python (as of writing, the latest stable version is 3.8.1)
multiprocessing module, which implements and calls its own exception handling routine that
writes to stderr instead of calling the sys.excepthook. As the answer on the StackOverflow
suggests, you can “fix” this issue by overriding method multiprocessing.Process.run which
will manually catch all exceptions and call sys.excepthook. We do not use multiprocessing
in any of our microservices and have thus decided not to implement any solution to
this problem since we haven’t found a clean way to do it.
Besides multiprocessing, all other edge cases mentioned above are handled inside useful.logs.setup.
Request tracing id
When building REST APIs in a microservice environment, it is very important to
have a way of tracing requests across the infrastructure. A simple way to do
it is to save some sort of identification value in the request header set by
the first service in the infrastructure (nginx in our case) and log it in every
service afterward. This requires you to have access to the request at the moment of
logging, which can be a bit impractical, especially when using a framework where
current request isn’t stored in some sort of global variable like Flask does. Since we
use Falcon that has no such implementation and we are running on uwsgi with threads, we
implemented our own useful.logs.context based on threading.local that can be
used to store request tracing id and is accessible inside log formatter. All that
is left is a simple manner of implementing custom Falcon middleware to automatically
read the request header and store it into useful.logs.context. If you need support
for greenlets, werkzeug provides an awesome alternative implementation supporting
both threads and greenlets along with the way to “clean” the context after each
request since WSGI does not guarantee that threads won’t be reused for
multiple requests. If you are using DataDog you can use Python module ddtrace to
solve this issue.
We still have to explore how to configure nginx for JSON logging and request tracing
along with log collecting infrastructure, so stay tuned!
We are proud to announce that Velebit AI has, among the first, joined a recently established CroAI association. At Velebit AI, we believe artificial intelligence is becoming the driving force of the world's economy.
We are excited to announce that Velebit AI has joined the NVIDIA Inception Program as a community member. The program is designed to nurture startups revolutionizing industries with advancements in AI and data sciences.
We build AI for your needs.
Meet our highly experienced team who just loves to build AI and design its surrounding to incorporate it in your business. Find out for your self how much you can benefit from our fair and open approach.