
Tech Blog: JSON logging in Python
By Tomislav Levanic on February 24, 2020
JSON logging everywhere
If you haven’t already, be sure to check out our previous blog post where we discussed how we configured uWSGI to use JSON logging. There was one important detail we practically skimmed over: we assumed application logs would already be JSON formatted. This is not the case for Python out of the box, and in this post we will discuss how our team tackled that issue.
Custom-built JSON formatter
Existing solutions
Before starting, we decided that we wanted a solution fulfilling these requirements:
- an easy way to log LogRecord attributes in JSON logs under arbitrary keys
- log additional static values, e.g. app_name
- log values provided via the extra argument in logging methods
The first thing we did was, of course, search for existing open-source solutions for JSON logging. There are solutions out there; however, none of them seemed a good fit for our use case. A good example is python-json-logging, which offers a wide variety of features we do not intend to use and, more importantly, fails at our first and second requirements.
Our solution
Finally, we built a custom JSONFormatter that allows simple configuration by passing a fields argument to replace Python’s LogRecord attribute names with our own keys, for example:
{
    "message": "message",
    "time": "created",
    "log_level": "levelname",
    "process": "process",
    "process_name": "processName",
    "thread": "thread",
    "thread_name": "threadName",
    "traceback": "exc_text"
}
where values are evaluated as LogRecord attributes and logged under the corresponding keys. Similarly, we can pass an always_extra argument during formatter initialization to log additional static values. Just like python-json-logging, we also log values provided via the extra argument in logging methods.
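To make the mechanism concrete, here is a minimal sketch of what such a formatter could look like. This is not the actual useful.logs implementation; the class name and defaults are illustrative, and the real formatter handles more details.

import json
import logging


class JSONFormatter(logging.Formatter):
    """Minimal sketch: map LogRecord attributes to arbitrary JSON keys."""

    def __init__(self, fields=None, always_extra=None):
        super().__init__()
        self.fields = fields or {"message": "message", "time": "created"}
        self.always_extra = always_extra or {}

    def format(self, record):
        # Make sure record.message and record.exc_text are populated first.
        record.message = record.getMessage()
        if record.exc_info and not record.exc_text:
            record.exc_text = self.formatException(record.exc_info)

        entry = dict(self.always_extra)
        for key, attribute in self.fields.items():
            entry[key] = getattr(record, attribute, None)
        # The real formatter also merges values passed via extra; omitted here.
        return json.dumps(entry, default=str)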
Since we want to use the same logging for all of our Python microservices, we also need a simple way to set up our log formatting. This is where the function useful.logs.setup with sane defaults comes in. This makes JSON logging a breeze.
import logging

import useful.logs

useful.logs.setup()
_log = logging.getLogger(__name__)

if __name__ == '__main__':
    _log.info("testing", extra={"app": "AppName", "metric": 1.2})
Edge cases
JSON logging works well when you explicitly call logging methods. There are, however, situations where you simply do not catch all possible exceptions, and Python writes the exception traceback to stderr. Python provides a mechanism for solving this issue: overriding sys.excepthook, which serves as an alternative “logging” function called in case of uncaught exceptions. There we can replace writing to stderr with our standard JSON logging.
import logging
import sys


def except_logging(exc_type, exc_value, exc_traceback):
    logging.error("Uncaught exception",
                  exc_info=(exc_type, exc_value, exc_traceback))


sys.excepthook = except_logging
This is better, but what happens when it is impossible to catch the exception? Yes, such cases do exist; one example is an exception raised during garbage collection. Again, the Python sys module gives us the solution: as of Python 3.8 we can set the hook sys.unraisablehook in a similar manner to sys.excepthook.
import logging
import sys


def unraisable_logging(args):
    exc_type, exc_value, exc_traceback, err_msg, _ = args
    default_msg = "Unraisable exception"
    logging.error(err_msg or default_msg,
                  exc_info=(exc_type, exc_value, exc_traceback))


sys.unraisablehook = unraisable_logging
Sadly, we are not done yet. We still have to handle exceptions raised in threading.Thread.run(), and as of Python 3.8.0 we can override the hook threading.excepthook the same way we did sys.excepthook.
import logging
import threading


def threading_except_logging(args):
    exc_type, exc_value, exc_traceback, _ = args
    logging.error("Uncaught threading exception",
                  exc_info=(exc_type, exc_value, exc_traceback))


threading.excepthook = threading_except_logging
The final issue is with the Python multiprocessing module (as of writing, the latest stable Python version is 3.8.1), which implements and calls its own exception handling routine that writes to stderr instead of calling sys.excepthook. As an answer on StackOverflow suggests, you can “fix” this issue by overriding the method multiprocessing.Process.run to manually catch all exceptions and call sys.excepthook. We do not use multiprocessing in any of our microservices, and since we haven’t found a clean way to do it, we decided not to implement a solution to this problem.
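For completeness, here is a rough sketch of what that workaround could look like. The LoggingProcess subclass is hypothetical and not part of useful.logs; it only illustrates the idea from the StackOverflow answer.

import multiprocessing
import sys


class LoggingProcess(multiprocessing.Process):
    """Hypothetical sketch: route child process exceptions through sys.excepthook."""

    def run(self):
        try:
            super().run()
        except Exception:
            # Let our excepthook (and therefore JSON logging) handle the traceback
            # instead of multiprocessing writing it to stderr.
            sys.excepthook(*sys.exc_info())
            # Exit with a non-zero code without triggering the default traceback print.
            sys.exit(1)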
Besides multiprocessing, all other edge cases mentioned above are handled inside useful.logs.setup.
Request tracing id
When building REST APIs in a microservice environment, it is very important to have a way of tracing requests across the infrastructure. A simple way to do it is to save some sort of identification value in a request header set by the first service in the infrastructure (nginx in our case) and log it in every service afterward. This requires access to the request at the moment of logging, which can be a bit impractical, especially when using a framework that, unlike Flask, does not store the current request in some sort of global variable.
Since we use Falcon, which has no such implementation, and we are running on uWSGI with threads, we implemented our own useful.logs.context based on threading.local that can store the request tracing id and is accessible inside the log formatter. All that is left is to implement a custom Falcon middleware that automatically reads the request header and stores it into useful.logs.context. If you need support for greenlets, werkzeug provides an awesome alternative implementation supporting both threads and greenlets, along with a way to “clean” the context after each request, since WSGI does not guarantee that threads won’t be reused for multiple requests. If you are using Datadog, you can use the Python module ddtrace to solve this issue.
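As an illustration, such a Falcon middleware could look roughly like the sketch below. The header name, the thread-local object, and the middleware class are assumptions made for the example; the actual useful.logs.context API may differ.

import threading
import uuid

import falcon

# Assumed thread-local context; useful.logs.context works along similar lines.
_context = threading.local()


class TracingMiddleware:
    """Sketch: store a per-request tracing id in thread-local storage."""

    def process_request(self, req, resp):
        # "X-Request-Id" is an assumed header name set by the first service (nginx).
        _context.request_id = req.get_header("X-Request-Id") or str(uuid.uuid4())

    def process_response(self, req, resp, resource, req_succeeded):
        # "Clean" the context, since WSGI threads may be reused across requests.
        _context.request_id = None


app = falcon.API(middleware=[TracingMiddleware()])

The log formatter can then read getattr(_context, "request_id", None) and include it in every log entry.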
We still have to explore how to configure nginx for JSON logging and request tracing along with log collecting infrastructure, so stay tuned!