
How a Singleton Pattern Broke Our Django Logging

With modern tooling and agentic coding assistants, straightforward bugs are almost a relief. If a test can catch it, or a user can reproduce it, chances are you can squash it quickly. The harder category, and the one worth writing about, is the bugs where everything looks correct: your code runs, no exceptions are thrown, your debug statements confirm the right functions fire at the right times, and yet nothing works.

We ran into one of these recently in our Python logging library, and the root cause turned out to have nothing to do with our code’s logic. It had everything to do with what Django does to your objects after you hand them over.

Context

At Scout APM, we maintain a Python logging handler (scout_apm_python_logging) that bridges standard Python logging with OpenTelemetry. When your application writes a log message, our handler enriches it with trace context and ships it to our backend via the OpenTelemetry SDK.

The core of this integration is a custom logging.Handler subclass. If you’re not familiar with Python’s logging module: a handler determines what happens with a log message once it’s been emitted. You might have one that writes to a file, one that sends to stdout, and in our case, one that forwards to an OpenTelemetry receiver. The important thing to know is that handlers have a lifecycle — they can be opened, they can emit records, and they can be closed.
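To make that lifecycle concrete, here's a toy handler (purely illustrative, not our real one): the logging framework calls emit() for every record that reaches the handler, and close() when the handler is being torn down or replaced.

import logging

class StdoutHandler(logging.Handler):
    """Toy handler for illustration only."""

    def emit(self, record):
        # Called once for each log record routed to this handler.
        print(self.format(record))

    def close(self):
        # Called when the logging system tears this handler down,
        # which, as we'll see, can happen earlier than you'd expect.
        print("close() called")
        super().close()

logger = logging.getLogger("demo")
logger.addHandler(StdoutHandler())
logger.warning("hello")  # routed through emit()
logging.shutdown()       # flushes and closes every registered handler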

The Change

We’d written a fix for a Django-specific configuration timing issue and had initially held off on releasing it. We wanted to give it another look before tagging a new version. We got busy, that second look never happened, and the change shipped as part of a larger release.

Here’s the problem it was solving: Django reconfigures logging during startup. In our testing, we observed our handler being instantiated seven times as Django went through its initialization phases — setting up apps, middleware, and so on. Our handler needed configuration values (API keys, endpoints) that Django hadn’t loaded yet during those early instantiations.
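For reference, a handler like ours is typically wired up through Django's LOGGING setting, which Django hands to logging.config.dictConfig while it boots. The snippet below is illustrative; the dotted class path is a placeholder rather than our exact configuration:

# settings.py (illustrative; the dotted path is a placeholder)
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "scout_otel": {
            # Django instantiates this class when it applies the config,
            # and it applies logging configuration early in startup.
            "class": "scout_apm_python_logging.ScoutOtelHandler",
        },
    },
    "root": {
        "handlers": ["scout_otel"],
        "level": "INFO",
    },
}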

The fix was to defer initialization until the first emit call, and to share the underlying OpenTelemetry LoggerProvider as a singleton across all handler instances:

import logging
import threading

class ScoutOtelHandler(logging.Handler):
    _initialization_lock = threading.Lock()
    otel_handler = None  # Shared across all instances

    def _initialize(self):
        # Only the first instance to reach this point does the real setup;
        # every later instance finds the shared handler already populated.
        with self._initialization_lock:
            if ScoutOtelHandler.otel_handler:
                return  # Already initialized by another instance
            # ... set up the LoggerProvider and LoggingHandler

Django creates multiple instances of our handler? Fine — the first one to handle a log message initializes the shared provider, and the rest use it. The configuration timing issue was solved.
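The emit-time side of that change looked roughly like this (simplified; the real method also handles the trace-context enrichment):

def emit(self, record):
    # Lazily initialize on the first record, by which point Django's
    # settings (API key, endpoint) have been fully loaded.
    if ScoutOtelHandler.otel_handler is None:
        self._initialize()
    ScoutOtelHandler.otel_handler.emit(record)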

Logs started flowing in, until they didn't.

The Symptoms

After deploying, we saw a frustrating pattern: the initial batch of startup logs came through correctly, but once the application was handling requests, every subsequent log was silently dropped. No errors, no exceptions, no warnings, just silence.

We added debug print statements throughout the handler (since using logging to debug a logging handler is its own kind of problem) and confirmed that otel_handler.emit() was being called exactly when expected. The code was executing correctly, but the logs were vanishing.

Debugging

This kind of bug can make you question your baseline understanding of how the heck the thing even works in the first place. If the code is executing “correctly”, where do you start? We went down several paths:

  • Was it a threading issue with the initialization lock?
  • Was the _handling_log recursion guard getting stuck?
  • Was there a race condition during Django’s startup sequence?

We created multiple test branches, added instrumentation, and verified every assumption we could think of. Each time: the code was correct, emit was called, and nothing happened.

The breakthrough came when we stopped looking at our code and started asking what Django was doing with our handler objects after we handed them over.

The Root Cause

Here’s what was actually happening:

  1. Django instantiates our handler multiple times during startup.
  2. Our singleton pattern ensures they all share one LoggerProvider.
  3. Django, as part of its logging configuration, calls close() on handlers it’s replacing or reconfiguring.
  4. Our close() method called LoggerProvider.shutdown().
  5. Once a LoggerProvider is shut down, calling emit() on it does nothing. Silently.

Step 3 is the critical detail. Django calls close() on handlers it’s replacing, not just on application shutdown. Before the singleton change, this wasn’t a problem — each instance had its own LoggerProvider, so when Django closed early instances during setup, those providers were shut down independently. The later instances still had fresh, working providers.
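You can see this close-on-reconfigure behavior with nothing but the standard library. Django's logging setup goes through logging.config.dictConfig by default, and dictConfig closes the handlers from the previous configuration before applying a new one. A minimal sketch (run as a script; this is the stdlib, not Django itself):

import logging
import logging.config

class NoisyHandler(logging.Handler):
    def emit(self, record):
        pass

    def close(self):
        print(f"close() called on handler {id(self)}")
        super().close()

CONFIG = {
    "version": 1,
    "handlers": {"noisy": {"class": "__main__.NoisyHandler"}},
    "root": {"handlers": ["noisy"], "level": "INFO"},
}

logging.config.dictConfig(CONFIG)  # first configuration builds a NoisyHandler
logging.config.dictConfig(CONFIG)  # reconfiguring closes the first one
# The second call prints "close() called ..." for the handler the first call created.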

By sharing a single LoggerProvider across all instances, we made the handler vulnerable to Django’s lifecycle management. Django closes a handler during configuration, our shared provider shuts down, and every subsequent log from every instance goes nowhere.

What made this especially hard to track down is that the OpenTelemetry SDK doesn’t raise an exception or log a warning when you call emit on a shut-down provider. It does nothing, silently. That’s a reasonable design choice for a telemetry SDK, but it meant that from our handler’s perspective, everything looked normal.
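For the curious, the failure mode looks roughly like this in isolation. The import paths below are assumptions: the SDK's logs support has lived under the experimental opentelemetry.sdk._logs package in recent versions and may move between releases.

import logging

# Assumed module paths for the OpenTelemetry Python SDK's logs support;
# they have lived under the experimental opentelemetry.sdk._logs package.
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor, ConsoleLogExporter

provider = LoggerProvider()
provider.add_log_record_processor(BatchLogRecordProcessor(ConsoleLogExporter()))

logger = logging.getLogger("demo")
logger.addHandler(LoggingHandler(logger_provider=provider))

logger.warning("before shutdown")  # reaches the exporter
provider.shutdown()
logger.warning("after shutdown")   # never reaches the exporter, and no exception is raised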

The Fix

Once we understood the problem, the fix was straightforward: we made close() a no-op:

def close(self):
    """
    We intentionally don't shut down the LoggerProvider here because:
    1. We use a singleton pattern - the LoggerProvider is shared between instances.
    2. Django calls close() during configuration, not just on shutdown.

    The LoggerProvider manages its own lifecycle and will call shutdown()
    when the application exits.
    """
    super().close()

Before committing to never calling shutdown(), we needed to confirm the provider would still clean up after itself. A look at the OpenTelemetry SDK source confirmed that the LoggerProvider registers an atexit handler to call shutdown() when the process exits. We didn’t need to manage the provider’s lifecycle at all — it handles its own cleanup.
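The pattern itself is easy to illustrate. This is not the SDK's actual code, just the general shape of what "registers an atexit handler" means:

import atexit

class ProviderLike:
    """Generic sketch of self-managed cleanup, not the OTel SDK's real code."""

    def __init__(self):
        self._is_shut_down = False
        # Ask the interpreter to call shutdown() at process exit, so cleanup
        # happens even if no caller ever does it explicitly.
        atexit.register(self.shutdown)

    def shutdown(self):
        if self._is_shut_down:
            return
        self._is_shut_down = True
        # ... flush buffers, stop worker threads, release resources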

While we were in there, we also wrapped the emit logic in a try/finally block to ensure the _handling_log recursion guard always resets, even if an exception occurs:

def emit(self, record):
    # ...
    try:
        self._handling_log.value = True
        # ... enrichment and emit logic
    finally:
        self._handling_log.value = False

This wasn’t the cause of the original bug, but it was a loose end we’d found during debugging and it was worth tightening up.

Lessons

A few takeaways that generalize beyond Django and OpenTelemetry:

Audit your singleton’s lifecycle surface. When each instance owns its own resources, lifecycle mismanagement affects only that instance. The moment you share state, every consumer is affected by any single consumer’s lifecycle events. Before introducing a singleton in a framework-managed context, trace every method that modifies or destroys the shared resource. Who else might be calling close(), dispose(), or shutdown() on your objects, and when?

Map your framework’s full object lifecycle. Django doesn’t just create your handler and leave it alone. It creates, configures, closes, and recreates handlers as part of its startup process. This is documented behavior, but it’s the kind of thing you only discover when it causes a problem. When integrating with any framework, it’s worth reading the source for how it manages the objects you’re extending — not just the happy path where your methods get called.

When debugging silent failures, look outward. When there are no errors to trace, the typical debugging playbook breaks down. Instead of asking “what went wrong in my code?”, ask “what changed in my object’s environment?” and “who else touches my objects?” In our case, the insight came not from instrumenting our emit path more thoroughly, but from watching what happened to our handler between creation and use.

Test for lifecycle interference, not just correctness. Our singleton change was logically correct — it solved the configuration timing issue exactly as intended. But it also removed the accidental protection that per-instance providers gave us. When you refactor shared state, it’s worth adding a test that exercises the full framework lifecycle: create an instance, close it, then verify that other instances still function.
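As a rough pytest-style sketch (the import path and the backend_received helper are placeholders for whatever your handler and test backend actually look like):

import logging

from scout_apm_python_logging import ScoutOtelHandler  # placeholder import path

def test_closing_one_instance_does_not_break_the_others():
    first = ScoutOtelHandler()
    second = ScoutOtelHandler()

    record = logging.LogRecord("demo", logging.INFO, __file__, 0, "hello", None, None)
    first.emit(record)   # the first emit lazily initializes the shared provider
    first.close()        # what Django does while reconfiguring logging

    second.emit(record)  # must still reach the backend
    assert backend_received("hello")  # placeholder assertion helper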

Prefer library-managed lifecycle over manual cleanup. If a library provides its own lifecycle management (like the OTel SDK’s atexit handler), prefer that over manual lifecycle calls in your wrapper code. You’re less likely to hit edge cases when you let the library handle its own cleanup.

Wrapping Up

The takeaway here isn’t specific to Django or OpenTelemetry. It’s about what happens when you introduce shared state into objects whose lifecycle you don’t fully control. The next time you encounter a silent failure — code that executes correctly but produces no results — it’s worth looking beyond your own logic. Examine what the framework does with your objects after you’ve handed them over.

Our Managed Logging service is great for integrating logs with the rest of the insights Scout provides. If you’re using scout_apm_python_logging with Django, this fix is available in v1.0.3. If you run into any other weirdness with Django logging, or anything else, open an issue; we’d like to hear about it.
