Error Monitoring for Elixir: Now in Scout APM

Elixir’s “let it crash” philosophy is one of the best ideas in modern software design. Supervisors restart failed processes, the system self-heals, and life goes on. It’s like having a really good immune system.

The problem is that a really good immune system can also hide chronic conditions. A GenServer crashing and restarting is working as designed. A GenServer crashing and restarting 500 times an hour is a five-alarm fire that nobody called in because, well, the restarts kept succeeding. Web frameworks are another layer of this. Phoenix controllers can rescue exceptions and return perfectly formatted 500 pages. The system stays “up”, just not “working.”

We built error monitoring into the Scout APM Elixir agent so you can keep the resilience of “let it crash” while actually knowing what’s crashing, how often, and why.

Built In, Turned On

Error monitoring ships inside the scout_apm package itself as of v2.0.0. There’s no separate error tracking package to install. It’s enabled by default, so you only need this setting if you want to be explicit:

config :scout_apm, errors_enabled: true

For a Phoenix app, that’s it for configuration, unless you want to customize behavior. If you’re already running scout_apm, errors will start flowing as soon as you attach the telemetry handler.

Automatic Capture via Telemetry

The primary integration point is Phoenix’s telemetry system. Add one line to your Application.start/2:

def start(_type, _args) do
  ScoutApm.Instruments.PhoenixErrorTelemetry.attach()

  children = [
    # ... your supervision tree
  ]

  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end

This attaches handlers to two telemetry events:

  • [:phoenix, :router_dispatch, :exception] fires when an exception bubbles up through Phoenix’s router.
  • [:phoenix, :error_rendered] fires when Phoenix renders an error response (filtered to 500+ status codes, so you won’t get noise from 404s).

When either event fires, we extract the exception class, message, stacktrace, request path, parameters, session data, and controller/action. The conn struct gives us all of this context automatically, so the error arrives in Scout with enough information for you to actually diagnose it.

The handler also calls TrackedRequest.mark_error(), which links the error to the performance trace for that request. More on why that matters in a moment.
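
To make the mechanics concrete, here is a minimal sketch of what a handler for those two events could look like. It is not Scout's implementation: the module name ErrorTelemetrySketch, the handler id, and the report_fun callback are made up for illustration, and the metadata keys (status, kind, reason, stacktrace) are the ones Phoenix documents for these events. The script pulls in the telemetry package with Mix.install and fires the events by hand, so it runs without a Phoenix app.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule ErrorTelemetrySketch do
  # The two events named above.
  @events [
    [:phoenix, :router_dispatch, :exception],
    [:phoenix, :error_rendered]
  ]

  # `report_fun` is a hypothetical callback that receives the captured error.
  def attach(report_fun) do
    :telemetry.attach_many(
      "error-telemetry-sketch",
      @events,
      &__MODULE__.handle_event/4,
      report_fun
    )
  end

  # Rendered error pages below 500 (404s and friends) are skipped.
  def handle_event([:phoenix, :error_rendered], _measurements, %{status: status}, _report_fun)
      when status < 500,
      do: :ok

  # Everything else is turned into a small report and handed to the callback.
  def handle_event(event, _measurements, metadata, report_fun) do
    report_fun.(%{
      event: event,
      kind: metadata[:kind],
      reason: metadata[:reason],
      stacktrace: metadata[:stacktrace]
    })
  end
end

parent = self()
:ok = ErrorTelemetrySketch.attach(fn report -> send(parent, {:captured, report}) end)

# Simulate a 404 (ignored) and a 500 (captured) without running Phoenix.
:telemetry.execute([:phoenix, :error_rendered], %{}, %{
  status: 404, kind: :error, reason: :not_found, stacktrace: []
})

:telemetry.execute([:phoenix, :error_rendered], %{}, %{
  status: 500, kind: :error, reason: %RuntimeError{message: "boom"}, stacktrace: []
})
```

Only the second event produces a {:captured, report} message. A real handler would also read the conn from the metadata to pull out the path, params, and session.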

Manual Capture

Automatic capture handles the exceptions that escape your code. But what about the ones you catch deliberately? If you rescue an error, log it, and return a graceful fallback, the telemetry handler never sees it.

For those cases, use ScoutApm.Error.capture/2:

try do
  process_payment(order)
rescue
  e ->
    ScoutApm.Error.capture(e, stacktrace: __STACKTRACE__)
    reraise e, __STACKTRACE__
end

The __STACKTRACE__ special form gives you the actual stacktrace from the rescue block. If you don’t pass it, we’ll grab the current process stacktrace, but that includes our internal frames and is less useful.
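
If you want to see the difference for yourself, this small script compares the two. It uses only the standard library; StacktraceDemo and its process_payment/1 stub are hypothetical stand-ins for your own code.

```elixir
defmodule StacktraceDemo do
  # Hypothetical stub that always fails, standing in for real payment code.
  def process_payment(_order), do: raise(ArgumentError, "card declined")

  def run do
    try do
      process_payment(%{id: 1})
    rescue
      e ->
        # Frames recorded at the point where the exception was raised.
        rescued = __STACKTRACE__

        # Frames for wherever this process is executing right now.
        {:current_stacktrace, current} = Process.info(self(), :current_stacktrace)

        {e, rescued, current}
    end
  end
end

{error, rescued, current} = StacktraceDemo.run()

IO.puts("Rescued: " <> Exception.message(error))
IO.puts("From __STACKTRACE__:\n" <> Exception.format_stacktrace(rescued))
IO.puts("From the current process:\n" <> Exception.format_stacktrace(current))
```

The first trace leads back to process_payment/1, where the error was raised. The second describes where the rescue clause is running, which is rarely what you want to debug.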

You can attach additional context to help with debugging:

ScoutApm.Error.capture(e,
  stacktrace: __STACKTRACE__,
  context: %{user_id: user.id, order_id: order.id},
  request_path: "/api/orders",
  request_params: %{action: "create"}
)

The context map is freeform. Put whatever you need in there: user IDs, feature flags, queue names, the phase of the moon. It all shows up in the Scout UI attached to that specific error occurrence.

LiveView Errors Too

If you’re using ScoutApm.Instruments.LiveViewTelemetry, you get error capture for free. We listen for :exception events on mount, handle_event, and handle_params:

def start(_type, _args) do
  ScoutApm.Instruments.PhoenixErrorTelemetry.attach()
  ScoutApm.Instruments.LiveViewTelemetry.attach()
  # ...
end

When a LiveView callback raises, we capture the full error with the view module name and callback as the controller context. So an exception in MyAppWeb.DashboardLive.handle_event shows up attributed to that specific view and event, not as a generic unhandled error.

Filtering and Configuration

Not every exception deserves attention. Bots hitting nonexistent routes generate Phoenix.Router.NoRouteError at a steady clip, and you probably don’t need alerts about those. You can ignore specific exception types:

config :scout_apm,
  errors_enabled: true,
  errors_ignored_exceptions: [Phoenix.Router.NoRouteError],
  errors_filter_parameters: ["password", "credit_card", "cvv"]

The errors_filter_parameters list controls which request parameter keys get scrubbed before leaving your server. We ship with a sensible default list that covers common sensitive fields (passwords, tokens, API keys, SSNs, etc.), so adding your own list extends rather than replaces the defaults.

Sensitive parameter values are replaced with [FILTERED] before the error payload is built. The filtering runs recursively through nested maps, so a password buried three levels deep in your params still gets caught.
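
As a rough illustration of recursive scrubbing, here is a standalone sketch. It is not the agent's code: the ParamScrubber module, its short default key list, and the exact-match rule on key names are all assumptions made for the example.

```elixir
defmodule ParamScrubber do
  @filtered "[FILTERED]"

  # Hypothetical defaults; the real default list is longer.
  @default_keys ["password", "token", "api_key", "ssn"]

  # Extra keys extend the defaults rather than replacing them.
  def scrub(params, extra_keys \\ []) do
    keys = MapSet.new(@default_keys ++ Enum.map(extra_keys, &to_string/1))
    do_scrub(params, keys)
  end

  # Plain maps: replace sensitive values, recurse into everything else.
  defp do_scrub(%{} = map, keys) when not is_struct(map) do
    Map.new(map, fn {key, value} ->
      if sensitive?(key, keys) do
        {key, @filtered}
      else
        {key, do_scrub(value, keys)}
      end
    end)
  end

  # Lists may contain nested maps, so walk them too.
  defp do_scrub(list, keys) when is_list(list), do: Enum.map(list, &do_scrub(&1, keys))

  # Scalars and structs pass through untouched.
  defp do_scrub(other, _keys), do: other

  defp sensitive?(key, keys) when is_atom(key) or is_binary(key),
    do: MapSet.member?(keys, to_string(key))

  defp sensitive?(_key, _keys), do: false
end

params = %{
  "cvv" => "123",
  "user" => %{"profile" => %{"name" => "Ada", "password" => "hunter2"}}
}

scrubbed = ParamScrubber.scrub(params, ["cvv"])
IO.inspect(scrubbed)
```

Here the nested "password" and the top-level "cvv" both come back as "[FILTERED]", while "name" is left alone.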

Errors In Context

Scout doesn’t just collect errors in isolation. Our performance metrics and endpoint traces are running right alongside. That context turns “we got a RuntimeError in OrderController#create” into “we got a RuntimeError in OrderController#create after a 3-second database query timed out on the inventory check.” One of those is a stack trace. The other is a diagnosis.

Errors in Scout are grouped by exception class and message pattern, so 200 occurrences of the same Ecto.NoResultsError show up as one group with a count, not 200 separate entries. Groups can be prioritized, assigned to team members, and resolved. You can read more about the full error monitoring feature set in our error monitoring documentation.
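
The grouping itself happens on Scout's side, and the exact rules aren't spelled out here, but the idea can be sketched in a few lines. In this example the ErrorGrouping module is hypothetical, and the "message pattern" is approximated by swapping runs of digits for a placeholder, which is purely an assumption for illustration.

```elixir
defmodule ErrorGrouping do
  # Build a grouping key from the exception module and a normalized message.
  # Replacing digits with "N" is an assumed normalization, so that messages
  # differing only by an ID land in the same group.
  def fingerprint(%{__exception__: true} = exception) do
    pattern =
      exception
      |> Exception.message()
      |> String.replace(~r/\d+/, "N")

    {exception.__struct__, pattern}
  end

  # Count occurrences per fingerprint.
  def group(exceptions), do: Enum.frequencies_by(exceptions, &fingerprint/1)
end

errors = [
  %RuntimeError{message: "order 41 failed"},
  %RuntimeError{message: "order 42 failed"},
  %ArgumentError{message: "bad input"}
]

groups = ErrorGrouping.group(errors)
IO.inspect(groups)
```

The two RuntimeErrors collapse into a single group with a count of 2, and the ArgumentError gets its own group.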

Under the Hood

The error service runs as a GenServer that batches errors (5 per batch by default) and sends them asynchronously. There’s a max queue size of 500 to prevent memory issues if errors spike. If you’re generating errors faster than we can ship them, we drop the oldest ones rather than letting the queue grow unbounded. This is the kind of boring reliability engineering that matters when something is going very wrong in your application and you really don’t want your monitoring to make it worse.
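
Here is a stripped-down sketch of that pattern: a bounded queue that drops the oldest entry when full and drains in fixed-size batches. It is not the agent's code. BoundedErrorQueue, the send_fun callback, and the explicit flush/1 call are stand-ins so the example is deterministic. The real service ships batches asynchronously.

```elixir
defmodule BoundedErrorQueue do
  use GenServer

  # Defaults mirror the numbers in the text: batches of 5, at most 500 queued.
  # `send_fun` is a hypothetical stand-in for the real delivery step.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def push(pid, error), do: GenServer.cast(pid, {:push, error})
  def flush(pid), do: GenServer.call(pid, :flush)

  @impl true
  def init(opts) do
    {:ok,
     %{
       queue: :queue.new(),
       size: 0,
       max_size: Keyword.get(opts, :max_size, 500),
       batch_size: Keyword.get(opts, :batch_size, 5),
       send_fun: Keyword.fetch!(opts, :send_fun)
     }}
  end

  @impl true
  def handle_cast({:push, error}, state) do
    {:noreply, state |> drop_oldest_if_full() |> enqueue(error)}
  end

  @impl true
  def handle_call(:flush, _from, state), do: {:reply, :ok, drain(state)}

  defp enqueue(state, error) do
    %{state | queue: :queue.in(error, state.queue), size: state.size + 1}
  end

  # When the queue is full, discard the oldest entry to make room.
  defp drop_oldest_if_full(%{size: size, max_size: max} = state)
       when size >= max and size > 0 do
    {_dropped, queue} = :queue.out(state.queue)
    %{state | queue: queue, size: size - 1}
  end

  defp drop_oldest_if_full(state), do: state

  # Hand the queue to `send_fun` one batch at a time until it is empty.
  defp drain(%{size: 0} = state), do: state

  defp drain(state) do
    count = min(state.batch_size, state.size)
    {batch, rest} = :queue.split(count, state.queue)
    state.send_fun.(:queue.to_list(batch))
    drain(%{state | queue: rest, size: state.size - count})
  end
end

parent = self()

{:ok, pid} =
  BoundedErrorQueue.start_link(
    max_size: 3,
    batch_size: 2,
    send_fun: fn batch -> send(parent, {:batch, batch}) end
  )

Enum.each(1..5, &BoundedErrorQueue.push(pid, &1))
:ok = BoundedErrorQueue.flush(pid)
```

With a max size of 3, pushing five errors drops the two oldest, and the flush delivers what is left as [3, 4] followed by [5].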

Getting Started

If you’re already using scout_apm, update to the latest version and attach the telemetry handler:

# mix.exs
{:scout_apm, "~> 2.0"}

# application.ex
def start(_type, _args) do
  ScoutApm.Instruments.PhoenixErrorTelemetry.attach()
  # ...
end

Errors will start appearing in your Scout dashboard within minutes. If you want to try Scout APM with your Elixir application, you can start a free trial and have full error monitoring and performance tracing running in about five minutes.