Monitoring NestJS Applications in Production: What Actually Matters

You shipped your NestJS application. It works in staging. Users are hitting it now, and you have no idea what’s slow. This is the part nobody warned you about.

Most Node.js APM guides assume you’re running a vanilla Express server with a couple of middleware functions. NestJS adds more structure on top: dependency injection, modules, and service layers that can obscure where time is actually spent. If you want to find production problems, you need to know what to instrument first.

Why NestJS Monitoring Needs More Than HTTP Tracing

Under the hood, NestJS sits on top of Express (or Fastify). What matters for performance monitoring is where your application spends time: controller methods, service calls, database queries, and external HTTP requests. A controller that delegates to three services, each making database calls, can be slow in ways that are invisible at the HTTP layer.

The dependency injection container adds another dimension. Services are instantiated once (by default) and injected across modules. A poorly performing service method doesn’t just affect one controller. It affects every module that depends on it. Your monitoring needs to surface which service calls are slow and which database queries they trigger.

What To Instrument First

If you’re setting up monitoring for the first time, start here. In order of impact:

Controller response times. This is your baseline. You need per-endpoint latency percentiles (p50, p95, p99), not just averages. Averages lie. An endpoint with a 200ms average might have a p99 of 4 seconds.

Database query performance. Whether you’re using Prisma or raw SQL via pg, your database layer is usually your biggest latency contributor. Instrument individual queries, not just total database time per request.

External HTTP calls. NestJS’s HttpService wraps Axios. Every outbound call to a third-party API is a reliability risk. Track each one independently.

Background job execution. If you’re running BullMQ (and many production NestJS apps are), you need visibility into queue latency and job duration. Problems here are invisible to HTTP-level monitoring.
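To make the point about averages concrete, here is a minimal sketch that computes the mean and nearest-rank percentiles over a made-up latency sample. The numbers and the helper names (percentile, mean) are illustrative assumptions, not output from any real system.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical latencies in ms: 97 fast requests and 3 very slow ones.
const latencies: number[] = [
  ...Array.from({ length: 97 }, () => 80),
  4000,
  4000,
  4000,
];

const summary = {
  mean: mean(latencies),
  p50: percentile(latencies, 50),
  p95: percentile(latencies, 95),
  p99: percentile(latencies, 99),
};

console.log(summary);
```

In this sample the mean lands just under 200ms while p50 and p95 sit at 80ms and p99 is 4 seconds, which is the kind of tail an average hides.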

The N+1 Problem In Prisma With NestJS

This is the single most common performance issue we see in NestJS applications, and it’s worth its own section. The pattern looks innocent enough:

import { Injectable } from '@nestjs/common';
import { PrismaClient } from '@prisma/client';

@Injectable()
export class OrderService {
  constructor(private prisma: PrismaClient) {}

  async getOrdersWithCustomerNames(region: string) {
    const orders = await this.prisma.order.findMany({
      where: { region },
    });

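    // One customer lookup per order. 200 orders = up to 200 extra queries.
    // (Prisma may batch findUnique calls issued in the same tick; sequential
    // awaits or findFirst lookups are not batched.)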
    return Promise.all(
      orders.map(async (order) => ({
        ...order,
        customerName: (
          await this.prisma.customer.findUnique({
            where: { id: order.customerId },
          })
        )?.name,
      })),
    );
  }
}

In development with 5 orders, this takes 30ms. In production with 200 orders, it takes 3 seconds. The service method returns the right data either way, so your tests pass. You only discover the problem when real traffic shows up.

GraphQL resolvers make this worse. A @ResolveField() that loads a related entity will fire once per parent object in the result set. NestJS doesn’t warn you about this because it’s technically correct behavior.

Scout Monitoring detects N+1 queries automatically. It groups identical queries that fire in rapid succession within a single request and flags the pattern. You don’t need to instrument anything special. Just look at the endpoint trace and the repeated queries are highlighted with the code location that triggered them. The fix is Prisma’s include option, which loads the related customers along with the orders and collapses 201 queries into one or two, depending on whether Prisma resolves the relation with a join or with a second batched query.
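The difference in query count is easy to see in a small simulation. The sketch below uses an in-memory FakeDb (a hypothetical stand-in, not Prisma) that counts every "query", and compares a per-order lookup against a single batched lookup. With Prisma, and assuming the schema defines a customer relation on Order, the batched version would correspond roughly to findMany({ where: { region }, include: { customer: true } }).

```typescript
type Customer = { id: number; name: string };
type Order = { id: number; customerId: number };
type OrderWithName = Order & { customerName: string | undefined };

// In-memory stand-in for a database. It only counts calls so the two
// access patterns can be compared; all names here are hypothetical.
class FakeDb {
  queryCount = 0;
  constructor(
    private customers: Customer[],
    private orders: Order[],
  ) {}

  async findOrders(): Promise<Order[]> {
    this.queryCount++;
    return this.orders;
  }

  async findCustomerById(id: number): Promise<Customer | undefined> {
    this.queryCount++;
    return this.customers.find((c) => c.id === id);
  }

  async findCustomersByIds(ids: number[]): Promise<Customer[]> {
    this.queryCount++;
    const wanted = new Set(ids);
    return this.customers.filter((c) => wanted.has(c.id));
  }
}

// N+1: one query for the orders, then one more per order.
async function namesNPlusOne(db: FakeDb): Promise<OrderWithName[]> {
  const orders = await db.findOrders();
  const result: OrderWithName[] = [];
  for (const order of orders) {
    const customer = await db.findCustomerById(order.customerId);
    result.push({ ...order, customerName: customer?.name });
  }
  return result;
}

// Batched: one query for the orders, one for every referenced customer.
async function namesBatched(db: FakeDb): Promise<OrderWithName[]> {
  const orders = await db.findOrders();
  const ids = Array.from(new Set(orders.map((o) => o.customerId)));
  const customers = await db.findCustomersByIds(ids);
  const byId = new Map<number, Customer>();
  for (const c of customers) byId.set(c.id, c);
  return orders.map((o) => ({
    ...o,
    customerName: byId.get(o.customerId)?.name,
  }));
}
```

Both functions return the same data, which is why tests that only check the result never catch the slow version.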

Exception Filters Hide Errors From Your Monitoring

NestJS exception filters are powerful. They’re also a monitoring blind spot. Consider this common pattern:

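import {
  ArgumentsHost,
  Catch,
  ExceptionFilter,
  HttpException,
  HttpStatus,
} from '@nestjs/common';

@Catch()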
export class GlobalExceptionFilter implements ExceptionFilter {
  catch(exception: unknown, host: ArgumentsHost) {
    const ctx = host.switchToHttp();
    const response = ctx.getResponse();

    const status =
      exception instanceof HttpException
        ? exception.getStatus()
        : HttpStatus.INTERNAL_SERVER_ERROR;

    response.status(status).json({
      statusCode: status,
      message: 'Something went wrong',
    });
  }
}

This filter catches every exception and returns a clean JSON response. Your users get a nice error. Your generic error tracker sees a 500 response but has no stack trace, no exception type, no context about which service threw it. The filter swallowed all of that.

Scout Monitoring captures exceptions at the framework level, before your filters transform them. You get the original error class, the full stack trace, and the request context. The filter still runs and your users still get the clean response. But your monitoring sees what actually happened.
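Whatever monitoring tool sits underneath, the filter itself can also record the original exception before flattening it into a generic response. The following is a minimal sketch, assuming a hypothetical describeException helper and ErrorRecord shape; neither is part of NestJS or Scout.

```typescript
// Shape of the record worth keeping; field names are illustrative.
type ErrorRecord = { name: string; message: string; stack?: string };

// Normalizes anything that can be thrown (Error subclasses, strings,
// plain objects) into a loggable record without dropping the error
// name or the stack trace.
function describeException(exception: unknown): ErrorRecord {
  if (exception instanceof Error) {
    return {
      name: exception.name,
      message: exception.message,
      stack: exception.stack,
    };
  }
  // Non-Error throwables have no stack; keep their type and string form.
  return { name: typeof exception, message: String(exception) };
}
```

Calling a helper like this at the top of catch(), and handing the result to whatever logger the application already uses, keeps the error class and stack trace available even though the client only ever sees "Something went wrong".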

Background Jobs Deserve The Same Attention As HTTP Requests

BullMQ is the de facto job queue for NestJS. Most teams instrument their HTTP endpoints thoroughly and then completely ignore their job processors. This is a mistake.

A job that normally takes 500ms and starts taking 5 seconds won’t trigger any HTTP alerts. There’s no user staring at a loading spinner. The queue just gets deeper, your Redis memory usage climbs, and eventually jobs start timing out or getting retried. By the time someone notices, you have a backlog of thousands of jobs and no idea when the degradation started.

Track two things for every job type: execution duration and queue wait time. The first tells you if your processor is slow. The second tells you if your throughput can’t keep up with your ingest rate. Together, they tell you whether you need to optimize the processor or scale the workers. Scout’s custom instrumentation API lets you wrap your BullMQ processors to capture this data alongside your web request traces.
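Both numbers can be derived from timestamps the queue already records. The sketch below assumes the job exposes timestamp, processedOn, and finishedOn in epoch milliseconds, as BullMQ jobs do; the JobTimestamps interface and the jobTimings helper are hypothetical names introduced for illustration.

```typescript
// Subset of the timing fields a BullMQ job carries (epoch milliseconds).
interface JobTimestamps {
  timestamp: number; // when the job was created
  processedOn?: number; // when a worker picked it up
  finishedOn?: number; // when it completed or failed
}

interface JobTimings {
  queueWaitMs: number | null; // null until a worker has started the job
  executionMs: number | null; // null until the job has finished
}

function jobTimings(job: JobTimestamps): JobTimings {
  const started = job.processedOn;
  const finished = job.finishedOn;
  return {
    queueWaitMs: started === undefined ? null : started - job.timestamp,
    executionMs:
      started === undefined || finished === undefined
        ? null
        : finished - started,
  };
}

// Example: created at t=1000, picked up at t=4000, finished at t=4500.
console.log(jobTimings({ timestamp: 1000, processedOn: 4000, finishedOn: 4500 }));
```

For delayed jobs the wait computed this way may include the intentional delay, so it is worth tracking those job types separately.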

Sequential Calls That Should Be Parallel

This is another pattern that works fine in development and falls apart in production:

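import { Controller, Get } from '@nestjs/common';
// AnalyticsService, BillingService, NotificationService, CurrentUser, and
// User are application-defined and imported from your own modules.

@Controller('dashboard')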
export class DashboardController {
  constructor(
    private analytics: AnalyticsService,
    private billing: BillingService,
    private notifications: NotificationService,
  ) {}

  @Get()
  async getDashboard(@CurrentUser() user: User) {
    // Each call waits for the previous one. Total time = sum of all three.
    const stats = await this.analytics.getUserStats(user.id);
    const invoice = await this.billing.getCurrentInvoice(user.id);
    const alerts = await this.notifications.getUnread(user.id);

    return { stats, invoice, alerts };
  }
}

If each service call takes 200ms, this endpoint takes about 600ms. Because the three calls don’t depend on each other, wrapping them in Promise.all brings that down to roughly 200ms, the duration of the slowest call. The fix is trivial once you see the problem. The hard part is seeing the problem. In a request trace, Scout Monitoring shows each call on a timeline, and three bars laid end to end in a waterfall with no overlap scream “parallelize me.”
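Here is a minimal, framework-free sketch of the two shapes. The createTracker helper and its fake call function are hypothetical stand-ins for the three services; they simulate I/O with a timer and record how many calls were in flight at once.

```typescript
// Builds a fake I/O call plus a gauge of how many calls overlapped.
// Everything here is a hypothetical stand-in for real service methods.
function createTracker() {
  let inFlight = 0;
  let maxInFlight = 0;

  async function call<T>(value: T, delayMs: number): Promise<T> {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    inFlight--;
    return value;
  }

  return { call, maxInFlight: () => maxInFlight };
}

type Tracker = ReturnType<typeof createTracker>;

// Each await finishes before the next call starts: total = sum of delays.
async function dashboardSequential(t: Tracker) {
  const stats = await t.call({ visits: 10 }, 20);
  const invoice = await t.call({ amountDue: 42 }, 20);
  const alerts = await t.call(["disk almost full"], 20);
  return { stats, invoice, alerts };
}

// All three calls start immediately: total = the slowest single call.
async function dashboardParallel(t: Tracker) {
  const [stats, invoice, alerts] = await Promise.all([
    t.call({ visits: 10 }, 20),
    t.call({ amountDue: 42 }, 20),
    t.call(["disk almost full"], 20),
  ]);
  return { stats, invoice, alerts };
}
```

One design note: Promise.all rejects as soon as any call fails. If the dashboard should still render with partial data, Promise.allSettled is the usual alternative.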

Ask Your AI Assistant What’s Slow

Scout Monitoring supports the Model Context Protocol. This means you can connect your AI coding assistant and ask it questions about your production data in natural language. “What are the slowest NestJS endpoints this week?” is a real query that returns real numbers.

During incident response, this changes how fast you move. Instead of clicking through dashboards and building filters, you ask a question and get an answer. “Which endpoints have gotten slower since Tuesday’s deploy?” will surface regressions faster than any alert rule you could write in advance.

The MCP integration works with any compatible AI client. Your assistant gets read access to your Scout data and can correlate performance metrics, error rates, and deployment markers. It won’t replace your dashboards for ongoing monitoring, but for ad-hoc investigation it’s remarkably fast.

What Not To Monitor

Not everything deserves instrumentation. Monitoring NestJS middleware that adds a request ID header is noise. Tracking the execution time of a ParseIntPipe is noise. Your validation pipes (unless they call external services) are noise.

Focus on I/O boundaries: database queries, HTTP calls, queue operations, file system access. These are where latency lives. CPU-bound work in your NestJS application is almost never the bottleneck unless you’re doing image processing or cryptography in a request handler, and if you are, you already know about it.
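If you end up timing an I/O boundary by hand, a thin wrapper is usually enough. This is a generic sketch, not Scout’s instrumentation API: the timed function, the Span shape, and the record callback are all hypothetical names chosen for illustration.

```typescript
// One measurement for one I/O operation; field names are illustrative.
type Span = { label: string; durationMs: number; ok: boolean };

// Runs an async operation, reports how long it took and whether it
// succeeded, and passes the result (or the error) through unchanged.
async function timed<T>(
  label: string,
  record: (span: Span) => void,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  let ok = true;
  try {
    return await fn();
  } catch (err) {
    ok = false;
    throw err;
  } finally {
    record({ label, durationMs: Date.now() - start, ok });
  }
}
```

Wrapping only the database, HTTP, and queue calls this way keeps the signal focused on the places where latency actually accumulates.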

Getting Started

Scout Monitoring’s Node.js agent instruments NestJS applications with minimal configuration. Run npm install @scout_apm/scout-apm, create a scout.ts file with your config, require it before your NestJS imports in main.ts, add the middleware and error filter, and deploy. There is a free tier, and no credit card is required. Within minutes you’ll have endpoint traces, database query breakdowns, and error capture.

For application monitoring with errors and traces, Scout Monitoring provides the fastest insights without the bloat.