">
❮ Back to Blog

Instrumenting Code Using Prism and the Ruby Abstract Syntax Tree

A repository for this article can be found here.

When most developers think about request tracing, they picture instrumentation hooks inside familiar libraries. This allows us to track familiar metrics we see in application performance monitoring (APM) tools such as the duration of an HTTP call or how long a database query takes. But what if you could go deeper and instrument your own Ruby code automatically, without sprinkling timing calls everywhere? This is what we call AutoInstruments and where the Ruby Abstract Syntax Tree (AST) and the Ruby parser (Prism) come into play.

We’ll skip the long and fragmented history of Ruby’s parsers and interpreters and just note that for our purposes, as of Ruby 3.4, Prism is now the default Ruby parser (introduced as a default gem in Ruby 3.3) and is responsible for turning Ruby source code into an AST. From there, the YARV virtual machine, Ruby’s bytecode interpreter since 1.9, compiles and executes it.


Let’s peel back the various layers of instrumenting code automatically and see how it works in practice.


Understanding YARV Instruction Sequence


Since we need to transform and modify Ruby source code, it helps to first understand how YARV takes that source, compiles it into bytecode, and executes it. Ruby exposes this process through the RubyVM::InstructionSequence class, which lets us compile Ruby source code into its underlying instruction sequence.


In this we can either compile a file using compile_file_prism or raw source. For this example, we will just use raw source:

3.4.2 :002 > insns = RubyVM::InstructionSequence.compile_prism("puts 'HI WORLD'")
=> <RubyVM::InstructionSequence:<compiled>@<compiled>:1>


When we do this, we get an instruction sequence which allows us to do a few things. One of which is that we can disassemble it to show us the produced instructions in human readable form:

3.4.2 :003 > pp insns.disassemble
"== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(1,15)>\n" +
"0000 putself                                                          (   1)[Li]\n" +
"0001 putchilledstring                       \"HI WORLD\"\n" +
"0003 opt_send_without_block                 <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>\n" +
"0005 leave\n"


Another is to tell it to evaluate the instructions as well:

3.4.2 :004 > insns.eval
HI WORLD
=> nil


For a more detailed look into YARV and these various instructions, see Kevin Newton’s (the creator of Prism) series the Advent of YARV.


Also Ruby Under a Microscope is getting a 3.x edition, so check that out as well.


Load_iseq Hook


With the ultimate goal of dynamically adding instrumentation to a source file, we need a way to intercept when a file’s instruction sequence is loaded — for example, during require 'path/to/file.rb'. The idea is to modify the source to include our instrumentation before returning the instruction sequence for the altered code. That’s where load_iseq comes in. By prepending to this method, we can intercept the normal instruction sequence, use Prism to parse the file into an Abstract Syntax Tree (AST) — the same structure YARV builds internally — perform the necessary transformations, and then return the modified instruction sequence.

module Patch
 module InstructionSequence
   def load_iseq(path)
      # Return the path/file’s altered source’s instruction sequence
      self.comile_file_prism(transform_file_source(path))
 
      # What normally occurs:
      # self.compile_file_prism(path)
   end
 end
 
 class << RubyVM::InstructionSequence
   prepend InstructionSequence
 end
end


For example, if we look at Part 1, in the example.rb we are running:

puts “HELLO WORLD”


However, in the patch.rb, we are saying to instead use ‘puts “HI WORLD”’.


When we run the example file along with the patch (note the order):

ruby -r ./patch.rb -r ./example.rb -e 'exit'


We see that we get HI WORLD even though example.rb had puts HELLO WORLD in it.


Let’s take this a step forward and discuss how we can create the transform_file_source part of the equation.


Swapping method names using the AST


Many links below will point to docs found in the Advent of Prism by Kevin Newton.


There are many things that we can do to transform the source of a file. While adding our instrumentation to the source automatically is our goal, let’s start with a simpler application. What if we want to swap the names of a method? There are considerably easier ways to do this in Ruby, such as alias_method, but let’s see how we can do this with the AST/Prism parsing and swapping out some of the Ruby source code for compilation. For this example, let’s try and swap the method name in part 2’s example.rb, so that if we run it the greet call will work as expected.


At the same time when looking at part 2’s example.rb we can see a simplified AST (in part 4 we will write out the AST for each required file) that we get when we parse it with Prism. At the top we find that the root most node of the AST for Prism / ParseResult is going to be ProgramNode followed by a child node of StatementsNode. With the AST being a unidirectional tree, if we are wanting to swap the names of a method (such as greeting to greet in example.rb) we will need to recursively check the child nodes from ProgramNode -> StatementsNode, down until we find a DefNode with the old name of greeting. Once we find this DefNode, we can get the Location of where the node is in the source. Once we have this bit of information, we can then overwrite this source location with the new method name.


One thing of note, is that since the Location gives offsets in bytes, we will need to set the encoding of the source string to ASCII-8BIT (utilizing String#b), otherwise we may run into issues where non UTF-8 characters, such as those found in comments (and potentially string literals), will cause index out of range issues when trying to replace the source. While YARV works with this encoding, in real world practice, we will need to convert these back to the original encoding, as we can run into situations of incompatible string encodings.


As seen earlier when looking at part 2’s example.rb, since there is no method defined of greet but instead greeting, if we run it:

ruby -r ./example.rb -e 'exit'


We will get a undefined local variable or method 'greet', but if we require the patch first (which redefines greeting to greet), we see:

ruby -r ./patch.rb -r ./example.rb -e 'exit'
Hello World


Instrumenting Method Calls

Now that we’ve covered the general high level structure of the Prism AST, let’s take a look at how we can go about instrumenting method calls. In a nutshell our instrumentation is really just a simple timing:

start_time = Time.now
# do work
end_time = Time.now
puts "Work took #{end_time - start_time}"


Given this, we can tweak this a bit to be more reusable with something as simple as:

class Timer
 def self.capture
   start_time = Time.now
   yield
   end_time = Time.now
   puts "Capture took #{end_time - start_time}"
 end
end

Timer.capture do
 # do work
end


Building on this foundation, we can build up to something that resembles a span/layer with timings, which would match the API that we have here at Scout:

ScoutApm::Tracer.instrument("Category", "Subcategory") do
 # do work
end


So to properly trace we really just want to inject our timing API into the source code wrapping any method calling.

def some_work
 # do work
end

ScoutApm::Tracer.instrument("Work", "some_work") do
 some_work
end


That’s really the crux of it. However, just like many things in life, the devil is in the details. In Ruby, there are many many ways that calls can be made in Ruby.

Class Method Wrapping

For the vast majority of Ruby code written, it is likely written where the file contains a single module or class which may contain child modules and classes with their respective methods. This is especially true for controllers in Rails, which is what is parsed with the AutoInstruments feature.

It’s at this point where we start to see the complexities of the AST start to build. As such, we should really be writing out the AST of the file that we are parsing, as well as the rewritten version of it to verify that we are indeed creating the correct source, with our intended instrumentation wrappings.

In fact, we start to see some of this right as we go and try and instrument a something like:

Testclass.new.greet


If we use the patch.rb from the previous section we end up with something like:

BlockCaller.yielder('new') { TestClass.new("Hello, world") }{ TestClass.new("Hello, world").greet }


I don’t think that’s what we want. In fact, what do we want? Do we want a wrapper around greet or new or both? If we look at the part 4 example_ast.txt, we see that .new is a receiver / child node of .greet. As such, we may want to not to instrument nodes whose parent nodes are currently being instrumented. How can we do that? With Prism being a unidirection tree, each node won’t store references to its parents. As such, we need to manually keep track of these relationships either via a hashmap or something like a stack.


For this case, we are going to use a stack as we just need to know about the current node’s parent(s), and whether the parents were a CallNode. Once we are done processing the node, and if it is a CallNode, we will pop it off the stack to ensure sibling CallNodes do get instrumented if one of their ancestors wasn’t a CallNode. Once we have done this, we end up with the correct instrumentation as follows:

BlockCaller.yielder('greet') { TestClass.new("Hello, world").greet }


When it comes to parsing the AST and injecting / rewriting the source, a lot of it is handling various cases on how particular methods can be called, and the various ways that these nodes & method calls are related to one another.


This is the foundation of how Scout's https://scoutapm.com/docs/ruby/features#auto-instruments feature works under the hood, leveraging Prism and the AST to automatically trace your custom Ruby code without requiring any manual instrumentation. The result is deep, method-level visibility into your application's performance, right out of the box. Ready to see what's happening inside your Ruby app? https://scoutapm.com/users/sign_up and start tracing in minutes.

Ready to Optimize Your App?

Join engineering teams who trust Scout Monitoring for hassle-free performance monitoring. With our 3-step setup, powerful tooling, and responsive support, you can quickly identify and fix performance issues before they impact your users.