Improving Ruby Performance with Rust

Daniel P. Clark, November 23rd, 2017 (Last Updated: November 23rd, 2017)


A couple of years ago, I found a few methods in my Rails application that were called several thousand times and accounted for more than 30 percent of my website’s page load time. Each of these methods was strictly focused on file pathnames.

Around the same time, I came across a blog post titled “Rust to the Rescue of Ruby,” which showed me that I could rewrite slow-performing Ruby code in Rust and call it from Ruby for much faster results. Rust also offers a safe, fast, and productive way to write code. After rewriting just a few of the slow methods for my Rails site in Rust, my pages loaded more than 33 percent faster than before.

If you want to learn about integrating Rust via FFI, I suggest that post. The focus of my post is to share the performance lessons I’ve learned over the past two years of integrating Ruby and Rust. When methods get called many thousands of times, even the slightest performance improvement adds up.

Getting Started

You can find working code for this post on GitHub. Or, if you’re comfortable setting up both Rust and Ruby projects, create an ffi_example project and add the following to your Cargo.toml file:

[lib]
name = "ffi_example"
crate-type = ["dylib"]

[dependencies]
array_tool = "*"
libc = "0.2.33"

Add this to your ffi_example.gemspec file:

  spec.add_dependency "bundler", "~> 1.12"
  spec.add_dependency "rake", "~> 12.0"
  spec.add_dependency "ffi", "~> 1.9"
  spec.add_development_dependency "minitest", "~> 5.10"
  spec.add_development_dependency "minitest-reporters", "~> 1.1"
  spec.add_development_dependency "benchmark-ips", "~> 2.7.2"

Since the library you build will need to work with FFI on a client’s system, it’s better to include FFI, Rake, and Bundler as regular (runtime) dependencies rather than development dependencies.

For this post’s example, we’ll take the basename method from FasterPath’s repository history and compare it with File.basename.

Keep in mind that Ruby implements File.basename in C, so this isn’t the kind of method you’d typically rewrite in Rust. Most of FasterPath rewrites Ruby code from the Pathname class, which is where the significant performance improvements show up. We’re using File.basename purely as a baseline for comparison.

For the sake of brevity, we’ll be dumping all our Rust code in src/lib.rs. Here’s a copy of the code for basename written in Rust (you can copy and paste this; we won’t go over how it works here):

mod rust {
  extern crate array_tool;
  use self::array_tool::string::Squeeze;
  use std::path::MAIN_SEPARATOR;

  static SEP: u8 = MAIN_SEPARATOR as u8;

  pub fn extract_last_path_segment(path: &str) -> &str {
    // Works with bytes directly because MAIN_SEPARATOR is always in the ASCII 7-bit range so we can
    // avoid the overhead of full UTF-8 processing.
    // See src/benches/path_parsing.rs in the FasterPath repository for benchmarks of different approaches.
    let ptr = path.as_ptr();
    let mut i = path.len() as isize - 1;
    while i >= 0 {
      let c = unsafe { *ptr.offset(i) };
      if c != SEP { break; };
      i -= 1;
    }
    let end = (i + 1) as usize;
    while i >= 0 {
      let c = unsafe { *ptr.offset(i) };
      if c == SEP {
        return &path[(i + 1) as usize..end];
      };
      i -= 1;
    }
    &path[..end]
  }

  pub fn basename(pth: &str, ext: &str) -> String {
    // Known edge case: a path made up only of slashes ("/", "///") has the basename "/".
    if &pth.squeeze("/")[..] == "/" { return "/".to_string(); }

    let mut name = extract_last_path_segment(pth);

    if ext == ".*" {
      if let Some(dot_i) = name.rfind('.') {
        name = &name[0..dot_i];
      }
    } else if name.ends_with(ext) {
      name = &name[..name.len() - ext.len()];
    };
    name.to_string()
  }
}

This implementation is written to mimic the results File.basename returns. The only thing to note here is the edge case at the beginning of the basename method: squeeze walks and copies the entire input before the real scan even starts, which effectively doubles the number of passes over the input. It should be folded into the existing scan instead.
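One way to fold the edge case into the existing scan is to detect it during the backward walk itself: if skipping trailing separators consumes a non-empty path entirely, the path was nothing but slashes. The sketch below is a hypothetical std-only rewrite (basename_single_pass is not FasterPath’s code); it assumes '/' is the only separator, as the original edge case does, and keeps the extension handling unchanged:

```rust
/// Hypothetical single-pass variant of rust::basename.
/// Assumes '/' is the only path separator (as the squeeze("/") edge case does).
fn basename_single_pass(pth: &str, ext: &str) -> String {
    let bytes = pth.as_bytes();

    // Skip trailing separators.
    let mut end = bytes.len();
    while end > 0 && bytes[end - 1] == b'/' {
        end -= 1;
    }

    // A non-empty path made only of separators ("/", "///") has basename "/".
    // This replaces the separate squeeze pass and its String allocation.
    if end == 0 && !bytes.is_empty() {
        return "/".to_string();
    }

    // Walk back to the separator preceding the last segment.
    let mut start = end;
    while start > 0 && bytes[start - 1] != b'/' {
        start -= 1;
    }
    // Both indices sit next to an ASCII '/', so they are valid char boundaries.
    let mut name = &pth[start..end];

    // Extension handling, unchanged from the original.
    if ext == ".*" {
        if let Some(dot_i) = name.rfind('.') {
            name = &name[..dot_i];
        }
    } else if name.ends_with(ext) {
        name = &name[..name.len() - ext.len()];
    }
    name.to_string()
}
```

Because the slash check now falls out of the scan, no separate squeeze pass or extra allocation happens on each call.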

The extract_last_path_segment function was an efficiency contribution from Gleb Mazovetskiy. It is used by other methods as well, and it was written before the edge case was known. I’ll go into the benchmark performance with and without the edge case later in this post.

Rust FFI Methods

The first tutorial I found on implementing Rust FFI code for handling strings showed a wrapper such as this:

extern crate libc;
use libc::c_char;
use std::ffi::{CStr,CString};


#[no_mangle]
pub extern "C" fn example(c_pth: *const c_char) -> *const c_char {
  let pth = unsafe {
    assert!(!c_pth.is_null());
    CStr::from_ptr(c_pth).to_str().unwrap()
  };
  
  let output: String = pth.to_string(); // YOUR CODE HERE (this placeholder just echoes the input)

  CString::new(output).unwrap().into_raw()
}

This takes the raw C string that Ruby hands us through FFI, converts it to a string we can use in Rust, and then converts the result back into a C string to give to Ruby.

The important thing to note here is the assert!. The assert! macro costs us next to nothing at run time, but if it evaluates to false, Rust panics, and in my experience that panic crashes through FFI as a segfault. The assert! would be fine if we could guarantee that nil is never passed in. But Ruby is nil-friendly, and you don’t want segfaults happening, so relying on assert! here is unwise.
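If you’d rather not rely on assert! and unwrap() at all, the conversion can be written so that neither a NULL pointer (which is how Ruby’s nil arrives) nor invalid UTF-8 can panic. This is a minimal sketch; the helper name c_str_to_str is hypothetical, and std::os::raw::c_char is used in place of libc::c_char (both name the platform’s C char type) so the snippet needs no dependencies:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: converts a pointer received from Ruby's FFI into a
/// &str without any possibility of panicking.
/// Returns None for a NULL pointer (Ruby passed nil) or for invalid UTF-8.
///
/// Safety: `ptr` must be NULL or point to a NUL-terminated C string that
/// stays valid for the lifetime 'a.
unsafe fn c_str_to_str<'a>(ptr: *const c_char) -> Option<&'a str> {
    if ptr.is_null() {
        return None;
    }
    let c_str: &'a CStr = unsafe { CStr::from_ptr(ptr) };
    // to_str validates UTF-8; .ok() turns an error into None instead of panicking.
    c_str.to_str().ok()
}
```

A wrapper can then match on the Option and decide what to hand back to Ruby instead of crashing the process.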

Adding nil checks in Rust isn’t difficult, though. Using the same kind of wrapper around our code, here’s the nil-checking version of basename.

#[no_mangle]
pub extern "C" fn basename_with_nil(c_pth: *const c_char, c_ext: *const c_char) -> *const c_char {
  if c_pth.is_null() || c_ext.is_null() {
    return c_pth;
  }
  let pth = unsafe { CStr::from_ptr(c_pth) }.to_str().unwrap();
  let ext = unsafe { CStr::from_ptr(c_ext) }.to_str().unwrap();

  let output = rust::basename(pth, ext);

  CString::new(output).unwrap().into_raw()
}

When I implemented this, I figured that if Ruby handed us a nil, it would understand a nil if we gave it right back. It turns out that works: FFI passes nil for a :string argument as a NULL pointer, and it converts a NULL :string return value back into nil.

So in this case, our Rust method can return either a string or nil to Ruby. Ruby won’t even notice that this seems to go against Rust’s type enforcement, because in Rust we’re only handling one type here: a pointer to c_char (from libc::c_char), which may simply be null.
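The same NULL-in, NULL-out contract can be combined with a check that refuses to panic on invalid UTF-8, so every failure path answers nil instead of crashing. This is a sketch under assumptions: segment_or_nil is a hypothetical export, and last_segment is a simplified stand-in for rust::basename so the snippet compiles without array_tool. The returned buffer still has to be given back to Rust to be freed, as discussed under Freeing Memory:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

// Hypothetical stand-in for rust::basename: the text after the last '/'.
fn last_segment(path: &str) -> &str {
    path.rsplit('/').next().unwrap_or(path)
}

/// NULL in, NULL out: Ruby's nil arrives as NULL, and a NULL return
/// value is handed back to Ruby as nil.
#[no_mangle]
pub extern "C" fn segment_or_nil(c_pth: *const c_char) -> *mut c_char {
    if c_pth.is_null() {
        return ptr::null_mut();
    }
    let c_str = unsafe { CStr::from_ptr(c_pth) };
    let pth = match c_str.to_str() {
        Ok(s) => s,
        // Invalid UTF-8: answer nil rather than panicking across FFI.
        Err(_) => return ptr::null_mut(),
    };
    match CString::new(last_segment(pth)) {
        // Ownership of the buffer moves to the caller.
        Ok(out) => out.into_raw(),
        Err(_) => ptr::null_mut(),
    }
}
```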

Note that the nil guard makes us a bit safer and barely takes any time itself; however, it added 4 percent more time to our method (measured without the edge-case slowdown). If we implement the nil guard in Ruby instead of Rust, that adds another 4 percent, for an 8 percent slowdown in total.

Keep in mind we’re splitting hairs here over something that’s already blazingly fast. These are average results, which vary +/-3 percent.

If we implement the same type safety that File.basename provides in Ruby with:

def self.basename(pth, ext = '')
  pth = pth.to_path if pth.respond_to? :to_path
  raise TypeError unless pth.is_a?(String) && ext.is_a?(String)
  Rust.basename_with_pure_input(pth, ext) # original Rust FFI implementation, without nil guards
end

…this would be about 17 percent slower than our original implementation above.

We haven’t even compared performance to Ruby’s C implementation yet. Every type-safety guard we add on the way to perfect compatibility costs us performance.


Freeing Memory

What’s worse, even at this point in the learning process, we don’t know what happens to the memory when garbage collection runs. This calls for more research into online documentation and blogs to illuminate what’s happening here.

In my experience digging through the available resources, it isn’t made perfectly clear what exactly happens here. But I’ll share what I’ve found.

It has been reported that when using FFI, if you don’t implement a method for freeing the memory yourself, FFI tries to call a version of C’s free function. In discussions with members of the Rust community, it came out that you really don’t want free called on Rust-allocated memory this way; the result is undefined behavior. The documentation for Rust’s CString::into_raw says the same: the pointer must be handed back to Rust and reconstituted with CString::from_raw to be deallocated properly, not released with C’s free(). So several sources recommend implementing a method in Rust that takes back ownership of the memory Rust originally handed out, so that Rust can free it. You then need to tell Ruby to call that method when it’s done with the value.

In FFI, it’s easy enough to link to your own custom “free” method and call it manually. Or you can have Ruby automatically do it with its garbage collector via an AutoPointer or a ManagedStruct. Good examples for these are available at the FFI Wiki or at the Rust Omnibus.
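Here’s a minimal sketch of that pattern on the Rust side, using hypothetical function names make_greeting and free_rust_string. On the Ruby side you would attach make_greeting with a :pointer return type (a :string return copies the text and discards the pointer), then pass that pointer to free_rust_string, either manually or from an FFI::AutoPointer releaser:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical export: hands a heap-allocated C string to the caller.
/// Ownership of the buffer moves to the caller, who must return it below.
#[no_mangle]
pub extern "C" fn make_greeting() -> *mut c_char {
    CString::new("hello from Rust").unwrap().into_raw()
}

/// Hypothetical export: takes ownership back so Rust's allocator frees it.
/// Call exactly once per pointer from make_greeting; never pass the
/// pointer to C's free(). A NULL pointer is ignored.
#[no_mangle]
pub extern "C" fn free_rust_string(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        // Rebuilding the CString and dropping it releases the memory.
        drop(CString::from_raw(ptr));
    }
}
```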

If the code you’re optimizing is very labor-intensive, the cost of implementing these won’t add up to much. But if you’re optimizing code that’s already fast, it’s quite costly: if my memory serves me correctly, it added roughly 40 percent more time to my method.

This is largely because FFI is written partly in Ruby and mostly in C, and the more time you spend handling logic in Ruby-land, the less benefit you get from the performance of pure C or Rust.

At this point, I was getting disheartened at trying to edge out performance, since all these little costs added up and ended up taking more time than I was gaining. So I decided to avoid the time FFI spends in Ruby and go for a pure Rust solution.

Two such solutions exist: one called ruru and another called Helix. Between the two, I chose ruru for the following reasons:

ruru is written in the style of Rust, whereas Helix is designed to feel like writing Ruby in Rust itself.
ruru is very close to a 1.0 version and looks stable, whereas Helix is in periodic rapid development with many big features yet to come.

And let me tell you! By switching to ruru, I cut away all of the time I was losing in my type safety guards. But I digress; I would be remiss if I didn’t cover the Ruby code for the examples from earlier.

Ruby FFI Usage

For the sake of benchmarking, we’ll be adding some methods on the Ruby side of things. First, here’s the implementation for lib/ffi_example.rb.

require "ffi_example/version"
require "ffi"

module FfiExample
  # the example function from earlier but with two parameters
  def self.basename_with_pure_input(pth, ext = '')
    Rust.basename_with_pure_input(pth, ext)
  end

  def self.basename_nil_guard(pth, ext = '')
    return nil if pth.nil? || ext.nil?
    Rust.basename_with_pure_input(pth, ext)
  end

  def self.basename_with_nil(pth, ext = '')
    Rust.basename_with_nil(pth, ext)
  end

  def self.file_basename(pth, ext = '')
    pth = pth.to_path if pth.respond_to? :to_path
    raise TypeError unless pth.is_a?(String) && ext.is_a?(String)
    Rust.basename_with_pure_input(pth, ext)
  end

  module Rust
    extend FFI::Library
    ffi_lib begin
      prefix = Gem.win_platform? ? "" : "lib"
      "#{File.expand_path("../target/release/", __dir__)}/#{prefix}ffi_example.#{FFI::Platform::LIBSUFFIX}"
    end

    attach_function :basename_with_pure_input, [ :string, :string ], :string
    attach_function :basename_with_nil, [ :string, :string ], :string
  end
  private_constant :Rust
end

Ruby ships Fiddle in its standard library for calling foreign C functions directly. But its documentation for getting started is sparse, and it lacks many features. That is most likely why the ffi gem was written. FFI has a modest amount of documentation, but it still falls short of helping beginners get well grounded in what’s going on.

The ffi gem provides helpers that let us write code that works across multiple operating systems. The ffi_lib method above needs to point to the dynamic library that Rust builds for you. When we run cargo build --release, Cargo creates the library in target/release, and its file name depends on the operating system: a lib prefix on Linux and macOS but not on Windows, and an extension of .so, .dylib, or .dll respectively. The code in the begin/end block above handles this for Windows, Mac, and Linux.
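The same platform logic can be expressed on the Rust side with std::env::consts::DLL_PREFIX and DLL_SUFFIX, which hold the shared-library prefix and suffix (including the dot) for the platform the code is compiled for. A small sketch; release_library_path is a hypothetical helper, for example for a Rust test harness that needs to locate the built library:

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the Ruby begin/end block: builds the path of
/// the library that `cargo build --release` produces for `crate_name`.
/// DLL_PREFIX is "lib" on Unix-like systems and "" on Windows;
/// DLL_SUFFIX is ".so", ".dylib", or ".dll".
fn release_library_path(project_root: &Path, crate_name: &str) -> PathBuf {
    project_root
        .join("target")
        .join("release")
        .join(format!("{}{}{}", DLL_PREFIX, crate_name, DLL_SUFFIX))
}
```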

Getting Started with ruru

Ruru is fairly straightforward to add to our project at this point. First, add it to our Cargo.toml file.

[dependencies]
ruru = "0.9.3"
array_tool = "*"
libc = "0.2.33"

Then pull the crate into our src/lib.rs file:

#[macro_use]
extern crate ruru;
use ruru::{RString,Class,Object};

Ruru has some nice macros to help get our methods working together with specific classes. First, we define the class we want to create, and then we define our methods inside a macro that associates them with the Ruby class.

class!(RuruExample);

methods!(
  RuruExample,
  _itself,
  fn pub_basename(pth: RString, ext: RString) -> RString {
    RString::new(
      &rust::basename(
        pth.ok().unwrap_or(RString::new("")).to_str(),
        ext.ok().unwrap_or(RString::new("")).to_str()
      )[..]
    )
  }
);

Here in the methods! macro, we first choose which class to work with. The next item is the variable we’ll use within the methods! macro block to refer to the Ruby version of self. Since we’re not using it at all here, we precede it with an underscore _itself.

Ruby has its own type system implemented in C, where every object is represented as a VALUE and its type is identified from that VALUE. Ruru mirrors some of these types with Rust equivalents, so for Ruby’s String type we use the RString type.

When writing methods in the methods! macro, it’s very important to know that methods within the macro’s scope cannot call each other. So any code you want to reuse must be written outside the macro and called from there. Also, when the dynamic library is created, naming conflicts can easily occur, so it’s good to add some extra characters to method names to keep them distinct. That’s why the method above is named pub_basename, even though Ruby will know it as basename.

To make the method callable from Ruby, Ruby must first call an entry point in our Rust code that creates the class natively:

#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn Init_ruru_example(){
  Class::new("RuruExample", None).define(|itself| {
    itself.def_self("basename", pub_basename);
  });
}

The Init_ prefix follows Ruby’s convention for compiled C extensions: when Ruby requires a compiled library named ruru_example, it calls a function named Init_ruru_example. That is what allows the library file to be required directly.

So if you rename the library in the Cargo.toml file so that it doesn’t conflict with the Ruby library name ffi_example, and add target/release to the load path, you should be able to require it directly with require "ruru_example" (if you named the library ruru_example). This loads your ruru Rust code as if it were written in Ruby itself.

For a more in-depth read on linking C code with Ruby, read the docs for writing a C extension.

The other way to load the code is to simply use Fiddle to call it directly. We’ll still use FFI’s dynamic lib helper methods for the library in this example.

require 'ffi'    # used here only for FFI::Platform::LIBSUFFIX
require 'fiddle'

library = Fiddle.dlopen(
  begin
    prefix = Gem.win_platform? ? "" : "lib"
    "#{File.expand_path("../target/release/", __dir__)}/#{prefix}ffi_example.#{FFI::Platform::LIBSUFFIX}"
  end
)

# Init_ruru_example takes no arguments and returns nothing (void).
Fiddle::Function.
  new(library['Init_ruru_example'], [], Fiddle::TYPE_VOID).
  call

Now we’ve loaded our code into Ruby, and everything works as expected.

Benchmarking

In the gemspec included earlier, we included benchmark-ips. To benchmark our methods, let’s first drop in a Rakefile to make command-line execution far simpler.

# Rakefile
require "bundler/gem_tasks"
require "rake/testtask"

Rake::TestTask.new(:test) do |t|
  t.libs << "test"
  t.libs << "lib"
  t.test_files = FileList["test/**/*_test.rb"]
end

Rake::TestTask.new(:bench) do |t|
  t.libs = %w[lib test]
  t.pattern = 'test/**/*_benchmark.rb'
end

task :default => :test

Now we create our benchmark in test/benches/basename_benchmark.rb.

require 'test_helper'
require 'benchmark/ips'

BPATH = '/home/gumby/work/ruby.rb'

Benchmark.ips do |x|
  x.report('Ruby\'s C impl') do
    File.basename(BPATH)
    File.basename(BPATH, '.rb')
  end

  x.report('with pure input') do
    FfiExample.basename_with_pure_input(BPATH)
    FfiExample.basename_with_pure_input(BPATH, '.rb')
  end

  x.report('ruby nil guard') do
    FfiExample.basename_nil_guard(BPATH)
    FfiExample.basename_nil_guard(BPATH, '.rb')
  end

  x.report('rust nil guard') do
    FfiExample.basename_with_nil(BPATH)
    FfiExample.basename_with_nil(BPATH, '.rb')
  end

  x.report('with type safety') do
    FfiExample.file_basename(BPATH)
    FfiExample.file_basename(BPATH, '.rb')
  end

  x.report('through ruru') do
    RuruExample.basename(BPATH, '')
    RuruExample.basename(BPATH, '.rb')
  end

  x.compare!
end

Before running the benchmark, comment out the edge case in our basename method. The edge case is there only to pass the Ruby Spec Suite. By the standards of what is acceptable in file paths, you don’t need to squeeze multiple slashes down to one (from /// to /); operating systems recognize such paths just fine.
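Rust’s own path parser illustrates the point: std::path::Path::components ignores repeated separators and trailing slashes, so such paths compare as equal component by component. A small sketch with a hypothetical helper, same_components; it only parses strings and never touches the filesystem:

```rust
use std::path::Path;

/// Hypothetical helper: true when two path strings have identical components,
/// i.e. they differ only in repeated or trailing separators.
/// Pure string parsing; the filesystem is never consulted.
fn same_components(a: &str, b: &str) -> bool {
    Path::new(a).components().eq(Path::new(b).components())
}
```

For example, same_components("/home/gumby/work/ruby.rb", "/home//gumby///work/ruby.rb") is true.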

Now running our benchmarks with rake bench produces the following output (be sure to run cargo build --release before running the benchmark):

Note: Ruby 2.4.2 & Rust 1.23.0-nightly

Warming up --------------------------------------
       Ruby's C impl    41.849k i/100ms
     with pure input    31.766k i/100ms
      ruby nil guard    29.974k i/100ms
      rust nil guard    31.812k i/100ms
    with type safety    27.103k i/100ms
        through ruru    41.124k i/100ms
Calculating -------------------------------------
       Ruby's C impl    683.942k (± 1.5%) i/s -      3.432M in   5.018615s
     with pure input    480.551k (± 1.6%) i/s -      2.414M in   5.025184s
      ruby nil guard    443.185k (± 2.6%) i/s -      2.218M in   5.008595s
      rust nil guard    489.863k (± 1.9%) i/s -      2.450M in   5.002297s
    with type safety    382.805k (± 1.7%) i/s -      1.924M in   5.028345s
        through ruru    667.268k (± 2.6%) i/s -      3.372M in   5.057512s

Comparison:
       Ruby's C impl:   683941.9 i/s
        through ruru:   667268.5 i/s - same-ish: difference falls within error
      rust nil guard:   489863.3 i/s - 1.40x  slower
     with pure input:   480551.2 i/s - 1.42x  slower
      ruby nil guard:   443185.2 i/s - 1.54x  slower
    with type safety:   382805.2 i/s - 1.79x  slower
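The “x slower” figures in the comparison are consistent with dividing the fastest report’s iterations per second by each report’s own. This small sketch (slowdown and slowdown_label are hypothetical helpers, not part of benchmark-ips) reproduces that column from the numbers above:

```rust
/// Hypothetical helper: how many times slower a report is than the fastest,
/// computed as fastest iterations/second divided by this report's.
fn slowdown(fastest_ips: f64, ips: f64) -> f64 {
    fastest_ips / ips
}

/// Formats the ratio to two decimal places, as in the output above.
fn slowdown_label(fastest_ips: f64, ips: f64) -> String {
    format!("{:.2}x slower", slowdown(fastest_ips, ips))
}
```

For instance, slowdown_label(683941.9, 382805.2) gives "1.79x slower", matching the with type safety row; in time per call, that is roughly 79 percent more than Ruby’s C implementation.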

The methods that aren’t Ruby or ruru are the FFI versions. Now you can see the difference even the slightest changes make. With ruru, we’re able to match C’s performance (the difference falls within the margin of error) without worrying about the risks associated with writing C code.

If a method isn’t called much, these changes likely won’t register any difference in your overall benchmarks. But for heavily used methods, they do make a difference.

Another interesting finding about benchmarking Rust versus C in Ruby is that the amount of cache your CPU has can affect Rust’s results: in our observations, more cache improved Rust’s performance relative to C.

This information is what has been observed between a few other developers and myself in the FasterPath project. We don’t have this data centrally cataloged yet but should have a system in place to do so in the future.

Summary

Ruru and Helix are not feature-complete systems. In ruru, I’ve observed integers, strings, and arrays working perfectly across the system as well as the init process for new objects from ruru to Ruby.

One area that, as of this writing, neither ruru nor Helix has implemented is letting Ruby’s garbage collector work on Ruby objects created from the Rust side. The likely reason is that the VALUE exists on the Rust side but the Ruby GC doesn’t know how to free it. I’ve observed this when calling Pathname.new from Rust on the directory entries for Pathname.entries: it segfaults during benchmarks but not during the test suite, because only the benchmarks run long enough to trigger the GC before exiting. The tracking issues for this are ruru#75 and helix#50.

Ruby has been a mature language for some time now, and Rust is still young and growing. It may be some time before ruru and Helix reach 1.0 with complete Ruby compatibility. That all depends on community growth and involvement.

So great things are coming in our future. In the meantime, we already have a great amount we can accomplish with what’s been created. I encourage you all to dabble with these powerful options. Please share what you’ve learned, document well for the sake of others and your future self, and someday soon, we’ll have younger developers more able to fully achieve and realize their goals in performance programming.

Published on Web Code Geeks with permission by Daniel P. Clark, partner at our WCG program. See the original article here: Improving Ruby Performance with Rust
