Advanced Enumeration with Ruby

Daniel P. Clark, August 3rd, 2017 (Last Updated: August 3rd, 2017)


Enumeration by definition is “the action of mentioning a number of things one by one.” In programming, instead of mentioning, we choose any action we may want to perform, whether it simply be printing out the item to a display or performing some sort of selection and/or transformation on the item.

In programming, there are many ways to select and process a collection in a single pass by chaining each additional transformation on as a step. Each step can either consume the entire collection before handing its results off to the next step, or it can be handled “lazily” and pass one or more items at a time through all the transformations.

How Ruby Does Enumeration

Before going further, here’s a quick review of what blocks and yield do. The blocks we’re interested in for Ruby are sections of code defined within methods or procs/lambdas. You can think of yield as a place where a code block gets pasted into the current code block from elsewhere. Let me demonstrate.

def my_printer
  puts "Hello World!"
end

def thrice
  3.times do
    yield
  end
end

thrice &method(:my_printer)
# Hello World!
# Hello World!
# Hello World!

thrice { puts "Ruby" }
# Ruby
# Ruby
# Ruby

Methods accept two forms of blocks for yield: procs or blocks. The method method returns a Method object for a method definition, which the & operator converts into a proc so that it can be passed in as a block, as in the my_printer example above.

Where yield is written above, it’s as if the code passed as a block were written in its place. So in the first case, simply imagine yield replaced with puts "Hello World!" and the second yield replaced with puts "Ruby".
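As a small sketch of the two forms mentioned above, a block can be invoked implicitly with yield or captured as a proc through an & parameter and invoked with call. The helper names thrice_with_yield and thrice_with_call are made up for this illustration; both should behave the same way.

```ruby
# Hypothetical helper names, used only for illustration.
def thrice_with_yield
  3.times { yield }
end

def thrice_with_call(&block)
  # &block captures the passed block as a Proc object.
  3.times { block.call }
end

collected = []
printer = proc { collected << "Ruby" }

# The & prefix turns the proc back into a block for each call.
thrice_with_yield(&printer)
thrice_with_call(&printer)

collected.length
# => 6
```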

yield can also work as a simple enumerator. You can pass any value in as a parameter to the block/proc by adding it after yield.

def simple_enum
  yield 4
  yield 3
  yield 2
  yield 1
  yield 0
end

simple_enum do |value|
  puts value
end
# 4
# 3
# 2
# 1
# 0
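A common companion to this pattern, sketched here under the assumption that you want the method to be usable without a block as well, is to return an enumerator when no block is given. The method name squares_up_to is hypothetical.

```ruby
# Hypothetical method: yields the squares of 1 up to limit.
def squares_up_to(limit)
  # Without a block, hand back an Enumerator that will call this
  # same method again with a block when it is iterated.
  return to_enum(:squares_up_to, limit) unless block_given?

  1.upto(limit) { |n| yield n * n }
end

all_squares = squares_up_to(4).to_a
# => [1, 4, 9, 16]

even_squares = squares_up_to(4).select(&:even?)
# => [4, 16]
```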

Minimum Enumerator Requirements

Ruby’s standard way of producing an enumerator is the each method, which yields values. With this, you can define an each method on any Ruby object and then take advantage of more than 50 methods for processing and evaluating collections from the Enumerable module. Simply add include Enumerable within the object that has a valid each method, and you can fully utilize all of those methods.

Enumerators aren’t limited to simple collections such as Array; they work with any collection that defines the each method (and that will typically have the Enumerable module in its ancestors).

Array.ancestors
# => [Array, Enumerable, Object, Kernel, BasicObject]
Hash.ancestors
# => [Hash, Enumerable, Object, Kernel, BasicObject]
Hash.method_defined? :each
# => true
require "set"
Set.ancestors
# => [Set, Enumerable, Object, Kernel, BasicObject]
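To make the requirement described above concrete, here is a minimal sketch of a custom class that defines each and includes Enumerable. The Countdown class is invented for this example and is not part of Ruby.

```ruby
# Hypothetical class used only to illustrate include Enumerable.
class Countdown
  include Enumerable

  def initialize(start)
    @start = start
  end

  # The one method Enumerable builds on: yield each element in turn.
  def each
    @start.downto(0) { |n| yield n }
  end
end

countdown = Countdown.new(4)

countdown.select(&:even?)
# => [4, 2, 0]
countdown.map { |n| n * 10 }
# => [40, 30, 20, 10, 0]
countdown.include?(3)
# => true
```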

Lazy and Not Lazy Enumeration

Lazy enumeration is often considered a better way of processing a collection, as it will allow you to step through infinite sequences as far as you’d like to go.

Think of an assembly line of people making a pizza, where each person is responsible for only one step in the pizza’s transformation/creation. The first person tosses the dough into the right shape, the next person adds the sauce, the next the cheese, a person for each topping, one to put it in the oven, and the last person to deliver the ready pizza to you. In Ruby’s lazy version of this line, there can be any number of pizza orders, but everyone works the first pizza through every step of the process before starting on the next one.

If you don’t use lazy enumeration, then each step would have to wait for the entire collection to be done one step at a time. For example, if you have 20 orders of pizza, the person who tosses the pizza dough will have to do 20 of them before any of them get sauce added on by the next person. And each step in the line waits in a similar manner. Now, the bigger the collection you need to process, the more ridiculous it seems to make the rest of the assembly line wait.
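The difference in ordering can be seen in a short sketch. The two-step "dough" and "sauce" pipeline below is a simplified stand-in for the assembly line, and the log arrays exist only to record the order in which the steps run.

```ruby
orders = [1, 2, 3]

# Eager: every order gets dough before any order gets sauce.
eager_log = []
orders.map { |n| eager_log << "dough #{n}"; n }
      .map { |n| eager_log << "sauce #{n}"; n }
eager_log
# => ["dough 1", "dough 2", "dough 3", "sauce 1", "sauce 2", "sauce 3"]

# Lazy: each order goes through every step before the next one starts,
# and first(2) stops the line before the third order is touched.
lazy_log = []
orders.lazy
      .map { |n| lazy_log << "dough #{n}"; n }
      .map { |n| lazy_log << "sauce #{n}"; n }
      .first(2)
lazy_log
# => ["dough 1", "sauce 1", "dough 2", "sauce 2"]
```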

A more real-world example would be processing emails to be sent out to all users. If there is an error in the code and it’s not being handled lazily, then it’s quite likely no one would have received an email. But in the case of lazy evaluation, you could potentially get most of your users emailed before an account information issue causes a problem. If a record is kept of successful emails sent, it’s easier to track down where the issue may lie.

Creating a lazy enumerator in Ruby is as simple as calling lazy on an object with Enumerable included in it or to_enum.lazy on an object with each defined on it.

class Thing
  def each
    yield "winning"
    yield "not winning"
  end
end

a = Thing.new.to_enum.lazy

Thing.include Enumerable
b = Thing.new.lazy

a.next
# => "winning"
b.next
# => "winning"

Calling to_enum returns an object that is both an Enumerator and an Enumerable object and will have access to all of their methods.

It is important to pay attention to which enumerable methods will consume the entire collection and which will work with lazy evaluation. For example, the partition method consumes the entire collection, so it’s unacceptable for infinite collections. Better options for lazy evaluation would be methods like chunk or select.

x = (0..Float::INFINITY)

y = x.chunk(&:even?)
# => #<Enumerator: #<Enumerator::Generator:0x0055eb840be350>:each>
y.next
# => [true, [0]]
y.next
# => [false, [1]]
y.next
# => [true, [2]]

z = x.lazy.select(&:even?)
# => #<Enumerator::Lazy: #<Enumerator::Lazy: 0..Infinity>:select>
z.next
# => 0
z.next
# => 2
z.next
# => 4

In the case of using select with an infinite sequence, you must first call the lazy method to prevent select from consuming the entire collection and the program halting for want of infinity.
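As a brief sketch of how a lazy chain over an infinite range is usually finished off, first(n) pulls only as many items through the chain as it needs and returns a plain array, while take(n) stays lazy until it is forced with to_a. The variable names are illustrative.

```ruby
evens_squared = (1..Float::INFINITY).lazy.select(&:even?).map { |n| n * n }

# first(3) stops the infinite sequence after three results.
first_three = evens_squared.first(3)
# => [4, 16, 36]

# take(3) is still lazy; to_a forces the evaluation.
taken = evens_squared.take(3).to_a
# => [4, 16, 36]
```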

Creating a Lazy Enumerator

Ruby has the Enumerator::Lazy class, which allows you to write your own lazy enumerator methods like Ruby’s take.

(0..Float::INFINITY).take(2)
# => [0, 1]

For a good example, we’ll implement FizzBuzz, which will start at any integer and allow infinite FizzBuzz results.

def divisible_by?(num)
  ->input{ (input % num).zero? }
end

def fizzbuzz_from(value)
  Enumerator::Lazy.new(value..Float::INFINITY) do |yielder, val|
    yielder << case val
    when divisible_by?(15)
      "FizzBuzz"
    when divisible_by?(3)
      "Fizz"
    when divisible_by?(5)
      "Buzz"
    else
      val
    end
  end
end

x = fizzbuzz_from(7)
# => #<Enumerator::Lazy: 7..Infinity:each>

9.times { puts x.next }
# 7
# 8
# Fizz
# Buzz
# 11
# Fizz
# 13
# 14
# FizzBuzz

With Enumerator::Lazy, whatever you give to yielder will be the value that returns per each step in the progression. Enumerators do keep track of the current progress when using next. But if you call each after a few usages of next, it will start from the beginning of the collection.
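The point about next and each can be sketched with an ordinary enumerator. This example uses a small array rather than the FizzBuzz enumerator so that it terminates, and it also shows rewind, which resets the position used by next.

```ruby
letters = %w[a b c].each # an Enumerator over the array

first  = letters.next
# => "a"
second = letters.next
# => "b"

# Internal iteration (each, to_a, map, ...) starts from the beginning
# regardless of how many times next has been called.
everything = letters.to_a
# => ["a", "b", "c"]

# rewind resets the external position that next keeps track of.
letters.rewind
after_rewind = letters.next
# => "a"
```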

The parameter you pass to Enumerator::Lazy.new is the collection to be enumerated over. If you write this method on Enumerable or a compatible object, you can simply pass self as the parameter. val is one value at a time produced by the collection’s each method, and yielder is what receives each value you want handed on to the block that is eventually passed in, much as each would yield it.
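Here is a minimal sketch of passing self to Enumerator::Lazy.new from inside an Enumerable-compatible object. The NumberList class and its lazy_doubled method are hypothetical names invented for this example.

```ruby
# Hypothetical class; only each and include Enumerable are required.
class NumberList
  include Enumerable

  def initialize(*numbers)
    @numbers = numbers
  end

  def each(&block)
    @numbers.each(&block)
    self
  end

  # Hypothetical lazy method: self is the collection being enumerated,
  # val is each value from each, and yielder receives the result.
  def lazy_doubled
    Enumerator::Lazy.new(self) do |yielder, val|
      yielder << val * 2
    end
  end
end

list = NumberList.new(1, 2, 3, 4)

list.lazy_doubled.first(2)
# => [2, 4]
list.lazy_doubled.select { |n| n > 4 }.to_a
# => [6, 8]
```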

Advanced Enumerator Usages

When processing collections of data, it is recommended to put your limitation filters first in the chain of transformations you process. This way, it takes less work for the code to process the data. If you’re getting data from a database to process, have your limitation filters implemented in the database’s own language before Ruby if possible. That will likely be much more efficient.

require "prime"
x = (0..34).lazy.select(&Prime.method(:prime?))
x.next
# => 2
x.next
# => 3
x.next
# => 5
x.next
# => 7
x.next
# => 11

After the select method above, you could have other methods appended to it to process the data. Those methods will only deal with the limited selection of data within prime numbers and not the rest.
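To illustrate why filter order matters, the sketch below counts how often a transformation runs when it comes before or after the filter. The expensive_square lambda is a stand-in for any costly step, and the counter exists only for the demonstration.

```ruby
calls = 0
# Stand-in for an expensive transformation; counts its own invocations.
expensive_square = lambda do |n|
  calls += 1
  n * n
end

# Transform first, filter afterwards: every element is transformed.
transform_first = (1..100).map(&expensive_square).select(&:even?)
calls_when_transforming_first = calls
# => 100

# Filter first: only the surviving half is transformed.
calls = 0
filter_first = (1..100).select(&:even?).map(&expensive_square)
calls_when_filtering_first = calls
# => 50

transform_first == filter_first
# => true
```

The two orderings give the same result here only because a square is even exactly when its input is even; in general, moving a filter earlier is safe only when it does not depend on the later transformation.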

Grouping

One nice way to process data for splitting into columns is to use group_by to convert the results into a hash of groups. After that, just retrieve the values, as that’s all we’re interested in.

[0,1,2,3,4,5,6,7,8].group_by.with_index {|_,index| index % 3 }.values
# => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

If you print the above results onto a web page, the data would be ordered as follows:

0    3    6
1    4    7
2    5    8

The group_by code above passes both a value and an index into the code block. We use an underscore for the value from the array to indicate we don’t care about that value and are only interested in the index. What gets returned by that is a hash with the keys of 0, 1, and 2 pointing to each of the groups of values we grouped. Since we don’t care about the keys, we call values on that hash to get the array of arrays to display as we please.
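As a quick sketch of the intermediate step described above, the hash can be inspected before values is called on it. The groups variable name is illustrative.

```ruby
groups = [0, 1, 2, 3, 4, 5, 6, 7, 8].group_by.with_index { |_, index| index % 3 }

groups.keys
# => [0, 1, 2]
groups[0]
# => [0, 3, 6]
groups.values
# => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```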

If we wanted to arrange the collection from left to right in columns, we could simply do this:

threes = (0..2).cycle
[0,1,2,3,4,5,6,7,8].slice_when { threes.next == 2 }.to_a
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

The threes enumerator simply cycles through 0 to 2 infinitely, in a lazy fashion, which permits the display to be:

0    1    2
3    4    5
6    7    8
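The same row-wise layout can also be sketched with each_slice, an alternative that the slice_when example does not use. It splits the collection into consecutive groups of a fixed size, with a shorter final group when the size does not divide evenly.

```ruby
rows = [0, 1, 2, 3, 4, 5, 6, 7, 8].each_slice(3).to_a
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

uneven = [0, 1, 2, 3].each_slice(3).to_a
# => [[0, 1, 2], [3]]
```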

Ruby also has a transpose method, which will flip the above results from one to the other.

x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
x = x.transpose
# => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
x = x.transpose
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

Folding

Let’s look at ways to compound a collection down to a result. In other languages, this is commonly done with a method named fold. In Ruby, it has long been done with reduce and inject. A more recent addition, and the preferred way to do this, is with each_with_object. The basic idea behind these is to process one collection into another as the result.

Summing a collection of integers is as simple as:

[1,2,3].reduce(:+)
# => 6

[1,2,3].inject(:+)
# => 6

class AddStore
  def add(num)
    @value = @value.to_i + num
  end

  def inspect
    @value.inspect
  end
end

[1,2,3].each_with_object(AddStore.new) {|val, memo| memo.add(val) }
# => 6

# As of Ruby 2.4
[1,2,3].sum
# => 6

each_with_object typically needs an object that can be updated in place. Integers are immutable, so an integer can’t be changed through a reference to it, which is why for this trivial example we created an AddStore object.

These methods are better demonstrated by taking data from one collection and placing it into another. Note that inject and reduce are aliases for the same method in Ruby, and the value the block returns becomes the accumulator for the next iteration, so the block must end with the object being built. each_with_object does not need the block to return the object being built.

collection = [:a, 2, :p, :p, 6, 7, :l, :e]

collection.reduce("") { |memo, value|
  memo << value.to_s if value.is_a? Symbol
  memo # Note the return value needs to be the object/collection we're building
}
# => "apple"

collection.each_with_object("") { |value, memo|
  memo << value.to_s if value.is_a? Symbol
}
# => "apple"
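The same contrast shows up when building a hash, sketched here with a word count. The words array is made-up sample data.

```ruby
words = %w[apple pear apple fig pear apple]

with_reduce = words.reduce(Hash.new(0)) do |counts, word|
  counts[word] += 1
  counts # reduce needs the accumulator returned from the block
end

with_object = words.each_with_object(Hash.new(0)) do |word, counts|
  counts[word] += 1 # the block's return value is ignored here
end

with_reduce["apple"]
# => 3
with_object["pear"]
# => 2
with_reduce == with_object
# => true
```

Note that the block parameters are in opposite order: reduce passes the accumulator first, while each_with_object passes it second.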

Structs

Ruby struct objects are also enumerable objects, which can make for some convenient objects to write methods in.

class Pair < Struct.new(:first, :second)
  def same?;    inject(:eql?)  end
  def add;      inject(:+)     end
  def subtract; inject(:-)     end
  def multiply; inject(:*)     end
  def divide;   inject(:/)     end

  def swap!
    members.zip(entries.reverse) {|a,b| self[a] = b}
  end

end

x = Pair.new(23, 42)
x.same?
# => false

x.first
# => 23

x.swap!

x.first
# => 42

x.multiply
# => 966

Structs aren’t usually used for large collections but rather as useful data objects, a way to pass organized data together, which permits clear purpose with data rather than data clumps.

Data clumps occur when two or more variables are always used as a group and it wouldn’t make sense to use one of them by itself. Such a group of variables should be extracted into an object/class.

So structs in Ruby are generally small collections of data, but there isn’t anything to say that the data itself couldn’t be other collections of data. In that case, a struct could be a way to implement transformations over those collections, much as you could by writing a class of your own.
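As a sketch of a struct whose members are themselves collections, the example below keeps two arrays of points and uses Enumerable methods across them. The Scores struct and its totals and leader methods are invented for this illustration.

```ruby
# Hypothetical struct: each member holds an array of points.
Scores = Struct.new(:home, :away) do
  # Struct includes Enumerable, so map iterates over the member values.
  def totals
    map(&:sum)
  end

  # Pair each member name with its total and pick the largest.
  def leader
    members.zip(totals).max_by { |_, total| total }.first
  end
end

game = Scores.new([1, 2, 3], [4, 0, 1])

game.totals
# => [6, 5]
game.leader
# => :home
```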

Summary

Ruby’s pretty fantastic with how easy it is to work with and manage collections of data. Learning each piece of what Ruby has to offer allows you to write far more elegant code and to test and optimize for better implementations.

If performance is key, then benchmark alternative implementations and be sure to put your filters and limits as early into the process as you can. Consider limiting your input source into smaller chunks when you can, like using the readline method on files rather than read or readlines or LIMIT number in SQL.
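A minimal sketch of line-at-a-time input combined with lazy filtering is shown below. StringIO stands in for a large file so that the example is self-contained; for a real file, File.foreach(path) gives a similar line-by-line enumerator. The sample log contents are made up.

```ruby
require "stringio"

# StringIO is a stand-in for a large file in this sketch.
log = StringIO.new("ok\nerror: disk\nok\nerror: net\nok\n")

# Lines are read one at a time, and reading stops as soon as
# two matching lines have been found.
first_errors = log.each_line
                  .lazy
                  .map(&:chomp)
                  .select { |line| line.start_with?("error") }
                  .first(2)
# => ["error: disk", "error: net"]
```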

Lazy iteration can help greatly with splitting tasks off for different threads or background jobs to handle. The concept of lazy iteration really has no downsides, as you could still choose to consume any entire collection at any point. It offers the greatest flexibility, and some languages, such as Rust with iterators, have made it their standard to be implemented lazily.

The possibilities are endless when it comes to how to manage and transform data sets. And it’s a fun process to learn and create each way of handling our data sets by programming. Ruby has well-documented examples for each of its enumerable methods, so it helps to learn from the examples given. I encourage you to experiment and discover many new things which will help make programming all the more enjoyable.

Reference:

Advanced Enumeration with Ruby from our WCG partner Daniel P. Clark at the Codeship Blog.