Surrogate WebSockets Alongside Rails
ActionCable is coming to Rails 5 and brings with it the promise of using WebSockets directly in Rails.
Ruby has a notoriously bad concurrency story, and that certainly extends into the realm of WebSockets and pubsub. ActionCable may be suitable for a small number of authenticated sessions, but scaling persistent connections to thousands or tens of thousands won't be easy. That's all right; the beauty of the internet is that you can mix and match technologies as you see fit!
There are plenty of other languages that your application can lean on to achieve real-time communication over WebSockets. In the recent past, you may have reached for Node.js for this kind of work, but there is a better option.
Pubsub is by its nature a distributed system, and it’s no secret that Erlang is a superstar in the distributed systems realm. Rather than reach for Erlang directly, we’ll leverage Elixir’s productive tooling to build out a WebSocket microservice.
By using a surrogate microservice, your Rails application remains the authority on business logic. It still handles all database interactions, sending email, and plugs away at the other work Rails excels at — it just doesn’t hold thousands of connections.
Gluing PubSub Together
You may recall from my previous article on service communication with pubsub that it is very easy to publish messages from Ruby. As a refresher, here is the simple Publisher class:
require 'redis'
class Publisher
def publish(channel, message)
Redis.current.publish(channel, message)
end
end
The Rails application will publish score updates whenever data changes:
# from a controller or a background worker
publisher.publish('teams-123-scores', score.to_json)
Publishing is the easy part though! So long as you have a consistent naming scheme for channels and serialize your messages, it is very straightforward. What about the Elixir side?
Getting Started With Compatibility
In the Elixir world, Phoenix is an obvious choice for hosting surrogate WebSocket connections. There are two pubsub implementations broadly available for Phoenix: Redis and PG2 (using Erlang’s distributed named process groups).
As it turns out, both modules rely on Erlang-encoded terms, so neither is conducive to communication outside the Erlang ecosystem. Developing an alternate pubsub module for Phoenix is beyond the scope of this article. Instead, we'll build a simple system that allows Ruby and Elixir to communicate over pubsub with a minimal API.
Setting Up Elixir and Redis PubSub
From here on, I’ll assume you are familiar with Elixir, the mix build tool, and testing with ExUnit. Our first step is to create a surrogate project:
mix new surrogate && cd surrogate
Now we’ll install the Redis client. There are a few different Redis clients in the Erlang world, but we’ll be using the native Elixir client Redix. It is easy to configure, easy to test, and wonderfully written. Open the generated mix.exs and modify the deps:
defp deps do
[{:redix, "~> 0.3"}]
end
Now retrieve the dependencies:
mix deps.get
That’s enough to get the project set up and explore pubsub with a small test. The only functions our module needs are subscribe/1, unsubscribe/1, and possibly broadcast/2. That makes it consistent with the Phoenix.PubSub API. Our test gets started by testing subscribe/1:
defmodule SurrogateTest do
use ExUnit.Case
alias Surrogate.PubSub
test "subscribe/1 links to a topic" do
{:ok, _pid} = PubSub.start_link
:ok = PubSub.subscribe("topic.1")
:ok = PubSub.subscribe("topic.2")
{:ok, other_conn} = Redix.start_link
{:ok, _} = Redix.command(other_conn, ~w(PUBLISH topic.1 hello))
{:ok, _} = Redix.command(other_conn, ~w(PUBLISH topic.2 world))
assert_receive "hello"
assert_receive "world"
end
end
The test first establishes a link to a new surrogate server and then uses the subscribe/1 API to register the test process for published messages. Next it opens another connection directly through Redix and publishes messages on both of the subscribed topics. Finally, it verifies that both of the simple hello and world messages were received by the test process.
Actually defining the Surrogate.PubSub module is a bit more involved. Retaining a list of subscribed processes is essential, as every connection from the web is a separate process. In order to retain state, we must use an Agent or a GenServer. As our module combines state and functions to manipulate that state, we will use the GenServer behaviour.
defmodule Surrogate.PubSub do
use GenServer
def start_link(opts \\ []) do
GenServer.start_link(__MODULE__, opts, [name: __MODULE__])
end
## Callbacks
def init(opts) do
{:ok, conn} = Redix.PubSub.start_link(opts)
{:ok, %{conn: conn, subs: %{}}}
end
end
The GenServer behaviour is mixed in via use, and we then define the start_link/1 convenience function. As soon as the process is started, it links itself to Redix.PubSub and retains a connection to Redis. Now we're ready to add the subscribe/1 API method.
def subscribe(topic) do
GenServer.call(__MODULE__, {:subscribe, topic})
end
## Callbacks
def handle_call({:subscribe, topic}, {pid, _}, %{conn: conn, subs: subs} = state) do
:ok = Redix.PubSub.subscribe(conn, topic, self())
tops = Map.get(subs, pid, MapSet.new)
subs = Map.put(subs, pid, MapSet.put(tops, topic))
{:reply, :ok, %{state | subs: subs}}
end
The subscribe/1 function issues a GenServer.call/2, handled by the handle_call/3 callback that is part of the GenServer behaviour, to synchronously subscribe the calling process to a topic. The callback also records the calling process (seen here as {pid, _}) in the server's state.
Every process is a key in the subs map, and its value is a set of all the topics the process is subscribed to. This mapping is critical for receiving published messages and managing unsubscribes later.
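To make the shape of that state concrete, here is a small illustrative snippet (not part of the module; the pid and topic names are placeholders) that mirrors what the subscribe handler does to the subs map:
# Illustrative only: how the subs map grows as a process subscribes
# to topics. Run in iex; self() stands in for a socket process.
pid  = self()
subs = %{}
subs = Map.put(subs, pid, MapSet.put(Map.get(subs, pid, MapSet.new), "topic.1"))
subs = Map.put(subs, pid, MapSet.put(Map.get(subs, pid, MapSet.new), "topic.2"))
# subs now maps the pid to a MapSet containing "topic.1" and "topic.2"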
Running the tests now doesn’t quite work, because the Redix.PubSub module sends out-of-band messages back to the pubsub process, and they aren’t being handled. To handle these out-of-band messages, we have to use the handle_info/2 callback:
def handle_info({:redix_pubsub, :message, message, topic}, %{subs: subs} = state) do
for {pid, topics} <- subs do
if MapSet.member?(topics, topic), do: send pid, message
end
{:noreply, state}
end
def handle_info(_msg, state) do
{:noreply, state}
end
There! Now the module can properly handle subscriptions and forward messages to the subscribed processes.
mix test test/surrogate_test.exs
Compiled lib/surrogate.ex
.

Finished in 0.1 seconds (0.06s on load, 0.04s on tests)
1 test, 0 failures

Randomized with seed 117850
What Else Does PubSub Need?
The subscribe/1 function is almost enough for one-way communication between the Rails app and the Surrogate. There isn’t any need to support broadcast/2 (which would send messages back upstream), leaving unsubscribe/1.
When the current process is unsubscribed, it should no longer receive any forwarded messages. The implementation of unsubscribe/1 is simple, effectively subscribe/1 in reverse. First, add a new test case:
test "unsubscribe/1 unlinks a topic" do
{:ok, _pid} = PubSub.start_link
:ok = PubSub.subscribe("topic.1")
:ok = PubSub.unsubscribe("topic.1")
{:ok, other_conn} = Redix.start_link
{:ok, _} = Redix.command(other_conn, ~w(PUBLISH topic.1 hello))
refute_receive "hello"
end
The test refutes that any message is forwarded after unsubscribing from "topic.1". Of course this initially fails — now we make it pass:
def unsubscribe(topic) do
GenServer.call(__MODULE__, {:unsubscribe, topic})
end
## Callbacks
def handle_call({:unsubscribe, topic}, {pid, _}, %{subs: subs} = state) do
tops = Map.get(subs, pid, MapSet.new)
subs = Map.put(subs, pid, MapSet.delete(tops, topic))
{:reply, :ok, %{state | subs: subs}}
end
Again we rely on the power of call, which includes the calling process's pid in the second argument. The handler finds the entry for the calling pid and removes the specified topic from its set. When future messages are published, that topic won't be found and the message won't be forwarded.
This approach is simple for illustration purposes, but it is quite naive and leaves behind dangling references to processes. Production-grade implementations, such as Phoenix.PubSub, run periodic GC to clean up after unsubscribes.
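A lightweight alternative, sketched here as an assumption about how you might extend the module rather than as part of the original implementation, is to monitor each subscriber and prune its entries when it exits:
# Sketch only: assumes the subscribe handler also calls
# Process.monitor(pid) on the caller. When that process exits, the
# GenServer receives a :DOWN message and drops its subscriptions.
def handle_info({:DOWN, _ref, :process, pid, _reason}, %{subs: subs} = state) do
  {:noreply, %{state | subs: Map.delete(subs, pid)}}
end
Note that this clause would have to sit above the catch-all handle_info/2 clause defined earlier, or it would never match.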
Stitching into WebSockets
Our connections are built on top of Plug, which comes with an adapter for the Cowboy server. Every HTTP connection that comes to Cowboy is wrapped in a separate process. That includes WebSocket connections, which are initiated as HTTP connections and then upgraded to WebSockets.
That’s perfect for our pubsub setup! Once a connection is upgraded, the server simply waits for topic subscriptions and forwards them on to the Surrogate module.
Add both plug and cowboy to deps in mix.exs, ensure they are started as applications, and define Surrogate as the application to start.
def application do
[mod: {Surrogate, []},
applications: [:cowboy, :plug, :logger]]
end
defp deps do
[{:redix, "~> 0.3"},
{:plug, "~> 1.0"},
{:cowboy, "~> 1.0"}]
end
In order to treat Surrogate as an application, it needs a base module that uses the Application behaviour. Forgive me if what follows is a little like the "how to draw an owl" process. There are a lot of details in application setup that are outside the scope of this article, but they are necessary to get a real server running.
defmodule Surrogate do
use Application
alias Plug.Adapters.Cowboy
def start(_, _) do
import Supervisor.Spec
children = [
Cowboy.child_spec(:http, Surrogate.Server, [], [dispatch: dispatch]),
worker(Surrogate.PubSub, [])
]
opts = [strategy: :one_for_one, name: Surrogate.Supervisor]
Supervisor.start_link(children, opts)
end
defp dispatch do
[{:_, [{"/ws", Surrogate.Socket, []},
{:_, Cowboy.Handler, {Surrogate.Server, []}}]}]
end
end
The Surrogate module is defined as a proper Erlang app, which supervises the pubsub and cowboy workers. The supervision ensures that any time a worker crashes a new one is started up in its place. Notice that the number of modules has expanded; now there are also Surrogate.Server and Surrogate.Socket modules. These will handle our incoming HTTP and WebSocket connections, respectively.
First, the Surrogate.Server module is defined as a simple plug that always returns an “OK” text response.
defmodule Surrogate.Server do
import Plug.Conn
def init(opts), do: opts
def call(conn, _opts) do
conn
|> put_resp_content_type("text/plain")
|> send_resp(200, "OK")
end
end
We can verify that the server is working by starting the application with iex -S mix and visiting localhost:4000. If everything went according to plan, you'll see a friendly "OK" in your browser. Now for the final module needed to relay messages, Surrogate.Socket:
defmodule Surrogate.Socket do
@behaviour :cowboy_websocket_handler
def init(_, _req, _opts) do
{:upgrade, :protocol, :cowboy_websocket}
end
## Callbacks
def websocket_init(_type, req, _opts) do
{:ok, req, %{}, 60_000}
end
end
The module uses the cowboy_websocket_handler behaviour, which requires callback handlers just like a GenServer. Whenever a new connection comes in to localhost:4000/ws, init/3 signals the upgrade to a WebSocket, and websocket_init/3 then initializes the connection state (with a 60 second idle timeout). Once connected, we can begin sending and receiving messages through websocket_handle/3.
alias Surrogate.PubSub
def websocket_handle({:text, "ping"}, req, state) do
{:reply, {:text, "pong"}, req, state}
end
def websocket_handle({:text, "subscribe|" <> topic}, req, state) do
PubSub.subscribe(topic)
{:ok, req, state}
end
def websocket_handle({:text, "unsubscribe|" <> topic}, req, state) do
PubSub.unsubscribe(topic)
{:ok, req, state}
end
def websocket_handle({:text, _}, req, state) do
{:ok, req, state}
end
Each handler pattern matches on the incoming message, allowing the server to handle subscribe and unsubscribe messages selectively. At this point, the server can accept incoming messages, but it is unable to relay subscribed messages coming back from pubsub. As with all out-of-band messages to a server, those are handled with a websocket_info/3 callback:
def websocket_info(message, req, state) do
{:reply, {:text, message}, req, state}
end
Here the message is the exact value being published to the channel, and it is passed along to the WebSocket connection directly. Finally, we can test the round trip between Ruby, Elixir, and a web browser. Restart the surrogate server interactively with iex -S mix, reload the page, and open up the JavaScript console:
let ws = new WebSocket("ws://localhost:4000/ws")
ws.onmessage = (message) => console.log(message.data)
ws.send("subscribe|topic.1")All messages published to topic. are fed to the console, whether they come from Ruby, a Redis CLI instance, or the Elixir shell. For example, when a ‘score’ object is serialized and published from Ruby, this is logged as data in the console:
'{"id":12345,"value":"new score"}'Monolith This Isn’t
Monolith This Isn't
The system we've built is more complex than a monolith, but it will scale simply and perform reliably. It will easily handle thousands of topics and thousands of connections, even on the lightest weight hardware. Line for line, it may outweigh a naive Node.js implementation, but it is entirely fault tolerant right out of the box. That's the benefit we get from building on top of Erlang.
This system is real-time only; there aren’t any mechanisms to rewind or catch up to missed messages after temporary disconnections. That is simply the nature of pubsub. Paired with a lack of channel authorization, it is perfectly suited to pushing updates over broadly available public channels. Think real-time sports updates, leaderboards, stock tickers, etc.
This is not the “beautiful monolith” approach, because that simply doesn’t exist when you enter the realm of distributed systems. It relies on an external database to broker communication, which is a service in itself. When external dependencies are already at play, it is wise to leverage languages and frameworks that are perfectly suited to the task.
Reference: Surrogate WebSockets Alongside Rails from our WCG partner Florian Motlik at the Codeship Blog.