Most of my production work runs on Elixir and OTP. But before Phoenix, I shipped Ruby in production: Rails APIs, background workers, gem integrations. When I reach for a GenServer today, I am not thinking "Elixir magic." I am thinking: one worker, private state, messages in a mailbox, callers wait for a reply or fire-and-forget. That model is not owned by the BEAM. Ruby can express it too, if you pick the right primitives.
This post walks through a minimal token-bucket rate limiter twice: once with GenServer, once with concurrent-ruby's Concurrent::Async. The reference repos are published on GitHub (links below): same algorithm, same API shape, two runtimes.
Why a token bucket?
A rate limiter is the smallest interesting stateful server. It holds a token count and a last-refill timestamp. Callers ask synchronously: may I proceed? Operators can reset asynchronously. Twenty concurrent callers must never drain more than capacity, which is exactly the guarantee a serial message loop provides.
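Before any concurrency enters the picture, the bucket arithmetic itself can be sketched as pure functions. This is a minimal illustration, not code from either reference repo: `Bucket`, `refill`, and `take` are names I am assuming here, and time is passed in explicitly so the behaviour is deterministic.

```ruby
# frozen_string_literal: true

# Pure token-bucket arithmetic, no concurrency. All names are illustrative.
Bucket = Struct.new(:capacity, :tokens, :refill_rate, :last_refill, keyword_init: true)

# Returns a new bucket topped up for the time elapsed up to `now` (seconds).
def refill(bucket, now)
  elapsed = now - bucket.last_refill
  tokens = [bucket.capacity.to_f, bucket.tokens + elapsed * bucket.refill_rate].min
  Bucket.new(capacity: bucket.capacity, tokens: tokens,
             refill_rate: bucket.refill_rate, last_refill: now)
end

# Returns [allowed, new_bucket]; consumes one token when at least one is available.
def take(bucket, now)
  refilled = refill(bucket, now)
  return [false, refilled] if refilled.tokens < 1.0

  refilled.tokens -= 1.0
  [true, refilled]
end

bucket = Bucket.new(capacity: 3, tokens: 3.0, refill_rate: 2.0, last_refill: 0.0)
3.times { _, bucket = take(bucket, 0.0) } # drain the bucket at t = 0
allowed, bucket = take(bucket, 0.0)
puts allowed # prints false: the fourth request at t = 0 is rejected
allowed, bucket = take(bucket, 0.5)
puts allowed # prints true: 0.5 s at 2 tokens/s refills one token
```

Everything that follows is about who is allowed to run this arithmetic, and when. The serial worker exists so that two callers never interleave between the refill and the decrement.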
Elixir: GenServer
GenServer is OTP's generic server behaviour. You implement init/1, handle_call/3, and handle_cast/2. The runtime gives you a dedicated BEAM process, a mailbox, and strict serialisation of messages, so no mutex is required inside your callbacks.
defmodule RateLimiter do
  @moduledoc """
  Token-bucket rate limiter implemented as a GenServer.
  One BEAM process owns the bucket state.
  """
  use GenServer

  # Client API

  def start_link(opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  def allow?(server, _key \\ nil), do: GenServer.call(server, :allow?)
  def stats(server), do: GenServer.call(server, :stats)
  def reset(server), do: GenServer.cast(server, :reset)

  # Server callbacks

  @impl true
  def init(opts) do
    capacity = Keyword.get(opts, :capacity, 5)
    refill_rate = Keyword.get(opts, :refill_rate, 1.0)

    {:ok,
     %{
       capacity: capacity,
       tokens: capacity * 1.0,
       refill_rate: refill_rate,
       last_refill_ms: System.monotonic_time(:millisecond)
     }}
  end

  @impl true
  def handle_call(:allow?, _from, state) do
    state = refill(state)

    if state.tokens >= 1.0 do
      {:reply, true, %{state | tokens: state.tokens - 1.0}}
    else
      {:reply, false, state}
    end
  end

  @impl true
  def handle_call(:stats, _from, state) do
    state = refill(state)
    {:reply, Map.take(state, [:capacity, :tokens, :refill_rate]), state}
  end

  @impl true
  def handle_cast(:reset, state) do
    now = System.monotonic_time(:millisecond)
    {:noreply, %{state | tokens: state.capacity * 1.0, last_refill_ms: now}}
  end

  # Top up tokens for the time elapsed since the last refill, capped at capacity.
  defp refill(%{tokens: tokens, capacity: cap, refill_rate: rate, last_refill_ms: last} = state) do
    now = System.monotonic_time(:millisecond)
    elapsed_sec = (now - last) / 1000.0
    new_tokens = min(cap * 1.0, tokens + elapsed_sec * rate)
    %{state | tokens: new_tokens, last_refill_ms: now}
  end
end

{:ok, _pid} = RateLimiter.start_link(name: MyLimiter, capacity: 5, refill_rate: 2.0)
RateLimiter.allow?(MyLimiter, "user-123") # synchronous call: true | false
RateLimiter.stats(MyLimiter)              # %{capacity: 5, tokens: 4.0, ...}
RateLimiter.reset(MyLimiter)              # async cast: :ok

GenServer.call blocks the caller until handle_call returns {:reply, value, new_state} (or the call times out, after 5 seconds by default). GenServer.cast posts a message and returns immediately; the server handles it when it reaches the front of the mailbox. That is the Erlang gen_server contract, unchanged since the '90s.
Ruby: Concurrent::Async
The concurrent-ruby gem documents Async as loosely based on Erlang's gen_server, without supervision or linking. You include the module, call super() in initialize, and route work through await (synchronous) or async (fire-and-forget) proxies. Calls made through those proxies are queued per object and processed one at a time on a background thread drawn from concurrent-ruby's executor. That is a mailbox. Calling a method directly on the object bypasses the queue, so every caller has to go through the proxies.
# frozen_string_literal: true
require "concurrent-ruby"

class RateLimiter
  include Concurrent::Async

  def initialize(capacity: 5, refill_rate: 1.0)
    super()
    @capacity = capacity
    @tokens = capacity.to_f
    @refill_rate = refill_rate
    @last_refill = monotonic_now
  end

  # Handlers: only invoke these through the await/async proxies.

  def allow?(_key = nil)
    refill!
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end

  def stats
    refill!
    { capacity: @capacity, tokens: @tokens, refill_rate: @refill_rate }
  end

  def reset
    @tokens = @capacity.to_f
    @last_refill = monotonic_now
    :ok
  end

  # Client helpers: the call/cast equivalents.

  def call_allow?(key = nil)
    await.allow?(key).value
  end

  def call_stats
    await.stats.value
  end

  def cast_reset
    async.reset
  end

  private

  # Top up tokens for the time elapsed since the last refill, capped at capacity.
  def refill!
    now = monotonic_now
    elapsed = now - @last_refill
    @tokens = [@capacity.to_f, @tokens + (elapsed * @refill_rate)].min
    @last_refill = now
  end

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

limiter = RateLimiter.new(capacity: 5, refill_rate: 2.0)
limiter.call_allow?("user-123") # synchronous: blocks for reply
limiter.call_stats              # read bucket without consuming
limiter.cast_reset              # async cast: returns immediately

One Ruby detail worth calling out: await returns a Concurrent::IVar, not the bare value. The small call_allow? and call_stats helpers call .value to unwrap it, which corresponds to the moment a GenServer client unblocks with a reply. async.reset is the cast: enqueue and return.
Side-by-side mapping
- Elixir use GenServer -> Ruby include Concurrent::Async
- Elixir GenServer.start_link/3 -> Ruby RateLimiter.new (super() sets up the object's async queue)
- Elixir GenServer.call/2 (synchronous) -> Ruby limiter.await.method, then IVar.value
- Elixir GenServer.cast/2 (asynchronous) -> Ruby limiter.async.method
- Elixir handle_call/3 -> Ruby instance method invoked on actor thread
- Elixir handle_cast/2 -> Ruby instance method invoked on actor thread (no reply)
- Elixir process mailbox -> Ruby serialized method queue on executor thread
- Elixir isolated process heap -> Ruby thread + discipline (do not share mutable refs)
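To make the mapping above concrete without leaning on the gem, here is a hand-rolled mailbox built from the standard library's Queue and Thread. It is a sketch under stated assumptions: `MiniServer` is a hypothetical name, it is not how concurrent-ruby implements Async internally, and it omits error handling (a handler that raises would kill the worker and leave a caller blocked).

```ruby
# frozen_string_literal: true

# A hand-rolled mailbox, stdlib only. MiniServer is a hypothetical name,
# not part of concurrent-ruby or either reference repo.
class MiniServer
  def initialize(state)
    @state = state
    @mailbox = Queue.new
    # One thread drains the mailbox, so handlers never run concurrently.
    @worker = Thread.new do
      while (message = @mailbox.pop)
        handler, reply_to = message
        result = handler.call(@state)
        reply_to&.push(result)
      end
    end
  end

  # Like GenServer.call: enqueue, then block until the worker replies.
  def call(&handler)
    reply_to = Queue.new
    @mailbox.push([handler, reply_to])
    reply_to.pop
  end

  # Like GenServer.cast: enqueue and return immediately.
  def cast(&handler)
    @mailbox.push([handler, nil])
    :ok
  end

  def stop
    @mailbox.push(nil) # nil ends the worker loop
    @worker.join
  end
end

counter = MiniServer.new({ count: 0 })
counter.cast { |state| state[:count] += 1 }
puts counter.call { |state| state[:count] } # prints 1: the cast was queued first
counter.stop
```

The per-call reply queue plays the role of the IVar that await hands back: the caller parks on it until the worker has processed the message.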
Threads, OS processes, and BEAM processes
This is where comparisons often go wrong: conflating three different things because all three are called "process" in different communities.
The chart above is illustrative, not a benchmark of our rate-limiter repos. It shows orders of magnitude: BEAM processes stay tiny per worker, Ruby threads carry a heavier per-thread cost, and OS processes buy isolation with RAM and spawn time. That is the trade space you are navigating when you pick GenServer vs Concurrent::Async vs Puma workers.
BEAM process (Elixir)
- Not an OS process. Thousands fit in one OS process.
- Isolated heap and garbage collection per process: a crash does not corrupt neighbours.
- Preemptive scheduling across many schedulers (one per core by default).
- Communication only via copying messages (for large binaries, ref-counted, but the discipline is still message passing).
Ruby Thread (MRI)
- OS thread, but the Global VM Lock (GVL, commonly called the GIL) means only one thread executes Ruby bytecode at a time.
- Excellent for I/O-bound actors (network, disk, sleep), which is the kind of work a rate limiter usually sits in front of.
- Poor choice for CPU-heavy parallel Ruby on many cores; consider Process.fork, a process pool, or JRuby/TruffleRuby.
- Shared memory model: if you pass a mutable Hash into an actor and mutate it elsewhere, you have a data race. GenServer makes this hard to do by accident; Ruby makes it easy.
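One way to approximate the BEAM's copy-on-send discipline in Ruby is to deep-copy and freeze anything you hand to an actor. The sketch below is an assumption of mine rather than something either repo does: `isolate` and `deep_freeze` are hypothetical helpers, and the Marshal round trip only works for plain data (no procs, IO objects, or singleton methods).

```ruby
# frozen_string_literal: true

# Recursively freeze hashes and arrays, then the object itself.
def deep_freeze(object)
  case object
  when Hash  then object.each { |key, value| deep_freeze(key); deep_freeze(value) }
  when Array then object.each { |item| deep_freeze(item) }
  end
  object.freeze
end

# Deep-copy a message and freeze the copy before handing it to another thread.
def isolate(message)
  deep_freeze(Marshal.load(Marshal.dump(message)))
end

config = { limits: { "user-123" => 5 } }
snapshot = isolate(config)

config[:limits]["user-123"] = 50   # the caller keeps mutating its own copy
puts snapshot[:limits]["user-123"] # prints 5: the actor's copy is unaffected
puts snapshot.frozen?              # prints true
```

The copy costs allocations on every send, which is roughly the price the BEAM pays by default for small messages.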
OS process (Ruby Process.spawn / fork)
- True isolation like the BEAM: separate memory, separate GIL.
- Heavyweight: slower spawn, higher RAM, harder IPC (pipes, Redis, DB).
- Common pattern in MRI for CPU parallelism (e.g. Puma workers, Sidekiq processes).
- Complementary to Concurrent::Async: actors inside a worker, processes across workers.
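The isolation and IPC cost in the list above can be seen in a few lines. This is a rough sketch, assuming a Unix-like system where fork is available; `in_child_process` is a hypothetical helper, and it runs children one after another rather than in parallel to keep the example short.

```ruby
# frozen_string_literal: true

# Run a block in a forked child and return its result through a pipe.
# `in_child_process` is a hypothetical helper; fork is unavailable on Windows.
def in_child_process
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield)) # the result has to be serialized to cross the boundary
    writer.close
    exit!(0) # skip at_exit hooks inherited from the parent
  end
  writer.close
  result = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  result
end

shared = [1]
in_child_process { shared << 2; nil } # mutates the child's copy only
puts shared.inspect                   # prints [1]: the parent's memory is untouched
puts in_child_process { (1..2_000).sum } # prints 2001000
```

Compare this with the thread case: nothing leaks across the boundary by accident, but every value has to be marshalled through a pipe to get back.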
The mental model transfers. The fault-tolerance guarantees do not. OTP supervision (restart a crashed GenServer with a strategy, let it take down a subtree) has no first-class equivalent in concurrent-ruby. That is the honest ceiling on "Ruby can do GenServer." It can do the messaging pattern. It cannot do the reliability layer without you building it.
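To give a sense of what "building it" means, here is about the smallest restart-on-crash loop I can write in plain Ruby. It is a sketch, not an OTP equivalent: `supervise` is a hypothetical helper, there is no restart intensity window, no linking, and no child tree, and any state the worker held is lost unless the block rebuilds it on each attempt.

```ruby
# frozen_string_literal: true

# Run the block in a thread and rerun it when it raises, up to max_restarts times.
# The block receives the number of restarts so far.
def supervise(max_restarts: 3)
  Thread.new do
    restarts = 0
    begin
      yield restarts
    rescue StandardError => e
      restarts += 1
      raise if restarts > max_restarts # give up and let the error surface

      warn "worker crashed (#{e.message}), restart #{restarts}/#{max_restarts}"
      retry
    end
  end
end

supervisor = supervise(max_restarts: 3) do |restarts|
  raise "boom" if restarts < 2 # fail on the first two attempts
  "running after #{restarts} restarts"
end
puts supervisor.value # prints "running after 2 restarts"
```

Even this toy leaves open the questions OTP answers for you: who supervises the supervisor, what happens to callers blocked on a crashed worker, and how siblings are restarted together.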
Proving concurrency safety
Both repos include the same test: twenty concurrent callers, capacity three, so exactly three should succeed. Elixir uses Task.async_stream; Ruby uses Thread.new. Same assertion, different scheduler.
test "concurrent callers never exceed capacity", %{name: name} do
  results =
    1..20
    |> Task.async_stream(fn _ -> RateLimiter.allow?(name) end, max_concurrency: 20)
    |> Enum.map(fn {:ok, allowed?} -> allowed? end)

  assert Enum.count(results, & &1) == 3
end

def test_concurrent_callers_never_exceed_capacity
  limiter = RateLimiter.new(capacity: 3, refill_rate: 1.0)
  results = Array.new(20) do
    Thread.new { limiter.call_allow? }
  end.map(&:value)
  assert_equal 3, results.count(true)
end

What is genuinely the same
- "Do not communicate by sharing memory; share memory by communicating."
- One serial worker owns mutable state, with no locks inside the bucket logic.
- Synchronous vs asynchronous client APIs map cleanly (call vs cast).
- The algorithm (token refill, capacity ceiling) is identical line for line in spirit.
What is different (and matters in production)
- Supervision and restart strategies: OTP native; Ruby DIY.
- Process isolation: BEAM per-actor GC; Ruby shared VM.
- Back-pressure and observability: Telemetry, OTP releases, :sys.get_status/1 vs logging and custom metrics.
- Distribution: Node clustering is built in; Ruby typically needs Redis, Kafka, or gRPC.
- concurrent-ruby-edge adds ErlangActor and Channel for closer BEAM semantics, but they are edge APIs. Async in the main gem is the pragmatic GenServer-shaped choice.
Run the proof-of-concepts
git clone https://github.com/ijunaid8989/rate-limiter-elixir.git
cd rate-limiter-elixir
mix test

git clone https://github.com/ijunaid8989/rate-limiter-ruby.git
cd rate-limiter-ruby
bundle install
bundle exec ruby test/test_rate_limiter.rb
ruby bin/demo

Closing thought
I reach for Elixir when I want the pattern and the platform guarantees together: lightweight processes, supervision, hot code upgrades in the right deployment. I reach for Ruby when the ecosystem, team, or integration surface demands it, and I still model concurrent state as actors, not shared mutable singletons. Knowing both runtimes means choosing the guarantee you actually need, not the syntax you used last week.
Reference implementations: rate-limiter-elixir and rate-limiter-ruby on GitHub. Pair them, star them, cite them in your next architecture review when someone says "we can't do that, we're on Ruby."