Maple Ong

Querying the Database Inside an Enumerator

Digging into a database timeout problem

While upgrading our Rails app to Ruby 3.1 last year, we found that some tests were timing out on CI. We eventually narrowed the problem down to a minimal reproduction:

```ruby
User.transaction do
  enum = Enumerator.new do |y|
    y.yield User.first # stuck
  end

  enum.next
end
```

Performing a database query from inside an enumerator causes the program to hang. Interestingly, the hang does not occur on Ruby versions prior to 3.0.2.

Why Rails system tests needed a lock

There are a few things we need to understand to get the full picture of this problem. The first is that, in system tests, Rails uses Ruby's Monitor as a lock so that multiple threads can safely share the same database connection.

In a regular (non-system) Rails test, Rails opens a database transaction before the test runs and performs the setup inside it. Assertions about records (e.g. checking that the database contains an expected object) can see those changes because they run inside the same transaction. Finally, Rails rolls back the transaction at the end of the test, so nothing is left behind.

Back in the day, system tests, which exercise the front end of the application, could not use these transactional tests. This is because Puma (the web server) and Capybara (the acceptance test framework) run in separate threads and therefore used separate connections to the database. Because of transaction isolation, neither thread could see the uncommitted changes made by the other.

In 2017, Rails changed this so that both threads are assigned the same database connection (see: Ensure test threads share a DB connection). When a transaction is opened, the current thread acquires the connection's monitor. This ensures that Puma and Capybara never execute queries concurrently on the shared connection, which would otherwise result in errors or, worse, a crash.

In Rails, a connection is represented by an AbstractAdapter, which uses Ruby's Monitor class as its lock. Rails uses this monitor to synchronize the Puma and Capybara threads and prevent race conditions. That means every query made against the database in a system test has to go through the lock.
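To make the locking concrete, here is a minimal plain-Ruby sketch of two threads sharing one connection object guarded by a Monitor. SharedConnection, transaction, and execute are hypothetical names chosen for illustration; this is not Rails' AbstractAdapter code, just the same idea in miniature.

```ruby
require "monitor"

# Hypothetical stand-in for one database connection shared by two threads.
class SharedConnection
  attr_reader :log

  def initialize
    @lock = Monitor.new # a reentrant lock, in the spirit of the adapter's lock
    @log = []
  end

  # Opening a transaction holds the lock for the whole block.
  def transaction
    @lock.synchronize { yield }
  end

  # Every query also takes the lock. Monitor is reentrant, so a query issued
  # inside #transaction from the same thread (and, on Ruby 3.0.2+, the same
  # fiber) simply re-enters it instead of waiting.
  def execute(sql)
    @lock.synchronize { @log << sql }
  end
end

conn = SharedConnection.new

threads = %w[puma capybara].map do |name|
  Thread.new do
    conn.transaction do
      conn.execute("#{name} query 1")
      sleep 0.01 # the other thread cannot slip a query in here
      conn.execute("#{name} query 2")
    end
  end
end
threads.each(&:join)

p conn.log # each thread's two queries are adjacent, whichever thread ran first
```

The reentrancy is what lets queries run inside an already-open transaction; the mutual exclusion is what keeps the two threads from using the connection at the same time.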

Monitor is implemented using Fiber

It turns out that prior to Ruby 3.0.2, a Monitor was owned by a thread: once a thread acquired a Monitor, every fiber belonging to that thread could also enter it. As of Ruby 3.0.2, however, a Monitor is owned by the Fiber that acquired it, matching the behaviour of Mutex, which became fiber-owned in Ruby 3.0 (as described in this issue on Ruby).
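One way to observe this ownership rule is Monitor#mon_owned?, which reports whether the caller currently owns the monitor. The sketch below asks from the fiber that entered the monitor and then from a second Fiber on the same thread; the values in the comments assume Ruby 3.0.2 or later.

```ruby
require "monitor"

monitor = Monitor.new
owned_here = owned_in_other_fiber = nil

monitor.synchronize do
  # The fiber that called synchronize owns the monitor.
  owned_here = monitor.mon_owned?

  # A second fiber on the same thread asks the same question.
  owned_in_other_fiber = Fiber.new { monitor.mon_owned? }.resume
end

p owned_here           # => true
p owned_in_other_fiber # => false on Ruby 3.0.2+, true on earlier versions
```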

Enumerator is also implemented using Fiber

Why are we talking about Monitors and Fibers, and how is this relevant to our original problem? To answer that question, we dug into how Enumerator is implemented in Ruby (I've written a semi-related blog post about it). What you need to know is that Enumerator also uses a Fiber: when you call next on an enumerator, its block runs inside a separate Fiber, which is paused at each yield and resumed on the following call to next.
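This is easy to check in plain Ruby by recording Fiber.current inside an enumerator's block. The sketch below compares it with the caller's fiber for external iteration (next) and for internal iteration (each); the variable names are only for illustration.

```ruby
require "fiber" # Fiber.current needs this on older Rubies; harmless on newer ones

outer = Fiber.current
seen = nil

enum = Enumerator.new do |y|
  seen = Fiber.current # which fiber is running the enumerator's block?
  y << 1
end

enum.next
external = seen.equal?(outer) # next resumes a separate fiber

enum.each { }                 # internal iteration
internal = seen.equal?(outer) # each runs the block in the caller's fiber

p external # => false
p internal # => true
```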

The real reason for the timeouts...

Going back to our example:

```ruby
User.transaction do
  enum = Enumerator.new do |y|
    y.yield User.first # stuck
  end

  enum.next
end
```

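We now know that the Active Record connection is protected by a Monitor, and that as of Ruby 3.0.2 a Monitor is owned by a fiber rather than a thread. User.transaction acquires the lock in the current fiber. enum.next then resumes the enumerator's own fiber, where User.first tries to acquire the same lock. That lock is held by the outer fiber, which is itself waiting for enum.next to return, so neither side can make progress: a deadlock where one fiber waits on a lock held by another.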

That was the cause of the hanging and database timeouts we were observing on our CI!
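The same conflict can be reproduced without Rails or a database. In this minimal sketch a plain Monitor stands in for the connection's lock, and we call try_enter instead of a blocking enter so the script reports the conflict rather than getting stuck. The comments map each step to the original example as an analogy; this is not Active Record's actual code path.

```ruby
require "monitor"

lock = Monitor.new # stands in for the connection's lock

result = lock.synchronize do # like User.transaction: the outer fiber owns the lock
  enum = Enumerator.new do |y|
    # Like User.first: try to take the same lock, but from the enumerator's
    # fiber. try_enter returns false instead of waiting when it cannot enter.
    acquired = lock.try_enter
    lock.exit if acquired # only reached on Ruby versions before 3.0.2
    y << acquired
  end

  enum.next
end

p result # => false on Ruby 3.0.2+: the enumerator's fiber cannot enter the lock
```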

The solution

To solve this problem in Rails system tests, Rails introduced ActiveSupport::Concurrency::ThreadLoadInterlockAwareMonitor, an alternative Monitor-like lock that is owned by threads rather than fibers. Active Record now selects the appropriate lock based on the type of the lock thread (a Thread or a Fiber), so code running in another fiber of the owning thread, such as an enumerator's block, can enter the lock and this deadlock no longer occurs (see: Make AbstractAdapter#lock thread local by default).
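The idea behind the fix can be sketched in plain Ruby: a reentrant lock that records the owning Thread instead of the owning Fiber, so every fiber of that thread may enter it while other threads still wait. ThreadOwnedMonitor is a hypothetical class written for this illustration; it is not the implementation of ActiveSupport::Concurrency::ThreadLoadInterlockAwareMonitor.

```ruby
# Hypothetical, simplified thread-owned reentrant lock (not Rails' code).
class ThreadOwnedMonitor
  def initialize
    @mutex = Mutex.new            # guards @owner/@count; only held briefly
    @cond = ConditionVariable.new # lets other threads wait for the lock
    @owner = nil                  # the owning Thread, not Fiber
    @count = 0                    # reentrancy depth
  end

  def enter
    @mutex.synchronize do
      # Any fiber of the owning thread may re-enter; other threads wait.
      @cond.wait(@mutex) while @owner && @owner != Thread.current
      @owner = Thread.current
      @count += 1
    end
  end

  def leave
    @mutex.synchronize do
      raise ThreadError, "current thread does not own the lock" unless @owner == Thread.current
      @count -= 1
      if @count.zero?
        @owner = nil
        @cond.broadcast # wake any threads waiting in #enter
      end
    end
  end

  def synchronize
    enter
    begin
      yield
    ensure
      leave
    end
  end
end

lock = ThreadOwnedMonitor.new

# The shape of the original reproduction, with the thread-owned lock.
result = lock.synchronize do
  enum = Enumerator.new { |y| y << lock.synchronize { :queried } }
  enum.next
end

p result # => :queried
```

Because ownership is keyed on Thread.current, the enumerator's fiber re-enters the lock instead of waiting on it. The internal Mutex is never held across a fiber switch, which is why this works even though Mutex itself is fiber-owned.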

A big thank you to Jean Boussier for helping us dig into this problem and for proofreading this post.