A yak shave turned good: Switching from Poltergeist to Headless Chrome for Capybara browser tests

I just finished up migrating all the Capybara feature tests in my Rails/React app from Poltergeist to Headless Chrome. I ran into a few issues I didn’t see covered in other write-ups, so I thought I’d pull together what I learned and what I found useful.

This pull request shows all the changes I had to make.

Motivation

Poltergeist is based on the no-longer-maintained PhantomJS headless browser. When we started using PhantomJS a few years ago, it was the only good way to run headless browser tests, but today there’s a good alternative in Selenium with Chrome’s headless mode. (Firefox has a headless mode now as well.) I briefly tried out a switch to headless Chrome a while ago, but I ran into too many problems early on and gave up.

This time, I decided to try it again after running into a weird bug — which I still don’t know the cause of. (Skip on down if you just want the migration guide.) This was a classic yak shave…

I got a request to add support for more langauges for one a particular feature, plots of the distribution of Wikipedia article quality before and after being worked on in class assignments. The graphs were generated with R and ggplot2, but this was really unreliable on the international version of the app. To get it working more reliably, I decided to try to reimplement the graphs client-side, using Vega.js. The type of graph I needed — kernel density estimation — was only available in a newer version of Vega, so first I needed to update all the other Vega plots in the app to work on the latest version of Vega, which had a lot of breaking changes from the version we’d been on. That was tough, but along the way I made some nice improvements to the other plots along with a great-looking implementation of the kernel density plot. But as I switched over to the minified versions of the Vega libraries and got ready to call it done, all of a sudden my feature specs were all failing. They passed fine with the non-minified versions, but the exact same assets in minified form — the distributed versions, straight from a CDN — caused Javascript errors that I couldn’t replicate in a normal browser. My best guess is that there’s some buggy interaction between Vega’s version of UglifyJS and the JS runtime in PhantomJS, which is triggered by something in Vega. In any case, after failing to find any other fixes, it seemed like the right time to take another shot at the Poltergeist → headless Chrome migration — which I’m happy to say, worked out nicely. So, after what started as an i18n support request, I’m happy to report that my app no longer relies on R (or rinruby to integrate between R and Ruby) and my feature tests are all running more smoothly and with fewer random failures on headless Chrome.

😀

Using R in production was fun and interesting, but I definitely won’t be doing it again any time soon.

If you want to see that Vega plot that started it all, this is a good place to look. Just click ‘Change in Structural Completeness’. (Special thanks to Amit Joki, who added the interactive controls.)

Setting up Capybara

The basic setup is pretty simple: put selenium-webdriver and chromedriver-helper in the Gemfile, and then register the driver in the test setup file. For me it looked like this:

Capybara.register_driver :selenium do |app|
  options = Selenium::WebDriver::Chrome::Options.new(
    args: %w[headless no-sandbox disable-gpu --window-size=1024,1024]
  )
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
end

Adding or removing the headless option makes it easy to switch between modes, so you can pop up a real browser to watch your tests run when you need to debug something.

Adding the chrome: stable addon in .travis.yml got it working on CI as well.

Dealing with the differences between Poltergeist and Selenium Chromedriver

You can run Capybara feature tests with a handful of different drivers, and the core features of the framework will work with any driver. But around the edges, there are some pretty big differences in behavior and capabilities between them. For Poltergeist vs. Selenium and Chrome, these are the main ones that I had to address during the migration:

More accurate rendering in Chrome

PhantomJS has some significant holes in CSS support, which is especially a problem when it comes to misrendering elements as overlapping when they should not be. Chrome does much more accurate rendering, closely matching what you’ll see using Chrome normally. Relatedly, Poltergeist implements .trigger('click'), which unlike the normal Capybara .click , can work even if the target element is underneath another one. A common error message with Poltergeist points you to try .trigger('click') when the normal .click fails, and I had to swap a lot of those back to .click.

Working with forms and inputs

The biggest problem I hit was interacting with date fields. In Poltergeist, I was using text input to set date fields, and this worked fine. Things started blowing up in Chrome, and it took me a while to figure out that I needed provide Capybara with Date objects instead of strings to make it work. Capybara maintainer Thomas Walpole (who is incredibly helpful) explained it to me:

fill_in with a string will send those keystrokes to the input — that works fine with poltergeist because it doesn’t actually support date inputs so they’re treated as standard text inputs so the with parameter is just entered into the field – Chrome however supports date inputs with it’s own UI.
By passing a date object, Capybara’s selenium driver will use JS to correctly set the date to the input across all locales the browser is run in. If you do want to send keystrokes to the input you’d need to send the keystrokes a user would have to type on the keyboard in the locale the browser is running in —In US English locale that would mean fill_in(‘campaign_start’, with: ’01/10/2016’)

Chromedriver is also pickier about which elements you can send input to. In particular, it must be focusable. With some trial and error, I was able to find focusable elements for all the inputs I was interacting with.

Handling Javascript errors

The biggest shortcoming with Selenium + Chrome is the lack of support for the js_errors: true option. With this option, a test will fail on any Javascript error that shows up in the console, even if the page is otherwise meeting the test requirements. Using that option was one of the main reasons we switched to Poltergeist in the first place, and it’s been extremely useful in preventing bugs and regressions in the React frontend.

Fortunately, there’s a fairly easy way to hack this feature back in with Chrome, as suggested by Alessandro Rodi. I modified Rodi’s version a bit, adding in the option to disable the error catching on individual tests — since a few of my tests involve testing error behavior. Here’s what it looks like, in my rails_helper.rb:

  # fail on javascript errors in feature specs
  config.after(:each, type: :feature, js: true) do |example|
    errors = page.driver.browser.manage.logs.get(:browser)
    # pass `js_error_expected: true` to skip JS error checking
    next if example.metadata[:js_error_expected]

    if errors.present?
      aggregate_failures 'javascript errrors' do
        errors.each do |error|
          # some specs test behavior for 4xx responses and other errors.
          # Don't fail on these.
          next if error.message =~ /Failed to load resource/

          expect(error.level).not_to eq('SEVERE'), error.message
          next unless error.level == 'WARNING'
          STDERR.puts 'WARN: javascript warning'
          STDERR.puts error.message
        end
      end
    end
  end

Different behavior using matchers to interact with specific parts of the page

Much of the Capybara API is driven with HTML/CSS selectors for isolating the part of the page you want to interact with.

I found a number of cases where these behaved differently between drivers, most often in the form of Chrome reporting an ambiguous selector that matches multiple elements when the same selector worked fine with Poltergeist. These were mostly cases where it was easy enough to write a more precise selector to get the intended element.

In a few cases with some of the intermittently failing specs, Selenium + Chrome also provided more precise and detailed error messages when a target element couldn’t be found — giving me enough information to fix the badly-specified selectors that were causing the occasional failures.

Blocking external urls

With Poltergeist, you can use the url_blacklist option to prevent loading specific domains. That’s not available with Chromedriver. We were using it just to reduce unnecessary network traffic and speed things up a bit, so I didn’t bother trying out the alternatives, the most popular of which seems to be to use Webmock to mock responses from the domains you want to block.

Status code

In Poltergeist, you can easily see what the HTTP status code for a webpage is: page.status_code. This feature is missing altogether in Selenium. I read about a few convoluted ways to get the status code, but for my test suite I decided to just do without explicit tests of status codes.

Other useful migration guides and resources

There are a bunch of blog posts and forum threads on this topic, but the two I found really useful are:

* https://about.gitlab.com/2017/12/19/moving-to-headless-chrome/
* https://github.com/teamcapybara/capybara/issues/1860

I am 💯 and you can too.

100% test coverage! Catch all the bugs!

I’ve been working on a Ruby on Rails app for more than three years, and Ruby coverage for its rspec test suite has been at 100% for most of the last year. 😀

People with more experience may try to tell you this is a bad idea. Test what you need to test. 90% is a good rule of thumb. If you’re doing TDD right, you’ll be testing based on needs, and coverage will take care of itself.

No! Do it!

Why?

  • You’ll learn a lot about Rails, your gem dependencies, and your test tools.
  • It’ll help you write better tests and more testable code.
  • You will find bugs you didn’t know about!
  • It’s fun! Getting those tricky bits tested is a good puzzle.
  • Once you get there, it’s easy to maintain.

Okay, but how?

Lies!

You may have heard people say that code coverage is a lie. It’s true. For example, there are some tricksy ways to add specs for rake tasks to your test suite, but because rake tasks start from a different environment, they don’t integrate cleanly with simplecov. That’s why we exclude rake tasks from the coverage metrics in my app, even though we do test them.

Sadly, even once you reach 💯, it may be tough to keep it in the long run. Ruby 2.5 introduced a big change in the code coverage API. With that change, it has become possible for code coverage tools to report Branch Coverage rather than just line coverage, so the tricks with conditional modifiers and ternary operator may still not get you to 100% in the future. (The deep-cover gem also does this, without requiring Ruby 2.5.) But don’t fret; you’ll probably be able to do something like pin your coverage gem to the last version that just uses line coverage. A small price to pay for the peace of mind of knowing your app is 100% bug free. Or maybe it’s just a number to brag about. Either way, worth it!

 

Obviously this is not the case. But I still recommend trying to get to 100% coverage, for the reasons listed above.

First impressions of Ruby branch coverage with DeepCover

Branch coverage for Ruby is finally on the horizon! The built-in coverage library is expected to ship in Ruby 2.5 with branch and method coverage options.

And a pure-Ruby gem is in development, too: DeepCover.

I gave DeepCover a try with my main project, the Wiki Education Dashboard, and the coverage of the Ruby portions of the app dropped from 100% to 96.75%. Most of that is just one-line guard clauses like this:

return unless Features.wiki_ed?

But there are a few really useful things that DeepCover revealed. First off, unused scopes on Rails ActiveRecord models:

<br />
class Revision &lt; ActiveRecord::Base<br />
  scope :after_date, -&gt;(date) { where('date &gt; ?', date) }<br />
end<br />

Unlike default line coverage, DeepCover showed that this lambda wasn’t ever actually used. It was dead code that I removed today.

Similarly, DeepCover shows where default arguments in a method definition are never used, like this:

<br />
class Article &lt; ActiveRecord::Base<br />
  def update(data={}, save=true)<br />
    self.attributes = data<br />
    self.save if save<br />
  end<br />
end<br />

This method was overriding the ActiveRecord update method, adding an option to update without saving. But we no longer use that second argument anywhere in the codebase, meaning I could delete the whole method and rely on the standard version of update.