blog.mhartl | Michael Hartl's tech blog

2008-06-26

Working around the validates_uniqueness_of bug in Ruby on Rails

Filed under: Ruby on Rails — mhartl @ 20:16

The useful validates_uniqueness_of validation in Active Record has a well-known flaw that bit us recently. (Apparently it wasn’t well-known enough. :-) Insoshi uses email addresses as unique logins, which naturally means we have a validation enforcing email uniqueness. And yet, until recently our database contained instances of people with the same email address. How can this be?

The answer is simple: despite its name, validates_uniqueness_of doesn’t actually guarantee uniqueness. Suppose that a new user registers using the address “foobar@example.com”. Rails performs the following steps:

  1. Check the database to see if there is already a person with email address “foobar@example.com”
  2. If not, insert the new record

The problem is when the following sequence occurs:

  1. HTTP request #1 tries to create a new record with email “foobar@example.com”, which Rails marks as valid
  2. HTTP request #2 tries to create a new record with email “foobar@example.com”, which Rails marks as valid
  3. The process from request #1 saves to the database
  4. The process from request #2 saves to the database

The result, contrary to the supposed “uniqueness validation”, is two records with the address “foobar@example.com”! Since the duplication happens silently, this can badly corrupt databases in some cases. (Luckily for us, our email verification uses a save! to save the person, which raises an exception for duplicate records.)

We’re not sure exactly how the problem arose—possibly from double-clicks on the registration button—but fixing it involves making changes at the database level. For the sake of illustration, consider a stripped-down Person model with only an email address and password. Here’s what the migration might look like:

class CreatePeople < ActiveRecord::Migration
  def self.up
    create_table :people do |t|
      t.column :email,    :string
      t.column :password, :string
      t.timestamps
    end
  end

  def self.down
    drop_table :people
  end
end

The validations look like this, including validates_uniqueness_of for the email attribute:

class Person < ActiveRecord::Base

  validates_presence_of     :email
  validates_uniqueness_of   :email
  validates_confirmation_of :password

  attr_accessible :email, :password, :password_confirmation
end

As we’ve seen, this doesn’t actually validate uniqueness in all cases. Let’s write an RSpec test to catch the problem. We’ll be enforcing email uniqueness at the database level, so we expect the database to raise an exception if we try to create two records with the same email address. As we’ll see once we add the constraint, the actual exception raised is of type ActiveRecord::StatementInvalid, so we’ll test for that:

require File.dirname(__FILE__) + '/../spec_helper'

describe Person do
  it "should prevent duplicate emails" do
    person = new_person
    person.save
    duplicate = new_person
    lambda do
      # Pass 'false' to 'save' in order to skip the validations.
      duplicate.save(false)
    end.should raise_error(ActiveRecord::StatementInvalid)
  end

  private

    def new_person(options = {})
      Person.new({ :email    => 'foobar@example.com',
                   :password => 'pass',
                   :password_confirmation => 'pass' }.merge(options)))
    end
end

The key here is the line that passes false to Active Record’s save method, which skips the validations (including validates_uniqueness_of). Initially, this test will fail:

$ spec spec/models/person_spec.rb

F

1)
'Person should prevent duplicate emails' FAILED
expected ActiveRecord::StatementInvalid but nothing was raised

Now we enforce data integrity by putting a unique index on the email field:

$ script/generate migration add_email_unique_index
class AddEmailUniqueIndex < ActiveRecord::Migration
  def self.up
    add_index :people, :email, :unique => true
  end

  def self.down
    remove_index :people, :email
  end
end

Now the test should pass:

$ rake db:migrate; rake db:test:prepare
$ spec spec/models/person_spec.rb
.

Finished in 0.138142 seconds

1 example, 0 failures

This catches the problem in the model, but the original problem—the (attempted) creation of an email duplicate—will still raise an exception in our application. The mere attempt to create a dupe isn’t by itself a problem, so it’s probably best simply to ignore the exception by catching it and redirecting somewhere sensible:

  def create
    @person = Person.new(params[:person])
    if @person.save
      self.current_person = @person
      redirect_to '/'
      flash[:notice] = "Thanks for signing up!"
    else
      flash[:error]  = "We couldn't set up that account, sorry."
      render :action => 'new'
    end
  rescue ActiveRecord::StatementInvalid
    redirect_to '/'
  end

With that, we’ve prevented duplication and handled any errors gracefully. Huzzah!

Postscript: If your database is infected with duplicate entries, there’s a quick way to find them using the console. Since people with non-unique email addresses are invalid, we can find and destroy them as follows:

$ script/console
>> duplicates = Person.find(:all).reject { |person| person.valid? }; 0
=> 0
>> duplicates.map(&:email)
=> ["foobar@example.com", "foobar@example.com", "bazquux@example.com", "bazquux@example.com"]
>> duplicate[0].destroy; duplicate[2].destroy

(The ; 0 in the second line just suppresses the (potentially long) list of Person objects.) If you have more duplicates than this, you might want to write a script or rake task to scrub your database, but using the console was sufficient to solve our problem at the Insoshi developer site.

7 Comments »

  1. [...] I recently ran into one such issue. [...]

    Pingback by Simple double form submission prevention with jQuery – The Pug Automatic — 2008-07-23 @ 06:08

  2. I don’t agree that directing somewhere like ‘/’ is sensible. That would mean users that click multiple times risk being sent out of the the registration flow (or whatever the form is for).

    Instead, this is what I’m doing:

    1. database-level uniqueness constraint to ensure data integrity

    2. JavaScript to mitigate the issue

    I should probably also catch ActiveRecord::StatementInvalid (if the user doesn’t have JS) and do something useful like redirect them to the next step of the registration process for the user with the details they just resubmitted.

    By the way, I’ve seen this issue from a user using IE7, but I can’t reproduce with Firefox 3. Possibly Firefox queues up the HTTP requests, waiting for one (the first submit) to complete before starting the next. So instead of an database level error, I would get a Rails-level validation error about the e-mail already being in use. My JS mitigates that as well.

    Comment by Henrik N — 2008-07-23 @ 06:18

  3. I think any Rails sites that deal with a lot of concurrency will have to deal with this issue at some point.

    Catching StatementInvalid alone will work if you violate an index, but it will also fire if somehow you have cocked up your SQL.

    Unfortunately, it seems each database has its own version of the error that occurs when you violate an index. But if you know you are going to use MySQL, you can do something like this:

    rescue ActiveRecord::StatementInvalid => error
    if error.to_s.include?(“Mysql::Error: Duplicate entry”)
    # raise your own exception

    Note that a more graceful way than re-directing would be to stash something in the errors collection. That’s how I do it anyway :)

    Comment by Robin Ward — 2008-07-23 @ 07:40

  4. @Henrik: ‘/’ is sensible in our app (Insoshi), but of course your case may be different. Thanks for the JavaScript tip; we may end up incorporating that at some point.

    @Robin Ward: Thanks for the note. We’re doing our best to be PostgreSQL-compatible (harder than it sounds), so we do want to avoid MySQL-specific error handling at this point. But you’re right, that comes at a cost, since the exception we catch is quite general. Regarding the error-stashing: we’ve found that most duplicates are inadvertent, so we prefer to ignore them silently, but your approach is certainly valid.

    Comment by mhartl — 2008-07-23 @ 13:28

  5. Hello! simply super resource

    Comment by vanicicsesode — 2008-12-19 @ 21:13

  6. Since this is a fundamental problem in Rails, would be nice if there were a standard way to catch this error. Trying to parse the error message makes sense, but is not very satisfactory.

    I don’t have a slam dunk solution, but one useful suggestion is to catch AR::StatementInvalid, and then in your rescue block check for a duplicate record in the database. If you find a duplicate record, chances are that’s the reason for the exception and you can handle it accordingly, whereas if you don’t find a duplicate, then the AR::StatementInvalid must be for some other reason and my inclination would be to reraise.

    Comment by Peter — 2009-04-6 @ 20:28

  7. I wrote a small AR::Base method to make it easier to use the method described above. I don’t have a blog, so I put it here: http://stackoverflow.com/questions/724431/how-can-i-determine-if-my-activerecord-object-violates-a-unique-database-key-inde

    Comment by Peter — 2009-04-7 @ 05:31


RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.