blog.mhartl | Michael Hartl’s (mostly Ruby on Rails) tech blog

2009-05-15

Running rcov with RSpec

Filed under: RSpec, Ruby, Ruby on Rails — mhartl @ 16:01

I recently wanted to run rcov, the Ruby code coverage tool, on a project tested with RSpec. I think I’d done it once before, but I’d forgotten how. After searching to no avail (both the rcov home page and the RSpec page on rcov proved unhelpful), I applied the tried-and-true Wild-Assed Guess™ method and typed

$ rake spec:rcov

That worked.

The reports themselves are in the coverage/ directory:

$ open coverage/index.html

(or navigate your browser to file:///path/to/project/coverage/index.html). If you’re using Git, add coverage/* to your .gitignore file.

N.B. The inverse Rake task also exists:

$ rake -T rcov
rake spec:clobber_rcov  # Remove rcov products for rcov
rake spec:rcov          # Run all specs in spec directory with RCov (excluding plugin specs)

2009-04-28

New RSS feed

Filed under: Uncategorized — mhartl @ 08:22

This blog’s RSS feed has changed; please re-subscribe here: http://feeds2.feedburner.com/mhartl. (It might take an hour or two to go live. If it doesn’t work for you now, come back in a bit and try again.)

2008-10-28

Using a temporary branch when doing Git merges

Filed under: Git, Insoshi — mhartl @ 15:54

Merging branches in Git is wonderfully easy compared to many other version control tools, but sometimes merging causes problems you’d rather undo. One common merge side effect is the creation of code conflicts, and sometimes a merge causes so many conflicts that you end up regretting doing the merge in the first place. In addition, for projects with many contributors (such as Insoshi), sometimes you aren’t sure if you will even want to use the contribution on the branch you’re merging in. Unfortunately, merges are very difficult to undo, so if you just do a direct merge of, say, a contributor branch into your main development branch, you’re stuck if you decide you don’t want the changes after all:

# Don't do this!
$ git checkout master
$ git merge contributor_branch

The solution is always to use a temporary branch when doing any merge whose changes you’re not sure you’ll want to keep; if the merge proves intractable due to conflicts, or you just don’t want to use the contribution, then you can simply delete the temp branch. Here’s how it works:

$ git checkout master
$ git checkout -b temp_branch
$ git merge contributor_branch

Then you can do stuff like

$ git status
$ git diff master
<resolve conflicts, polish contributed code>

If the new branch passes muster, you can then merge it in:

$ git checkout master
$ git merge temp_branch
$ git branch -D temp_branch

(Note here that I’ve deleted the temp branch in the final step, just to clean up.) If, on the other hand, you decide not to continue with the merge, you can just delete the temp branch without merging it in:

$ git checkout master
$ git branch -D temp_branch

Either way, the lesson is the same: consistently using temp branches when doing dangerous merges is a great way to avoid the agony of merge remorse.
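If you do this often, the workflow above is easy to script. The following Ruby sketch is our own illustration, not part of Git or the original post. The helper names (trial_merge_commands, finish_commands, run_all) and the dry-run default are assumptions. It prints the commands. With dry_run: false it also runs them, stopping at the first failure, such as a merge that leaves conflicts to resolve.

```ruby
# Hypothetical helpers illustrating the temp-branch merge workflow.
# They are not part of Git; the names and defaults are our own.

# Steps to try a merge on a throwaway branch.
def trial_merge_commands(base:, incoming:, temp: "temp_branch")
  [
    "git checkout #{base}",
    "git checkout -b #{temp}",
    "git merge #{incoming}"
  ]
end

# Steps to finish: merge the temp branch back only if keep is true,
# then delete it either way.
def finish_commands(base:, keep:, temp: "temp_branch")
  commands = ["git checkout #{base}"]
  commands << "git merge #{temp}" if keep
  commands << "git branch -D #{temp}"
  commands
end

# Print each command; with dry_run: false, also run it and stop at the
# first failure (git merge exits non-zero when it leaves conflicts).
def run_all(commands, dry_run: true)
  commands.each do |command|
    puts "$ #{command}"
    next if dry_run
    abort "stopped after: #{command}" unless system(command)
  end
end

run_all(trial_merge_commands(base: "master", incoming: "contributor_branch"))
```

Because run_all stops at the first failing command, a conflicted merge leaves you on temp_branch, ready for git status and git diff master.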

2008-10-14

Setting up your Git repositories for open source projects at GitHub

Filed under: Git, Insoshi — long @ 12:08

[This is a guest post from Long Nguyen. —mhartl]

Like a lot of projects in the Ruby on Rails world, the Insoshi social networking platform uses Git and GitHub to manage its open source development and contributions. In setting up the repositories for Insoshi, I’ve applied the version control experience I gained at Discover, where I was technical lead for the software configuration management (SCM) team. Since some aspects of our setup aren’t obvious if you haven’t managed large projects before, we at Insoshi decided to share the details so that other GitHub projects might benefit as well.

We’ll start by reviewing the typical Git workflow based on pull requests, then discuss some problems you might run into with a “typical” repository setup, and finally explain the details of preparing the Insoshi Git repository for collaboration.

Why Pull Requests?

Git was originally developed by Linus Torvalds to manage the Linux kernel, and pull requests are the de facto standard for submitting contributions in Git because that’s what Linus does. (He talked about this in his Google Tech Talk on Git.) The concept of the pull request is straightforward: you notify someone (via email, messaging on GitHub, etc.) that you’ve made an update and let them know where to find it. They can then pull in your changes and merge them with their own work.

Except for that interaction, everyone works within their own repository and on their own schedule. There’s no process waiting to be completed that blocks you from moving on to whatever you need or want to do next. And you’re not forcing anyone to drop what they’re doing right now to handle your request.

It’s all very polite. And it works well in the context of distributed development since you avoid all kinds of coordination issues.

What’s needed?

If you want to contribute to an open source project, here’s really all that you need:

  1. A publicly accessible repository where your changes can be found
  2. A local repository for your development

Even if you’re new to Git, these both seem like pretty straightforward things to do—especially if you’re using GitHub for the public repository: your repository is just a fork of the main project repository.

Let’s set up our repository by going to the official Insoshi repository and clicking the fork button.

I’ll need to make a note of the public clone URL for the official repository and the private clone URL for my newly created fork:

  • Official Insoshi public clone URL
    git://github.com/insoshi/insoshi.git
  • My fork’s private clone URL
    git@github.com:long/insoshi.git

Your local repository: The “obvious” thing to do

At this point, I’ll be tempted to go ahead and make a local clone of my fork:

$ git clone git@github.com:long/insoshi.git

and immediately get to work.

Technically, there’s nothing wrong with that, and it’s what you’d do as an individual developer starting a new project. But this seemingly straightforward approach has several disadvantages. One of the major benefits of a distributed version control system like Git is that each repository is on an equal footing; in particular, we would like every fork to have the same master branch, so that if the “official” Insoshi repository should ever be lost there would be plenty of redundant backups. We also want it to be easy for each developer to pull in changes from the official repository; the “obvious” approach isn’t set up for that. Finally, it’s a bad idea in general to work on the master branch; experienced Git users typically work on separate development branches and then merge those branches into master when they’re done.

What we’d like is to set up the local repository so that it will

  • Keep the repositories in sync so that each contains the full “official” repository
  • Allow developers to pull in official updates
  • Encourage working on branches other than master

In the “obvious” configuration, I’m not set up to do any of that:

  • There’s no local connection to the official repository for updates
  • There’s no mechanism in place to push official updates to my fork on GitHub
  • We’re working directly on the master branch

Your local repository: The “right” way

Keeping the big picture in mind, here are the commands I’ve run to set up my local repository (using the GitHub id long):

$ git clone git://github.com/insoshi/insoshi.git
$ cd insoshi
$ git branch --track edge origin/edge
$ git branch long edge
$ git checkout long
$ git remote add long git@github.com:long/insoshi.git
$ git fetch long
$ git push long long:refs/heads/long
$ git config branch.long.remote long
$ git config branch.long.merge refs/heads/long

Let’s take a detailed look at what these steps accomplish.

So what does it all mean?

Step one

Create a local clone of the Insoshi repository:

$ git clone git://github.com/insoshi/insoshi.git

You should note that the Git URL for the clone references the official Insoshi repository and not the URL of my own fork (i.e., the clone URL is git://github.com/insoshi/insoshi.git instead of git@github.com:long/insoshi.git). This way, the official repository is the default remote (aka ‘origin’), and the local master branch tracks the official master.

Step two

I have to change into the repository to perform additional git setup:

$ cd insoshi

Step three

Insoshi also has an ‘edge’ branch for changes that we want to make public but may require a bit more polishing before we’d consider them production-ready (in the past this has included migrating to Rails 2.1 and Sphinx/Ultrasphinx).  Our typical development lifecycle looks something like

development -> edge -> master

I want to create a local tracking branch for it:

$ git branch --track edge origin/edge

Steps four and five

As I mentioned before, I’m resisting the temptation to immediately start working on the local ‘master’ and ‘edge’ branches. I want to keep those in sync with the official Insoshi repository.

I’ll keep my changes separate by creating a new branch ‘long’ that’s based off edge and checking it out:

$ git branch long edge
$ git checkout long

By the way, you can actually combine the two commands if you like, using just the ‘git checkout’ command with the -b flag:

$ git checkout -b long edge

You can name this branch anything that you want, but I’ve chosen my GitHub id so that it’s easy to identify.

I’m starting my changes off of ‘edge’ since that contains all the latest updates and any contribution I submit a pull request for will be merged first into the official Insoshi ‘edge’ branch to allow for public testing before it’s merged into the ‘master’.

Steps six and seven

I’m finally adding the remote reference to my fork on GitHub:

$ git remote add long git@github.com:long/insoshi.git

I’ve used my GitHub id once again, this time as the remote nickname.

We should run a fetch immediately in order to sync up the local repository with the fork:

$ git fetch long

Step eight

I’m pushing my new local branch up to my fork. Since it’ll be a new branch on the remote end, I need to fully specify the remote refspec:

$ git push long long:refs/heads/long

Steps nine and ten

Now that the new branch is up on my fork, I want to set the branch configuration to track it:

$ git config branch.long.remote long
$ git config branch.long.merge refs/heads/long

Setting the remote lets me simply use

$ git push

to push changes on my development branch up to my fork.

Setting the merge configuration is mainly for completeness at this point. But if you end up working on more than one machine (work/home, desktop/laptop, etc.), it’ll allow you to just use

$ git pull

to grab the changes you’ve pushed up to your fork.

Isn’t that a lot of extra work to do?

This may seem like a lot of work up front, but it’s all configuration work that you’d eventually do anyway. If you’re really that concerned about the extra typing, I’ve got a shell script for you.
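The shell script itself isn't reproduced here. As a rough stand-in, here is a hypothetical Ruby helper, our own and not the original script, that generates the ten setup commands above for any GitHub id. The project, owner, and base-branch defaults are assumptions matching this post.

```ruby
# Hypothetical stand-in for the setup script (not the original).
# Builds the ten setup commands from this post for a given GitHub id.
def fork_setup_commands(github_id, project: "insoshi", owner: "insoshi", base: "edge")
  fork_url = "git@github.com:#{github_id}/#{project}.git"
  [
    "git clone git://github.com/#{owner}/#{project}.git",          # official repo becomes origin
    "cd #{project}",
    "git branch --track #{base} origin/#{base}",                   # local tracking branch for edge
    "git branch #{github_id} #{base}",                             # personal development branch
    "git checkout #{github_id}",
    "git remote add #{github_id} #{fork_url}",                     # remote for your GitHub fork
    "git fetch #{github_id}",
    "git push #{github_id} #{github_id}:refs/heads/#{github_id}",  # create the branch on the fork
    "git config branch.#{github_id}.remote #{github_id}",          # plain `git push` targets the fork
    "git config branch.#{github_id}.merge refs/heads/#{github_id}" # plain `git pull` pulls from it
  ]
end

puts fork_setup_commands("long")
```

The helper only prints the commands. To execute them from Ruby instead, note that each system call runs in its own shell, so the cd step would have to become Dir.chdir.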

The extra work is worth the effort, because with this configuration

  • My changes will be easily identifiable in my named branch
  • I can easily get updates from the main Insoshi repository
  • Any updates I’ve pulled into master and edge are automatically pushed up to my fork on GitHub

The last one is a bonus because, with Git’s default “matching” push behavior, the simple ‘git push’ command pushes every local branch that has a branch of the same name on the remote. (Git 2.0 and later default to push.default=simple instead; run ‘git config push.default matching’ to get this behavior.) And if I make it a point to pull updates into my local master and edge but never work directly on them, my fork will match up with the official repository.

So what is the benefit of all this to open source projects like Insoshi?

  • The easier it is for the contributor to pull in updates, the more likely it will be that the pull request will be for code that merges easily with the latest releases (with few conflicts)
  • You can tell if someone is pulling updates by looking at their master and edge branches and seeing if they match up with the latest branches on the main repository
  • By getting contributors in the habit of working on branches, you’re going to get better organized code contributions

Basically, the less effort that’s required to bring in code via a pull request, the sooner it can be added to the project release. And at the end of the day, that’s really what it’s all about.

Putting (pushing and pulling) it all together

Now that we’ve covered all the details, let’s go through the full set of steps needed to make a contribution to a project like Insoshi:

  1. Fork the Insoshi repository on GitHub.
  2. Follow the Git steps above or use the shell script to set up your local repository.
  3. Check out the local branch, just to be sure:
    $ git checkout long
  4. Make some changes (remember that your development branch is based on ‘edge’), then commit them and push them to your fork (use git add first for any new files):
    [make changes in a text editor]
    $ git commit -a -m "My great contribution"
    $ git push
  5. Go to your fork and branch at GitHub (I’m at long/insoshi @ long) and click the pull request button.
  6. Tell us what you did and make sure “insoshi” is a recipient.
  7. Bask in the glory of being an open-source contributor!

2008-09-26

Using Rails to serve different content to humans and robots

Filed under: Insoshi, Ruby on Rails — mhartl @ 11:38

This post answers the question, How do you use Rails to do one thing for robots, and another thing for humans?

Why would you want to do this? In our case, the Insoshi home page forwards to a portal page that uses frames in order to have an interface that unifies the sites on the insoshi.com domain with those off-site, such as our GitHub repository and bug tracker. The frames page is horribly search-unfriendly, though, so we serve bots the actual content of http://insoshi.com/home/index (our routes map / to /home/index). (Note: The front of the portal page as seen by a human is the same as the index page served to bots; be careful about doing anything else, since search engines can penalize you if you use this technique for anything slimy.)

Our method is to use a before filter in the Home controller. Here’s the code (minus some irrelevant bits):


class HomeController < ApplicationController
  before_filter :forward_nonbots_to_portal, :only => "index"

  .
  .
  .

  private

    # Return true if the user agent is a bot.
    def robot?
      bot = /(Baidu|bot|Google|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)/i
      request.user_agent =~ bot
    end

    # Allow an explicit override of the forward_nonbots_to_portal.
    def no_redirect?
      params[:redirect] == 'false' or RAILS_ENV != 'production'
    end

    def forward_nonbots_to_portal
      redirect_to "http://portal.insoshi.com" unless robot? or no_redirect?
    end
end

The key here is the robot? method, which uses a regex listing the user agents of the most common bots:


    # Return true if the user agent is a bot.
    def robot?
      bot = /(Baidu|bot|Google|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)/i
      request.user_agent =~ bot
    end

If the user isn’t a bot, we redirect them to the portal.

N.B. The second boolean, no_redirect?, prevents the forwarding in development mode and also allows us to override the redirect by passing a redirect=false parameter. This latter condition allows us to link directly to the home page, without a redirect, by using http://insoshi.com/?redirect=false. In particular, the portal menu home link itself uses this URL, because otherwise clicking on the home link repeatedly would cause a bunch of nested portal pages to appear.
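To exercise the filter logic outside Rails, here is a framework-free Ruby sketch. The regex is copied from robot?. Passing the user agent, params, and environment name in as arguments (rather than reading request, params, and RAILS_ENV) and the forward_to_portal? name are assumptions made for illustration.

```ruby
# Framework-free version of the HomeController checks (a sketch; in the real
# controller these read request.user_agent, params, and RAILS_ENV directly).
BOT_PATTERN = /(Baidu|bot|Google|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)/i

# True if the user agent matches any bot name (case-insensitive);
# a missing user agent counts as human.
def robot?(user_agent)
  !!(user_agent.to_s =~ BOT_PATTERN)
end

# True if redirect=false was passed or we are not in production.
def no_redirect?(params, rails_env)
  params[:redirect] == "false" || rails_env != "production"
end

# Mirrors the before filter: forward unless robot? or no_redirect?.
def forward_to_portal?(user_agent, params, rails_env)
  !(robot?(user_agent) || no_redirect?(params, rails_env))
end

human = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_5; en-us) Safari/525.20"
puts forward_to_portal?(human, {}, "production")                       # prints true
puts forward_to_portal?(human, { :redirect => "false" }, "production") # prints false
```

Note that the bare bot alternative matches any user agent containing “bot” anywhere, which is what lets such a short list cover many crawlers.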

2008-09-21

Finding and fixing mass assignment problems in Rails applications

Filed under: Insoshi, Ruby on Rails, mass assignment — mhartl @ 18:14

Last week I received an email from Eric Chapweske (of Slantwise Design and the Rail Spikes blog) alerting me to mass assignment vulnerabilities in the Insoshi social network source code. (See my post on mass assignment for a quick review of the concept, and don’t miss Eric’s mass assignment article for a more thorough treatment.) I quickly set to work fixing the problems, and within a few hours of receiving the email I’d pushed out a patched version to the Insoshi GitHub repository. Since the process was so instructive, and since mass assignment vulnerabilities are so common, I thought I’d share some of the details of what it took to fix them.

Fixing the models and controllers

The first step in solving mass assignment problems is to find them, so I whipped up a little find_mass_assignment plugin to make it easier:

$ script/plugin install git://github.com/mhartl/find_mass_assignment.git

(You’ll need Git and Rails 2.1 or later for this to work.)

This defines a Rake task to find mass assignment vulnerabilities. (It works by searching through the controllers for likely mass assignment and then checking whether the corresponding models define attr_accessible.) Let’s run it on the buggy Insoshi code and see what we get:

$ rake find_mass_assignment

/path/to/app/controllers/activities_controller.rb
    46      @activity = Activity.new(params[:event])
    68        if @activity.update_attributes(params[:event])

/path/to/app/controllers/comments_controller.rb
    20      @comment = parent.comments.new(params[:comment].

/path/to/app/controllers/messages_controller.rb
    50      @message = Message.new(:parent_id    => original_message.id,
    61      @message = Message.new(params[:message].merge(:sender => current_person,

/path/to/app/controllers/photos_controller.rb
    39      @photo = Photo.new(params[:photo].merge(person_data))
    61        if @photo.update_attributes(:primary => true)

/path/to/app/controllers/posts_controller.rb
    59        if @post.update_attributes(params[:post])
    157          post = @topic.posts.new(params[:post].merge(:person => current_person))
    159          post = @blog.posts.new(params[:post])

/path/to/app/controllers/topics_controller.rb
    28      @topic = @forum.topics.new(params[:topic].merge(:person => current_person))
    44        if @topic.update_attributes(params[:topic])

Yikes! That’s a lot of problems. How do we squash all these bugs?

One of the vulnerable models is the Post model, which is the base class for the ForumPost and BlogPost models. We’ll use the ForumPost model as our example. First we disable mass assignment entirely in the Post model by calling attr_accessible with nil, since we want to force all the derived classes to declare their own accessible attributes:

app/models/post.rb


class Post < ActiveRecord::Base
  include ActivityLogger
  has_many :activities, :foreign_key => "item_id", :dependent => :destroy
  attr_accessible nil
end

Then we set attr_accessible in the ForumPost model to allow only the post body to be set by mass assignment:

app/models/forum_post.rb


class ForumPost < Post
  .
  .
  .

  attr_accessible :body

  belongs_to :topic,  :counter_cache => true
  belongs_to :person, :counter_cache => true

  validates_presence_of :body, :person
  validates_length_of :body, :maximum => 5000
  .
  .
  .
end

Then in the Posts controller we update


  post = @topic.posts.build(params[:post].merge(:person => current_person))

to set the person attribute explicitly:


  post = @topic.posts.build(params[:post])
  post.person = current_person

Bypassing attr_accessible

This fixes the controller action, but unfortunately the corresponding RSpec specs fail. Having a good test suite proved invaluable in fixing the mass assignment problems, but the tests use mass assignment themselves, and much of that code fails. For example, here is part of the Post spec:

spec/models/post_spec.rb


describe ForumPost do

  before(:each) do
    @post = topics(:one).posts.build(:body => "Hey there",
                                     :person => people(:quentin))
  end
  .
  .
  .
end

This fails because of the attempt to set the person attribute by mass assignment. We could fix this as in the controller:


describe ForumPost do

  before(:each) do
    @post = topics(:one).posts.build(:body => "Hey there")
    @post.person = people(:quentin)
  end
  .
  .
  .
end

Unfortunately, the tests are riddled with this sort of code, and it’s a nightmare to make all such changes by hand. Moreover, inside the tests we simply don’t care about mass assignment vulnerabilities, so making a bunch of cumbersome changes is particularly annoying. Luckily, there’s a nice solution; after searching for a bit, I found an inspiring Pastie, which led me to open up ActiveRecord::Base and add some unsafe methods to create Active Record objects that bypass attr_accessible:

config/initializers/unsafe_build_and_create.rb


class ActiveRecord::Base

  # Build and create records unsafely, bypassing attr_accessible.
  # These methods are especially useful in tests and in the console.

  def self.unsafe_build(attrs)
    record = new
    record.unsafe_attributes = attrs
    record
  end

  def self.unsafe_create(attrs)
    record = unsafe_build(attrs)
    record.save
    record
  end

  def self.unsafe_create!(attrs)
    record = unsafe_build(attrs)
    record.save!   # raises on failure; otherwise return the record, like create!
    record
  end

  def unsafe_attributes=(attrs)
    attrs.each do |k, v|
      send("#{k}=", v)
    end
  end
end

(By putting this file in the config/initializers/ directory, we ensure that the additions will be loaded automatically as part of the Rails environment.)
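The difference between the filtered path and the unsafe one can be seen without Rails. In this sketch the Record class and its ACCESSIBLE list are hypothetical stand-ins for a model declaring attr_accessible :body. The filtering in attributes= is a simplification, not ActiveRecord's code. unsafe_build and unsafe_attributes= have the same shape as the initializer above.

```ruby
# Framework-free illustration: the "safe" setter filters keys through a
# whitelist, while unsafe_attributes= calls every setter directly.
class Record
  ACCESSIBLE = %w[body]   # hypothetical stand-in for attr_accessible :body
  attr_accessor :body, :person

  # Same shape as ActiveRecord::Base.unsafe_build in the initializer.
  def self.unsafe_build(attrs)
    record = new
    record.unsafe_attributes = attrs
    record
  end

  # Filtered mass assignment: keys outside ACCESSIBLE are ignored.
  def attributes=(attrs)
    attrs.each { |k, v| send("#{k}=", v) if ACCESSIBLE.include?(k.to_s) }
  end

  # Unfiltered: every key is sent to its setter.
  def unsafe_attributes=(attrs)
    attrs.each { |k, v| send("#{k}=", v) }
  end
end

safe = Record.new
safe.attributes = { :body => "Hey there", :person => "quentin" }
unsafe = Record.unsafe_build(:body => "Hey there", :person => "quentin")
p safe.person    # prints nil
p unsafe.person  # prints "quentin"
```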

With these methods in hand, we still have to update the tests by hand, but the edits are much simpler (and many can be done by search-and-replace):


describe ForumPost do

  before(:each) do
    @post = topics(:one).posts.unsafe_build(:body => "Hey there",
                                            :person => people(:quentin))
  end
  .
  .
  .
end

We can use these methods in the controllers, too, of course, but if we do the word “unsafe” serves as a constant reminder that we’d better be really sure we want to bypass attr_accessible.

After making all the fixes, running our Rake task shows only one potentially vulnerable model:

$ rake find_mass_assignment
/Users/mhartl/rails/insoshi_core/app/controllers/photos_controller.rb
    40      @photo = Photo.new(params[:photo].merge(person_data))
    62        if @photo.update_attributes(:primary => true)

Checking the Photo model, we see that it defines attr_protected instead of attr_accessible (and explains why):

app/models/photo.rb


class Photo < ActiveRecord::Base
  include ActivityLogger
  UPLOAD_LIMIT = 5 # megabytes

  # attr_accessible is a nightmare with attachment_fu, so use
  # attr_protected instead.
  attr_protected :id, :person_id, :parent_id, :created_at, :updated_at
  .
  .
  .
end

With that, we’re done, and our application is secure. Huzzah!

Mass assignment in Rails applications

Filed under: Ruby on Rails, mass assignment — mhartl @ 18:13

This is a brief review of mass assignment in Rails. See the follow-up post on Finding and fixing mass assignment problems in Rails applications for some more tips on how to find and fix mass assignment problems.

We’ll begin with a simple example. Suppose an application has a User model that looks like this:


# == Schema Information
# Table name: users
#
#  id                         :integer(11)     not null, primary key
#  email                      :string(255)
#  name                       :string(255)
#  password                   :string(255)
#  admin                      :boolean(1)      not null
class User < ActiveRecord::Base
  validates_presence_of :email, :password
  validates_uniqueness_of :email
  .
  .
  .
end

Note the presence of an admin boolean to identify administrative users. With this model, the Users controller might have this standard update code:


  def update
    @user = User.find(params[:id])

    respond_to do |format|
      if @user.update_attributes(params[:user])
        flash[:notice] = 'User was successfully updated.'
        format.html { redirect_to(@user) }
      else
        format.html { render :action => "edit" }
      end
    end
  end

This works fine, but note that the line


  if @user.update_attributes(params[:user])

performs an update to the @user object through the params hash, assigning all the @user attributes at once—that is, as a mass assignment.

The problem with mass assignment is that some malicious [cr|h]acker might write a script to PUT something like name=New+Name&admin=1, thereby adding himself as an administrative user! This would be a Bad Thing™. The standard solution to this problem is to use attr_accessible in the model to declare explicitly the attributes that can be modified by mass assignment. To protect our User model, for example, we would write


class User < ActiveRecord::Base

  attr_accessible :email, :name, :password

  validates_presence_of :email, :password
  validates_uniqueness_of :email
  .
  .
  .
end

Since :admin isn’t included in the attr_accessible argument list, the User model’s admin attribute is safe from unwanted modification.
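To see why the admin attribute survives a request like name=New+Name&admin=1, here is a plain-Ruby imitation of the whitelist. It is a simplified sketch for illustration, not ActiveRecord’s implementation; the WhitelistedModel base class is hypothetical.

```ruby
# Hypothetical plain-Ruby imitation of attr_accessible (not ActiveRecord's code).
class WhitelistedModel
  # Record the whitelist on the class that declares it.
  def self.attr_accessible(*names)
    @accessible = names.map(&:to_s)
  end

  def self.accessible_attributes
    @accessible || []
  end

  # Mass assignment: assign whitelisted keys, silently skip the rest.
  def attributes=(attrs)
    attrs.each do |key, value|
      next unless self.class.accessible_attributes.include?(key.to_s)
      send("#{key}=", value)
    end
  end
end

class User < WhitelistedModel
  attr_accessor :email, :name, :password, :admin
  attr_accessible :email, :name, :password

  def initialize
    @admin = false
  end
end

user = User.new
user.attributes = { "name" => "New Name", "admin" => "1" }  # a hostile PUT
p user.name   # prints "New Name"
p user.admin  # prints false
```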

This seems simple enough, but the rub is that remembering to protect against mass assignment is difficult. Using mass assignment doesn’t affect the normal operations of the site, so it’s hard to notice the problem. Moreover, although you could shut off mass assignment globally, often there are many models that are used internally and never get modified directly by a web interface. Not being able to use mass assignment for these models is inconvenient, and manually making all attributes attr_accessible is cumbersome and error-prone. So, what’s a Rails developer to do?

Spurred by an email from Eric Chapweske of Slantwise Design, I recently audited the Insoshi social network for mass assignment vulnerabilities. Doing this manually was annoying, so in the process I developed a simple plugin to find likely vulnerabilities automatically, by searching through the controllers for likely mass assignment and then checking whether the corresponding models define attr_accessible. The result is a list of potential trouble spots.

To use the find_mass_assignment plugin, simply install it from GitHub as follows:

$ script/plugin install git://github.com/mhartl/find_mass_assignment.git

(You’ll need Git and Rails 2.1 or later for this to work.) The plugin defines a Rake task to find mass assignment vulnerabilities; running it on the example Users controller from above would yield the following:

$ rake find_mass_assignment

/path/to/app/controllers/users_controller.rb
  5  if @user.update_attributes(params[:user])

This tells us that line 5 in the Users controller has a likely mass assignment vulnerability.
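For a feel for how such a check can work, here is a rough Ruby re-creation of the idea. It is our own sketch, not the plugin’s code. It only flags calls that pass params[:key] directly, whereas the plugin’s real output (see the Insoshi run in the follow-up post) casts a wider net. Mapping a param key such as "user" to the model source by name is an assumption.

```ruby
# Hypothetical sketch of the find_mass_assignment idea (not the plugin's code).
# Flags controller lines that pass params[:key] straight to a mass-assignment
# method, unless the matching model source declares attr_accessible.
MASS_ASSIGNMENT = /\.(?:new|build|create!?|update_attributes!?)\(\s*params\[:(\w+)\]/

# Returns [[line_number, stripped_line, param_key], ...].
def suspicious_lines(controller_source)
  controller_source.each_line.with_index(1).filter_map do |line, number|
    match = MASS_ASSIGNMENT.match(line)
    [number, line.strip, match[1]] if match
  end
end

def protected_model?(model_source)
  model_source.match?(/^\s*attr_accessible\b/)
end

# models maps a param key (assumed to match the model, e.g. "user")
# to that model's source code.
def find_mass_assignment(controller_source, models)
  suspicious_lines(controller_source).reject do |_number, _line, key|
    source = models[key]
    source && protected_model?(source)
  end
end

controller = <<~RUBY
  class UsersController < ApplicationController
    def update
      @user = User.find(params[:id])
      respond_to do |format|
        if @user.update_attributes(params[:user])
          format.html { redirect_to(@user) }
        end
      end
    end
  end
RUBY

unprotected = { "user" => "class User < ActiveRecord::Base\nend\n" }
find_mass_assignment(controller, unprotected).each do |number, line, _key|
  puts "  #{number}  #{line}"
end
```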

The find_mass_assignment plugin doesn’t fix mass assignment problems automatically, but by making it more convenient to find them I hope it can significantly improve the odds that they will be caught (and fixed!) quickly.

2008-08-15

A security issue with Rails secret session keys

Filed under: Git, Insoshi, Ruby on Rails — mhartl @ 21:53

Like most projects that use Rails 2.1, the Insoshi source code ships with a “secret” string (which lives in environment.rb) needed for the new cookie-based sessions. Recently, an alert observer noted that this raises a security issue in Insoshi sessions: the secret key is currently the same for all Insoshi installations, which opens the sessions up to attack (as noted in this discussion thread). This problem is not unique to Insoshi; it affects essentially any Rails application installed from source.

Part of the reason this problem isn’t more widely known is that projects generated using the rails script automatically receive a unique secret string. The way we’ve fixed the secret string problem at Insoshi involves piggybacking on the mechanism Rails already has for generating such strings, by replacing the hard-coded string with a file read:

config/environment.rb

Before:


config.action_controller.session = {
  :session_key => '_instant_social_session',
  :secret      => '63143b62...8522327'
}

After:


.
.
.
require File.join(File.dirname(__FILE__), 'boot')
require 'rails_generator/secret_key_generator'

Rails::Initializer.run do |config|
  .
  .
  .
  # Your secret key for verifying cookie session data integrity.
  # If you change this key, all old sessions will become invalid!
  # Make sure the secret is at least 30 characters and all random,
  # no regular words or you'll be exposed to dictionary attacks.
  secret_file = File.join(RAILS_ROOT, "secret")
  if File.exist?(secret_file)
    secret = File.read(secret_file)
  else
    secret = Rails::SecretKeyGenerator.new("insoshi").generate_secret
    File.open(secret_file, 'w') { |f| f.write(secret) }
  end
  config.action_controller.session = {
    :session_key => '_instant_social_session',
    :secret      => secret
  }
  .
  .
  .

(N.B. The session key _instant_social_session is a hint about the origins of the name Insoshi.) In place of a hard-coded string, the updated code uses the contents of a secret file, if it exists; otherwise, it makes a new string using the same machinery as the rails script (included with the line require 'rails_generator/secret_key_generator') and writes it to the secret file.
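Here is a Rails-free look at the same read-or-generate pattern. It substitutes Ruby’s standard SecureRandom for Rails::SecretKeyGenerator, which is our swap and not what the code above uses. The load_or_create_secret name and the owner-only file permissions are also assumptions.

```ruby
require "securerandom"
require "tmpdir"

# Read the secret if the file exists; otherwise generate one and save it.
# SecureRandom stands in for Rails::SecretKeyGenerator in this sketch.
def load_or_create_secret(path)
  return File.read(path).strip if File.exist?(path)
  secret = SecureRandom.hex(64)                        # 128 hex characters
  File.open(path, "w", 0600) { |f| f.write(secret) }   # owner read/write only
  secret
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "secret")
  puts load_or_create_secret(path) == load_or_create_secret(path)  # prints true
end
```

The second call returns the stored value rather than a new one, which is why every server behind a load balancer needs a copy of the same file.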

It’s important at this point to prevent our source code management tool from versioning the secret file, since the whole point of this exercise is to prevent the secret key from being distributed with the source code. Using Git, this is trivial; we just add ‘secret’ to our .gitignore file. (Note: if you are running an application on multiple servers, you should copy the same secret file to each one to ensure that sessions will work behind a load balancer.) Everyone using the Insoshi source code should pull from our GitHub repository to get the update.

Handling session expiration

Unfortunately, the above steps don’t completely solve our problem. The comments in environment.rb note that “If you change this key, all old sessions will become invalid!” That’s not quite accurate; the old sessions don’t merely become invalid: they actually raise an exception, so users with active sessions will be met with your application’s error page, and a CGI::Session::CookieStore::TamperedWithCookie exception will show up in your application’s log file. (The error page goes away if the user reloads the page in their browser, but there’s no way for them to know that.) Serving up error pages to all those users isn’t very friendly behavior, and we’d like to catch the exception and show the page they’re trying to access instead.

This isn’t as simple as it seems, because the exception gets raised deep inside the Rails internals. We can figure out where by running in development mode, where the stack trace looks something like this:

CGI::Session::CookieStore::TamperedWithCookie in HomeController#index 

vendor/rails/actionpack/lib/action_controller/session/cookie_store.rb:144:in `unmarshal'
vendor/rails/actionpack/lib/action_controller/session/cookie_store.rb:101:in `restore'
/usr/local/lib/ruby/1.8/cgi/session.rb:304:in `[]'
vendor/rails/actionpack/lib/action_controller/cgi_process.rb:136:in `session'
vendor/rails/actionpack/lib/action_controller/cgi_process.rb:168:in `stale_session_check!'
vendor/rails/actionpack/lib/action_controller/cgi_process.rb:116:in `session'
.
.
.

To catch the exception, we need to override the default restore method in cookie_store.rb. To do that, we need to load our change before the application loads, and the easiest way to do this is with a plugin, which we can generate with a script:

$ script/generate plugin catch_cookie_exception

Once we edit a couple of files, the solution is complete:

vendor/plugins/catch_cookie_exception/init.rb


require 'catch_cookie_exception'

vendor/plugins/catch_cookie_exception/lib/catch_cookie_exception.rb


require 'cgi'
require 'cgi/session'
class CGI::Session::CookieStore
  # Restore session data from the cookie.
  # This method overrides the one in
  # actionpack/lib/action_controller/session/cookie_store.rb
  # in order to handle the case of a "tampered" cookie more gracefully.
  # The issue is that changing the 'secret' in config/environment.rb
  # breaks all sessions in such a way that everyone gets an error page
  # the first time they revisit the site.  Catching the exception here
  # prevents this ugly behavior.
  # This is in a plugin so that it loads after Rails but before environment.rb.
  def restore
    @original = read_cookie
    @data = unmarshal(@original) || {}
  rescue CGI::Session::CookieStore::TamperedWithCookie
    logger = Logger.new("#{RAILS_ROOT}/log/#{RAILS_ENV}.log")
    logger.warn "Caught TamperedWithCookie exception on #{Time.now}"
    @data = {}
  end
end

Note that, since the exception could be the result of someone attacking the site by tampering with their cookies, we log the exception for future reference.
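To make the failure mode concrete, here is a simplified, self-contained model of a signed session cookie and of the rescue-and-log fallback the plugin adds. It is a sketch under stated assumptions, not the Rails CookieStore code: the payload--digest layout, the use of HMAC-SHA1, and the names TamperedWithCookie (a stand-in class), sign_session, verify_session, and restore_session are all illustrative.

```ruby
require 'openssl'

# Stand-in for CGI::Session::CookieStore::TamperedWithCookie (illustrative only).
class TamperedWithCookie < StandardError; end

# Serialize +data+, Base64-encode it, and append an HMAC-SHA1 digest keyed by +secret+.
def sign_session(data, secret)
  payload = [Marshal.dump(data)].pack('m0')   # strict Base64, no newlines or dashes
  digest  = OpenSSL::HMAC.hexdigest('SHA1', secret, payload)
  "#{payload}--#{digest}"
end

# Recompute the digest with +secret+; raise if it doesn't match, else return the data.
# (Plain == for brevity; real code should use a constant-time comparison.)
def verify_session(cookie, secret)
  payload, digest = cookie.to_s.split('--', 2)
  expected = OpenSSL::HMAC.hexdigest('SHA1', secret, payload.to_s)
  raise TamperedWithCookie, 'digest mismatch' unless digest == expected
  Marshal.load(payload.unpack('m0').first)
end

# Mirror the plugin's behavior: log the failure and fall back to an empty session.
def restore_session(cookie, secret, log = $stderr)
  verify_session(cookie, secret)
rescue TamperedWithCookie
  log.puts "Caught TamperedWithCookie exception on #{Time.now}"
  {}
end
```

Restoring a cookie with the secret it was signed with returns the original hash; restoring it after the secret changes logs a warning and returns an empty session. That is the same trade-off the plugin makes: the user loses the old session data but sees the page they asked for instead of an error.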

UPDATE: The catch_cookie_exception plugin is now available at GitHub.

Acknowledgments

Thanks again to Trevor Turk for alerting us to this issue.

2008-07-28

Running Rails tests with autotest (ZenTest) and RSpec

Filed under: RSpec, Ruby on Rails, autotest — mhartl @ 13:12

I recently ran into a problem with autotest (ZenTest) after upgrading to Rails 2.1 and RSpec 1.1.4. Solving it was annoying, so I hope I can save others some trouble. Here’s the problem:

With RSpec, autotest hangs

Before the upgrade, I could run my specs just fine using the plain autotest command, but after the upgrade autotest just hangs:

$ autotest
loading autotest/rails

This is on a system running Mac OS X Tiger (10.4), Rails 2.1.0, RSpec 1.1.4, and ZenTest 3.10.0. Strangely, my friend Long could run autotest fine on a virtually identical system (so you may not run into this problem), but for me this only increased the frustration. After much hand-wringing (and a lot of Google searching), I finally found a Rails Forum post with a solution:

$ RSPEC=true autotest

Then autotest runs normally.

Restoring the old RSpec/autotest behavior

To get the old behavior, you can include the RSPEC variable in your environment rather than putting it explicitly on the command line. For example, on a system running bash, export the RSPEC variable as follows:

file: ~/.bashrc

export RSPEC=true

Then source it:

$ . ~/.bashrc

Now autotest should run as before:

$ autotest

Voilà (I hope)!

UPDATE: Since making this post, I’ve learned that RSpec now ships with a program called autospec that solves the same problem; just run

$ autospec

and the specs should run as expected.

2008-07-17

Searching a Ruby on Rails application with Sphinx and Ultrasphinx

Filed under: Ferret, Insoshi, Ruby on Rails, Sphinx, Ultrasphinx — mhartl @ 16:46

We recently switched the Insoshi social networking platform from a Ferret search engine to Sphinx (and Ultrasphinx), due to the well-known problems encountered with Ferret and due to our own experience of its instability on the Insoshi developer site. (Sphinx is currently running on our demo site, and anyone who wants the Sphinx-enabled source can grab edge Insoshi as described in the Rails 2.1 upgrade post. We’ll merge it into the master branch within a couple of weeks.)

The switch did not always go smoothly, and there are several gotchas that I thought might be helpful to discuss in case other people run into them. I’ve also included some material on using Ultrasphinx, since its documentation is a bit sparse. For pedagogical purposes, I’ve simplified the Insoshi source slightly for this discussion; you don’t have to be familiar with the Insoshi codebase to follow this post. (N.B. The actual production code contains a trick for dealing with more advanced filtering requirements, which will probably be the subject of a future post.)

Installing Sphinx

The first step, naturally enough, is to install Sphinx. You can get the latest and greatest version at the Sphinx download page. (This blog post uses version 0.9.8, which was released just a couple of days before this post was written.) Download the source, and then install it as follows:

$ tar zxf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ ./configure --with-pgsql
$ make
$ sudo make install

The --with-pgsql option ensures that Sphinx gets compiled with PostgreSQL support (MySQL support comes for free). We’ve had trouble getting all the Postgres stuff to work properly, but it doesn’t hurt to have it. If you’d rather omit the Postgres support, just use ./configure by itself.

Installing Ultrasphinx

The second step is to install the Ultrasphinx plugin, which has one gem dependency:

$ sudo gem install chronic

The installation itself is trickier than it sounds; although there are plenty of tutorials that tell you how to do it, as far as I can tell they don’t work. I tried a couple of different tacks, both of which failed. First, I tried

$ svn export svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk vendor/plugins/ultrasphinx
Export complete.

The only problem is, this didn’t do anything; there was literally no change to my working copy. I then tried a plugin install:

$ script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
Export complete.

Still nothing. After some time flailing about, I finally found a James on Software Sphinx/Ultrasphinx post, which suggested cloning his GitHub fork of Ultrasphinx. That worked at first, but later on I encountered a clash with the latest version of will_paginate:

WillPaginate: You are using a paginated collection of class
Ultrasphinx::Search which conforms to the old API of WillPaginate::Collection
by using `page_count`, while the current method name is `total_pages`. Please
upgrade yours or 3rd-party code that provides the paginated collection.

Luckily, with some judicious Googling I was able to find a second repository at GitHub, whose most recent commit as of this writing updates the code to work with the latest will_paginate, which certainly looked promising. And, indeed, it worked beautifully, so I’m happy to recommend it:

$ git clone git://github.com/DrMark/ultrasphinx.git vendor/plugins/ultrasphinx
$ rm -rf vendor/plugins/ultrasphinx/.git

(This is one of the many reasons GitHub rocks; if the “official” version of a plugin is unavailable or out of date, you still might be able to find an updated fork on GitHub.)

Configuring Ultrasphinx

To configure Ultrasphinx, I followed the config instructions at the main Ultrasphinx site:

Next, copy the examples/default.base file to RAILS_ROOT/config/ultrasphinx/default.base.
This file sets up the Sphinx daemon options such as port, host, and index location.

Since many of the Insoshi fields allow HTML, the search results are better if we strip HTML tags first:

config/ultrasphinx/default.base

index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

N.B. This is a replacement for the older strip_html syntax, used inside the source section:

config/ultrasphinx/default.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 5000
  sql_query_post =
  strip_html = 1
}

If you get a warning like

WARNING: key 'strip_html' is deprecated in config/ultrasphinx/development.conf line 24;
use 'html_strip (per-index)' instead.

just remove the strip_html line and put an html_strip line in its place (taking care to put it in the index section of the configuration file).
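If you keep several .base files (default.base, test.base, and so on), a small scan makes it easy to find every leftover strip_html key before Sphinx complains. This is a hypothetical helper, not part of Sphinx or Ultrasphinx; it only reports line numbers and leaves the actual move into the index section to you.

```ruby
# Return the 1-based line numbers of uncommented strip_html settings in a
# Sphinx/Ultrasphinx .base configuration string (hypothetical helper).
def deprecated_strip_html_lines(config_text)
  config_text.each_line
             .with_index(1)
             .select { |line, _n| line =~ /^\s*strip_html\s*=/ }
             .map { |_line, n| n }
end
```

For example, deprecated_strip_html_lines(File.read('config/ultrasphinx/default.base')) returns the lines to replace with html_strip in the index section.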

Bootstrapping Ultrasphinx

Now we’re ready to fire up Ultrasphinx, which uses Sphinx to build up a search index of our database:

$ rake ultrasphinx:bootstrap

There’s just one hitch: many people (including me) get an error at this stage:

dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /usr/local/bin/indexer
  Reason: image not found

I found a solution using the canonical “Google the error message” method. There’s something screwy with the location of the MySQL libraries, but it’s nothing a little symlink couldn’t fix:

$ sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql

Testing Sphinx and Ultrasphinx

In principle, things are working now under the hood; we just need to add in some code to our models and controllers to execute the searches. I prefer test-driven development, though, so the next priority is to get Sphinx and Ultrasphinx working in a test environment.

It’s important to stop the Ultrasphinx daemon, which might be running in development mode if you used rake ultrasphinx:bootstrap above:

$ rake ultrasphinx:daemon:stop

Then make a test-specific configuration file:

config/ultrasphinx/test.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 999999999
  sql_query_post =
}
.
.
.
index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

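The line sql_range_step = 999999999 here is key. The sql_range_step variable sets the size of the id range that the Sphinx indexer fetches with each query as it walks from the smallest id to the largest; in the default.base configuration above, it’s 5000. Insoshi uses foxy fixtures, which often create objects with huge ids, so with the default step the indexer issues an enormous number of mostly empty range queries, and indexing can take a long time (several minutes), even for a tiny test database. Setting sql_range_step to a much larger value lets the indexer cover the whole id range in just a few queries, which solves the problem.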
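A little arithmetic shows why the step size matters so much. The sketch below counts the range queries needed to cover an id range, assuming the indexer issues one query per step from the smallest id to the largest; the id range is hypothetical, picked only to resemble the large hash-derived ids that foxy fixtures can produce.

```ruby
# Number of ranged queries needed to cover ids min_id..max_id in chunks of +step+,
# assuming one query per chunk (integer ceiling division).
def range_queries(min_id, max_id, step)
  span = max_id - min_id + 1
  (span + step - 1) / step
end

# Hypothetical id range for a tiny fixture set with hash-like ids.
min_id = 1
max_id = 1_000_000_000

puts range_queries(min_id, max_id, 5000)         # prints 200000 (step from default.base)
puts range_queries(min_id, max_id, 999_999_999)  # prints 2 (step from test.base)
```

Two hundred thousand mostly empty queries versus two is consistent with the multi-minute indexing times described above.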

With that done, we’re ready to fire things up:

$ rake ultrasphinx:bootstrap RAILS_ENV=test

One problem we run into is that the Sphinx test daemon might not always be running, so it would be nice to skip the search tests (or specs) in that case. For example, suppose that we have a Searches controller (whose index action will handle searches). Here is a skeleton for the Searches controller specs that run only when the Sphinx daemon (searchd) is running:

spec/controllers/searches_controller_spec.rb


# Return a list of system processes.
def processes
  process_cmd = case RUBY_PLATFORM
                when /djgpp|(cyg|ms|bcc)win|mingw/ then 'tasklist /v'
                when /solaris/                     then 'ps -ef'
                else
                  'ps aux'
                end
  `#{process_cmd}`
end

# Return true if the search daemon is running.
def testing_search?
  processes.include?('searchd')
end

describe SearchesController do
  .
  .
  .
end if testing_search?

(A blog post on testing with Ultrasphinx proved useful in this context.)
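One limitation of testing_search? is that it returns true whenever any searchd process is running, including a leftover development daemon, which is a common cause of spurious failures. If the test and development daemons are configured on different ports, an alternative is to check whether anything is accepting connections on the test port. This is a hypothetical helper, not part of Ultrasphinx; the host and port are assumptions and should match config/ultrasphinx/test.base.

```ruby
require 'socket'

# Return true if some server is accepting TCP connections on host:port.
# Any connection error (refused, unreachable, unknown host) counts as "not running".
def search_daemon_listening?(host, port)
  TCPSocket.new(host, port).close
  true
rescue SystemCallError, SocketError
  false
end
```

The spec file would then end with end if search_daemon_listening?('127.0.0.1', port) instead of end if testing_search?, where port is whatever the test configuration uses. Note that this checks only that something is listening on that port, not that it is searchd.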

Writing the first tests

OK, now we’re ready to write some concrete tests. Some basic tests (using RSpec) might look like these:

spec/controllers/searches_controller_spec.rb


describe SearchesController do

  describe "Person searches" do

    it "should search by name" do
      get :index, :q => "quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end

    it "should search by description" do
      get :index, :q => "I'm Quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end
  end
end if testing_search?

Here we’ve passed a model parameter in anticipation of using a single action to search multiple models.

The specs fail, of course:

$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 2 failures

Apart from the if testing_search? clause, there’s nothing here beyond vanilla RSpec, so in what follows I won’t bother showing any more specs.

Person: Basic indexing

Now we’re ready for some basic searching. Suppose we have a Person model with name and description fields, which we want to enable for searching. We need the is_indexed method from Ultrasphinx:

app/models/person.rb


class Person < ActiveRecord::Base
  is_indexed :fields => [ 'name', 'description' ]
  .
  .
  .
end

Then a sample Searches controller index might look like this:

app/controllers/searches_controller.rb


def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Note the use of a :page option; Ultrasphinx works with the will_paginate plugin out of the box.

A sample search box partial might look like this:

app/views/searches/_box.html.erb


<% form_tag searches_path, :method => :get do %>
  <fieldset>
    <%= text_field_tag :q, h(params[:q]), :maxlength => 50 %>
    <%= submit_tag "Search" %>
    <%= hidden_field_tag "model", search_model %>
  </fieldset>
<% end %>

where search_model is just a helper that inspects params and returns the name of the model being searched. (For example:

app/helpers/searches_helper.rb


module SearchesHelper

  # Return the model to be searched based on params.
  def search_model
    return "Person"    if params[:controller] =~ /home/
    return "ForumPost" if params[:controller] =~ /forums/
    params[:model] || params[:controller].classify
  end
end

where params[:controller].classify automagically returns the string "Person" inside the People controller and "Message" inside the Messages controller.)

As long as the test database contains the appropriate user (in our case, Quentin from restful_authentication), the specs should pass once we reindex:

$ rake ultrasphinx:bootstrap RAILS_ENV=test
$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 0 failures

If they fail, chances are that either (1) there’s some rogue development daemon running or (2) we forgot to reindex the test database after changing a model. If this happens, you can be extra paranoid by recycling everything:

$ rake ultrasphinx:daemon:stop
$ rake ultrasphinx:bootstrap RAILS_ENV=test

Message: Ultrasphinx with conditions and filtering

One common task is to put a condition on a search result. For example, suppose we have a Message model with a subject and content we want to index, but with “trashed” messages we want to exclude. Suppose further that recipients trash messages by setting a recipient_deleted_at attribute in the Message model. Untrashed messages would then have a NULL value for recipient_deleted_at:

app/models/message.rb


class Message < ActiveRecord::Base
  is_indexed :fields => [ 'subject', 'content', 'recipient_id' ],
             :conditions => "recipient_deleted_at IS NULL"
  .
  .
  .
end

Of course, when searching through messages for a particular person, we should only return messages actually sent to that person. This is why we added the recipient_id to the index fields above; this way, we can use an Ultrasphinx filter to restrict the results appropriately in the Searches controller:

app/controllers/searches_controller.rb


def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  if model == "Message"
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person.id
  end
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

This requires an appropriately defined current_person object (used above to set filters['recipient_id']), which we assume is taken care of by the application’s authentication scheme.
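The parameter handling in the index action can be pulled out into a plain Ruby method, which makes it easy to check without Sphinx running. This is a hypothetical refactoring (the method name search_options is my own), and it also guards against a missing q parameter, on which params[:q].strip would raise NoMethodError. Whether Ultrasphinx accepts an empty query is not something this sketch verifies; you may prefer to redirect when q is blank.

```ruby
# Build the options hash for Ultrasphinx::Search.new from request params.
# Hypothetical helper: +params+ is Hash-like with symbol keys, and
# +current_person+ responds to #id (needed only for Message searches).
def search_options(params, current_person = nil)
  model   = params[:model]
  filters = {}
  if model == 'Message'
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person.id
  end
  { :query       => params[:q].to_s.strip,
    :page        => params[:page] || 1,
    :class_names => model,
    :filters     => filters }
end
```

In the controller, the body of index would then reduce to something like @search = Ultrasphinx::Search.new(search_options(params, current_person)), followed by @search.run and @results = @search.results as before.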

ForumPost: Ultrasphinx with Single Table Inheritance (STI) and associations

Our final example combines conditions with an include. Insoshi has a ForumPost model that inherits from a Post base class (which is also used for blog posts) using Single Table Inheritance (STI). We want to restrict forum searches to the body of forum posts, excluding blog posts. We also want to include the topic name in searches, so that a post “Lorem ipsum” under topic “Foobar” will show up for both the queries “Lorem” and “Foobar”. We can achieve this by using a conditions clause on the STI type, while using an include for the topic association:


class ForumPost < Post
  is_indexed :fields => [ 'body' ],
             :conditions => "type = 'ForumPost'",
             :include => [{:association_name => 'topic', :field => 'name'}]
  belongs_to :topic
  .
  .
  .
end

(If we leave out the type condition, Ultrasphinx happily indexes all the blog posts as well. Rails then complains when trying to make a new ForumPost using a BlogPost id.)

With that, we’ve covered all our basic search needs. As noted above, there’s one more advanced technique being used at Insoshi (handling searches on boolean attributes such as deactivated), which I’ll probably cover in a later post. It’s also worth noting that, unlike Ferret, Sphinx doesn’t update the search index with every Active Record update; you need to update the index periodically with a cron job. Take a look at the Ultrasphinx deployment notes for more details.

TextMate Footnotes and Ultrasphinx

Finally, there’s a minor incompatibility between Ultrasphinx and the latest (Rails 2.1-compatible) TextMate Footnotes, which gives the following error (at least when using vendored Rails):

activesupport/lib/active_support/dependencies.rb:275:in `load_missing_constant':
uninitialized constant Footnotes::Filter (NameError)

This is because Ultrasphinx is looking for the Rails file initializer.rb, but instead it finds initializer.rb as defined by Footnotes. The fix is to change “initializer” to something else (say, “loader”) everywhere; see my fork of Footnotes at GitHub for an example.
