blog.mhartl | Michael Hartl's tech blog

2010-07-28

Deploying to Heroku with Rails 3.0.0.rc

Filed under: Ruby on Rails — mhartl @ 16:34

UPDATE: Heroku now works with the latest Bundler, so this post is (or should be) obsolete.

The Ruby on Rails Tutorial book uses the latest version of Rails, which is currently the release candidate, 3.0.0.rc. Unfortunately, at the time of this writing, you can’t deploy applications to Heroku using the Rails release candidate because of a conflict with the latest version of Bundler. Instead of a successful deploy, you get an error like this:

$ git push heroku
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 1018 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)

-----> Heroku receiving push
-----> Rails app detected
-----> Detected Rails is not set to serve static_assets
       Installing rails3_serve_static_assets... done
-----> Gemfile detected, running Bundler
       Unresolved dependencies detected; Installing...
       Your Gemfile.lock was generated by Bundler 0.10.
       You must delete it if you wish to use Bundler 0.9.
       FAILED: Have you updated to use a 0.9 Gemfile?
       http://docs.heroku.com/gems#gem-bundler

error: hooks/pre-receive exited with error code 1

Eventually, Heroku support for the Rails release candidate (or perhaps for the final release) will no doubt be ready, but for now you can work around this problem as follows:

$ [sudo] gem uninstall bundler
$ [sudo] gem install bundler -v 0.9.26
$ rm -f Gemfile.lock

Then use this as your Gemfile, which reverts to Rails 3.0.0.beta4:

source 'http://rubygems.org'

gem 'rails', '3.0.0.beta4'
gem 'sqlite3-ruby', '1.2.5', :require => 'sqlite3'

Then install the gems:

$ bundle install

At this point, the push to Heroku should work. (Of course, that doesn’t mean it will. :-)

$ git add .
$ git commit -m "Ready to deploy"
$ git push heroku master

2009-05-15

Running rcov with RSpec

Filed under: RSpec, Ruby, Ruby on Rails — mhartl @ 16:01

I recently wanted to run rcov, the Ruby code coverage tool, on a project tested with RSpec. I think I’d done it once before, but I’d forgotten how. After searching to no avail (both the rcov home page and the RSpec page on rcov proved unhelpful), I applied the tried-and-true Wild-Assed Guess™ method and typed

$ rake spec:rcov

That worked.

The reports themselves are in the coverage/ directory:

$ open coverage/index.html

(or navigate your browser to file:///path/to/project/coverage/index.html). If you’re using Git, add coverage/* to your .gitignore file.

N.B. The inverse Rake task also exists:

$ rake -T rcov
rake spec:clobber_rcov  # Remove rcov products for rcov
rake spec:rcov          # Run all specs in spec directory with RCov (excluding plugin specs)

2008-09-26

Using Rails to serve different content to humans and robots

Filed under: Insoshi, Ruby on Rails — mhartl @ 11:38

This post answers the question, How do you use Rails to do one thing for robots, and another thing for humans?

Why would you want to do this? In our case, the Insoshi home page forwards to a portal page that uses frames in order to have an interface that unifies the sites on the insoshi.com domain with those off-site, such as our GitHub repository and bug tracker. The frames page is horribly search-unfriendly, though, so we serve bots the actual content of http://insoshi.com/home/index (our routes map / to /home/index). (Note: The front of the portal page as seen by a human is the same as the index page served to bots; be careful about doing anything else, since bots can punish you if you use this technique for anything slimy.)

Our method is to use a before filter in the Home controller. Here’s the code (minus some irrelevant bits):

class HomeController < ApplicationController
  before_filter :forward_nonbots_to_portal, :only => "index"
  
  .
  .
  .

  private
  
    # Return true if the user agent is a bot.
    def robot?
      bot = /(Baidu|bot|Google|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)/i
      request.user_agent =~ bot
    end
    
    # Allow an explicit override of the forward_nonbots_to_portal.
    def no_redirect?
      params[:redirect] == 'false' or RAILS_ENV != 'production'
    end
    
    def forward_nonbots_to_portal
      redirect_to "http://portal.insoshi.com" unless robot? or no_redirect?
    end
end

The key here is the robot? method, which has a regex listing the most common bot user agents:

    # Return true if the user agent is a bot.
    def robot?
      bot = /(Baidu|bot|Google|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)/i
      request.user_agent =~ bot
    end

If the user isn’t a bot, we redirect them to the portal.

N.B. The second boolean, no_redirect?, prevents the forwarding in development mode and also allows us to override the redirect by passing a redirect=false parameter. This latter condition allows us to link directly to the home page, without a redirect, by using http://insoshi.com/?redirect=false. In particular, the portal menu home link itself uses this URL, because otherwise clicking on the home link repeatedly would cause a bunch of nested portal pages to appear.
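The decision logic can be tried outside Rails. What follows is a minimal sketch rather than the Insoshi code: the helper names robot_agent? and forward_to_portal? and the sample user agent strings are made up for illustration, and the environment is passed in as a plain string instead of being read from Rails.

```ruby
# Same pattern as in the robot? method above.
BOT_PATTERN = /(Baidu|bot|Google|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)/i

# Hypothetical helper: true if the user agent string looks like a bot.
def robot_agent?(user_agent)
  BOT_PATTERN.match?(user_agent.to_s)
end

# Hypothetical helper: true if the request should be forwarded to the portal.
def forward_to_portal?(user_agent, params, env)
  no_redirect = params[:redirect] == 'false' || env != 'production'
  !(robot_agent?(user_agent) || no_redirect)
end

# Sample user agents (illustrative only).
googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
firefox   = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1'

puts forward_to_portal?(googlebot, {}, 'production')                      # false
puts forward_to_portal?(firefox, {}, 'production')                        # true
puts forward_to_portal?(firefox, { :redirect => 'false' }, 'production')  # false
puts forward_to_portal?(firefox, {}, 'development')                       # false
```

Note that the bare "bot" alternative is deliberately broad: any user agent containing those three letters counts as a robot, which is the price of catching crawlers that aren't listed by name.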

2008-09-21

Finding and fixing mass assignment problems in Rails applications

Filed under: Insoshi, mass assignment, Ruby on Rails — mhartl @ 18:14

Last week I received an email from Eric Chapweske (of Slantwise Design and the Rail Spikes blog) alerting me to mass assignment vulnerabilities in the Insoshi social network source code. (See my post on mass assignment for a quick review of the concept, and don’t miss Eric’s mass assignment article for a more thorough treatment.) I quickly set to work fixing the problems, and within a few hours of receiving the email I’d pushed out a patched version to the Insoshi GitHub repository. Since the process was so instructive, and since mass assignment vulnerabilities are so common, I thought I’d share some of the details of what it took to fix them.

Fixing the models and controllers

The first step in solving mass assignment problems is to find them, so I whipped up a little find_mass_assignment plugin to make it easier:

$ script/plugin install git://github.com/mhartl/find_mass_assignment.git

(You’ll need Git and Rails 2.1 or later for this to work.)

This defines a Rake task to find mass assignment vulnerabilities. (It works by searching through the controllers for likely mass assignment and then looking in the models to see if they don’t define attr_accessible.) Let’s run it on the buggy Insoshi code and see what we get:

$ rake find_mass_assignment

/path/to/app/controllers/activities_controller.rb
    46      @activity = Activity.new(params[:event])
    68        if @activity.update_attributes(params[:event])

/path/to/app/controllers/comments_controller.rb
    20      @comment = parent.comments.new(params[:comment].

/path/to/app/controllers/messages_controller.rb
    50      @message = Message.new(:parent_id    => original_message.id,
    61      @message = Message.new(params[:message].merge(:sender => current_person,

/path/to/app/controllers/photos_controller.rb
    39      @photo = Photo.new(params[:photo].merge(person_data))
    61        if @photo.update_attributes(:primary => true)

/path/to/app/controllers/posts_controller.rb
    59        if @post.update_attributes(params[:post])
    157          post = @topic.posts.new(params[:post].merge(:person => current_person))
    159          post = @blog.posts.new(params[:post])

/path/to/app/controllers/topics_controller.rb
    28      @topic = @forum.topics.new(params[:topic].merge(:person => current_person))
    44        if @topic.update_attributes(params[:topic])

Yikes! That’s a lot of problems. How do we squash all these bugs?

One of the vulnerable models is the Post model, which is the base class for the ForumPost and BlogPost models. We’ll use the ForumPost model as our example. First we use attr_accessible nil in the Post model to make no attributes accessible by default, since we want to force all the derived classes to redefine it:

app/models/post.rb

class Post < ActiveRecord::Base
  include ActivityLogger
  has_many :activities, :foreign_key => "item_id", :dependent => :destroy
  attr_accessible nil
end

Then we set attr_accessible in the ForumPost model to allow only the post body to be set by mass assignment:

app/models/forum_post.rb

class ForumPost < Post
  .
  .
  .

  attr_accessible :body
  
  belongs_to :topic,  :counter_cache => true
  belongs_to :person, :counter_cache => true
  
  validates_presence_of :body, :person
  validates_length_of :body, :maximum => 5000
  .
  .
  .
end

Then in the Posts controller we update

  post = @topic.posts.build(params[:post].merge(:person => current_person))

to set the person attribute explicitly:

  post = @topic.posts.build(params[:post])
  post.person = current_person

Bypassing attr_accessible

This fixes the controller action, but unfortunately the corresponding RSpec specs fail. Having a good test suite proved invaluable in fixing the mass assignment problems, but the tests use mass assignment themselves, and much of that code fails. For example, here is part of the Post spec:

spec/models/post_spec.rb

describe ForumPost do
  
  before(:each) do
    @post = topics(:one).posts.build(:body => "Hey there",
                                     :person => people(:quentin))
  end
  .
  .
  .
end

This fails because of the attempt to set the person attribute by mass assignment. We could fix this as in the controller:

describe ForumPost do
  
  before(:each) do
    @post = topics(:one).posts.build(:body => "Hey there")
    @post.person = people(:quentin)
  end
  .
  .
  .
end

Unfortunately, the tests are riddled with this sort of code, and it’s a nightmare to make all such changes by hand. Moreover, inside the tests we simply don’t care about mass assignment vulnerabilities, so making a bunch of cumbersome changes is particularly annoying. Luckily, there’s a nice solution; after searching for a bit, I found an inspiring Pastie, which led me to open up ActiveRecord::Base and add some unsafe methods to create Active Record objects that bypass attr_accessible:

config/initializers/unsafe_build_and_create.rb

class ActiveRecord::Base

  # Build and create records unsafely, bypassing attr_accessible.
  # These methods are especially useful in tests and in the console.
  
  def self.unsafe_build(attrs)
    record = new
    record.unsafe_attributes = attrs
    record
  end
  
  def self.unsafe_create(attrs)
    record = unsafe_build(attrs)
    record.save
    record
  end
  
  def self.unsafe_create!(attrs)
    record = unsafe_build(attrs)
    record.save!
    record  # return the record, as create! does, rather than true
  end

  def unsafe_attributes=(attrs)
    attrs.each do |k, v|
      send("#{k}=", v)
    end
  end
end

(By putting this file in the config/initializers/ directory, we ensure that the additions will be loaded automatically as part of the Rails environment.)
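To see why calling the setters one at a time sidesteps the whitelist, here is a self-contained sketch using a toy class. ToyPost and its ACCESSIBLE constant are hypothetical stand-ins, and this is not how Active Record implements attr_accessible; it only mimics the two code paths.

```ruby
# Toy stand-in for a model; NOT Active Record's implementation.
class ToyPost
  ACCESSIBLE = [:body]            # plays the role of attr_accessible :body
  attr_accessor :body, :person

  # Mass assignment: silently drops anything not on the whitelist.
  def attributes=(attrs)
    attrs.each do |k, v|
      send("#{k}=", v) if ACCESSIBLE.include?(k.to_sym)
    end
  end

  # Same idea as unsafe_attributes= above: call each setter directly.
  def unsafe_attributes=(attrs)
    attrs.each { |k, v| send("#{k}=", v) }
  end
end

safe = ToyPost.new
safe.attributes = { :body => "Hey there", :person => "quentin" }

unsafe = ToyPost.new
unsafe.unsafe_attributes = { :body => "Hey there", :person => "quentin" }

p safe.person     # nil
p unsafe.person   # "quentin"
```

The whitelist only guards the bulk path; an individual setter call never consults it, which is exactly what the unsafe methods rely on.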

With these methods in hand, we still have to update the tests by hand, but the edits are much simpler (and many can be done by search-and-replace):

describe ForumPost do
  
  before(:each) do
    @post = topics(:one).posts.unsafe_build(:body => "Hey there",
                                            :person => people(:quentin))
  end
  .
  .
  .
end

We can use these methods in the controllers, too, of course, but if we do, the word “unsafe” serves as a constant reminder that we’d better be really sure we want to bypass attr_accessible.

After making all the fixes, running our Rake task shows only one potentially vulnerable model:

$ rake find_mass_assignment
/Users/mhartl/rails/insoshi_core/app/controllers/photos_controller.rb
    40      @photo = Photo.new(params[:photo].merge(person_data))
    62        if @photo.update_attributes(:primary => true)

Checking the Photo model, we see that it defines attr_protected instead of attr_accessible (and explains why):

app/models/photo.rb

class Photo < ActiveRecord::Base
  include ActivityLogger
  UPLOAD_LIMIT = 5 # megabytes
  
  # attr_accessible is a nightmare with attachment_fu, so use
  # attr_protected instead.
  attr_protected :id, :person_id, :parent_id, :created_at, :updated_at
  .
  .
  .
end

With that, we’re done, and our application is secure. Huzzah!

Mass assignment in Rails applications

Filed under: mass assignment, Ruby on Rails — mhartl @ 18:13

This is a brief review of mass assignment in Rails. See the follow-up post on Finding and fixing mass assignment problems in Rails applications for some more tips on how to find and fix mass assignment problems.

We’ll begin with a simple example. Suppose an application has a User model that looks like this:

# == Schema Information
# Table name: users
#
#  id                         :integer(11)     not null, primary key
#  email                      :string(255)     
#  name                       :string(255)     
#  password                   :string(255)
#  admin                      :boolean(1)      not null
class User < ActiveRecord::Base
  validates_presence_of :email, :password
  validates_uniqueness_of :email
  .
  .
  .
end

Note the presence of an admin boolean to identify administrative users. With this model, the Users controller might have this standard update code:

  def update
    @user = User.find(params[:id])

    respond_to do |format|
      if @user.update_attributes(params[:user])
        flash[:notice] = 'User was successfully updated.'
        format.html { redirect_to(@user) }
      else
        format.html { render :action => "edit" }
      end
    end
  end

This works fine, but note that the line

  if @user.update_attributes(params[:user])

performs an update to the @user object through the params hash, assigning all the @user attributes at once—that is, as a mass assignment.

The problem with mass assignment is that some malicious [cr|h]acker might write a script to PUT something like name=New+Name&admin=1, thereby adding himself as an administrative user! This would be a Bad Thing™. The standard solution to this problem is to use attr_accessible in the model to declare explicitly the attributes that can be modified by mass assignment. To protect our User model, for example, we would write

class User < ActiveRecord::Base

  attr_accessible :email, :name, :password

  validates_presence_of :email, :password
  validates_uniqueness_of :email
  .
  .
  .
end

Since :admin isn’t included in the attr_accessible argument list, the User model’s admin attribute is safe from unwanted modification.
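As a rough illustration of the effect (not of how Rails implements it), the whitelist behaves like a filter applied to the incoming hash. The constant and method names below are hypothetical.

```ruby
# Hypothetical whitelist mirroring attr_accessible :email, :name, :password.
ACCESSIBLE_ATTRIBUTES = %w[email name password]

# Keep only whitelisted keys; everything else is dropped.
def filter_mass_assignment(params)
  params.select { |key, _| ACCESSIBLE_ATTRIBUTES.include?(key.to_s) }
end

# Roughly what a malicious PUT of name=New+Name&admin=1 would deliver.
malicious = { 'name' => 'New Name', 'admin' => '1' }

# Only the name survives; the admin key is gone.
p filter_mass_assignment(malicious)
```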

This seems simple enough, but the rub is that remembering to protect against mass assignment is difficult. Using mass assignment doesn’t affect the normal operations of the site, so it’s hard to notice the problem. Moreover, although you could shut off mass assignment globally, often there are many models that are used internally and never get modified directly by a web interface. Not being able to use mass assignment for these models is inconvenient, and manually making all attributes attr_accessible is cumbersome and error-prone. So, what’s a Rails developer to do?

Spurred by an email from Eric Chapweske of Slantwise Design, I recently audited the Insoshi social network for mass assignment vulnerabilities. Doing this manually was annoying, so in the process I developed a simple plugin to find likely vulnerabilities automatically, by searching through the controllers for likely mass assignment and then looking in the models to see if they didn’t define attr_accessible. The result is a list of potential trouble spots.

To use the find_mass_assignment plugin, simply install it from GitHub as follows:

$ script/plugin install git://github.com/mhartl/find_mass_assignment.git

(You’ll need Git and Rails 2.1 or later for this to work.) The plugin defines a Rake task to find mass assignment vulnerabilities; running it on the example Users controller from above would yield the following:

$ rake find_mass_assignment

/path/to/app/controllers/users_controller.rb
  5  if @user.update_attributes(params[:user])

This tells us that line 5 in the Users controller has a likely mass assignment vulnerability.
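The general approach can be sketched in a few lines of plain Ruby. This is not the plugin's source: the regular expression is an assumed and incomplete list of method names, narrower than what the plugin actually flags, and the helper names are invented for the example.

```ruby
# Calls that commonly take a params hash (an assumed, incomplete list).
MASS_ASSIGNMENT = /\.(new|build|create!?|update_attributes!?)\s*\(?\s*params\[/

# Return [line_number, line] pairs that look like mass assignment.
def likely_mass_assignments(source)
  hits = []
  source.each_line.with_index(1) do |line, number|
    hits << [number, line.strip] if line =~ MASS_ASSIGNMENT
  end
  hits
end

# Return true if the model source declares attr_accessible.
def defines_attr_accessible?(model_source)
  !!(model_source =~ /^\s*attr_accessible\b/)
end

controller = <<~RUBY
  def update
    @user = User.find(params[:id])

    respond_to do |format|
      if @user.update_attributes(params[:user])
        flash[:notice] = 'User was successfully updated.'
      end
    end
  end
RUBY

model = "class User < ActiveRecord::Base\n  validates_presence_of :email\nend\n"

# Report hits only for models that lack a whitelist.
unless defines_attr_accessible?(model)
  likely_mass_assignments(controller).each do |number, line|
    puts "#{number}  #{line}"
  end
end
```

A text search like this can only flag likely trouble spots, so some false positives and misses are to be expected.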

The find_mass_assignment plugin doesn’t fix mass assignment problems automatically, but by making it more convenient to find them I hope it can significantly improve the odds that they will be caught (and fixed!) quickly.

2008-08-15

A security issue with Rails secret session keys

Filed under: Git, Insoshi, Ruby on Rails — mhartl @ 21:53

Like most projects that use Rails 2.1, the Insoshi source code ships with a “secret” string (which lives in environment.rb) needed for the new cookie-based sessions. Recently, an alert observer noted that this raises a security issue in Insoshi sessions: the secret key is currently the same for all Insoshi installations, which opens the sessions up to attack (as noted in this discussion thread). This problem is not unique to Insoshi; it affects essentially any Rails application installed from source.

Part of the reason this problem isn’t more widely known is that projects generated using the rails script automatically receive a unique security string. The way we’ve fixed the secret string problem at Insoshi involves piggybacking on the mechanism Rails already has for generating such strings, by replacing the hard-coded string with a file read:

config/environment.rb

Before:


config.action_controller.session = {
  :session_key => '_instant_social_session',
  :secret      => '63143b62...8522327'
}

After:

.
.
.
require File.join(File.dirname(__FILE__), 'boot')
require 'rails_generator/secret_key_generator'

Rails::Initializer.run do |config|
  .
  .
  .
  # Your secret key for verifying cookie session data integrity.
  # If you change this key, all old sessions will become invalid!
  # Make sure the secret is at least 30 characters and all random,
  # no regular words or you'll be exposed to dictionary attacks.
  secret_file = File.join(RAILS_ROOT, "secret")
  if File.exist?(secret_file)
    secret = File.read(secret_file)
  else
    secret = Rails::SecretKeyGenerator.new("insoshi").generate_secret
    File.open(secret_file, 'w') { |f| f.write(secret) }
  end
  config.action_controller.session = {
    :session_key => '_instant_social_session',
    :secret      => secret
  }
  .
  .
  .

(N.B. The session key _instant_social_session is a hint about the origins of the name Insoshi.) In place of a hard-coded string, the updated code uses the contents of a secret file, if it exists; otherwise, it makes a new string using the same machinery as the rails script (included with the line require 'rails_generator/secret_key_generator') and writes it to the secret file.
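The read-or-generate pattern itself doesn't depend on Rails. Here is a minimal standalone sketch, assuming Ruby's SecureRandom is an acceptable substitute for Rails::SecretKeyGenerator; the method name load_or_create_secret and the temporary directory are for illustration only.

```ruby
require 'securerandom'
require 'tmpdir'

# Read the secret from path, generating and saving one on first use.
def load_or_create_secret(path)
  return File.read(path) if File.exist?(path)

  secret = SecureRandom.hex(64)   # 128 hex characters
  # Create the file readable and writable by the owner only.
  File.open(path, 'w', 0600) { |f| f.write(secret) }
  secret
end

# Try it in a throwaway directory: the second call reuses the stored secret.
Dir.mktmpdir do |dir|
  path   = File.join(dir, 'secret')
  first  = load_or_create_secret(path)
  second = load_or_create_secret(path)
  puts first == second   # true
end
```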

It’s important at this point to prevent our source code management tool from versioning the secret file, since the whole point of this exercise is to prevent the secret key from being distributed with the source code. Using Git, this is trivial; we just add ‘secret’ to our .gitignore file. (Note: if you are running an application on multiple servers, you should copy the same secret file to each one to ensure that sessions will work with a load-balancer.) Everyone using the Insoshi source code should pull from our GitHub repository to get the update.

Handling session expiration

Unfortunately, the above steps don’t completely solve our problem. The comments in environment.rb note that “If you change this key, all old sessions will become invalid!” That’s not quite accurate; the old sessions don’t merely become invalid: they actually raise an exception, so users with active sessions will be met with your application’s error page, and a CGI::Session::CookieStore::TamperedWithCookie exception will show up in your application’s log file. (The error page goes away if the user reloads the page in their browser, but there’s no way for them to know that.) Serving up error pages to all those users isn’t very friendly behavior, and we’d like to catch the exception and show the page they’re trying to access instead.

This isn’t as simple as it seems, because the exception gets raised deep inside the Rails internals. We can figure out where by running in development mode, where the stack trace looks something like this:

CGI::Session::CookieStore::TamperedWithCookie in HomeController#index 

vendor/rails/actionpack/lib/action_controller/session/cookie_store.rb:144:in `unmarshal'
vendor/rails/actionpack/lib/action_controller/session/cookie_store.rb:101:in `restore'
/usr/local/lib/ruby/1.8/cgi/session.rb:304:in `[]'
vendor/rails/actionpack/lib/action_controller/cgi_process.rb:136:in `session'
vendor/rails/actionpack/lib/action_controller/cgi_process.rb:168:in `stale_session_check!'
vendor/rails/actionpack/lib/action_controller/cgi_process.rb:116:in `session'
.
.
.

To catch the exception, we need to override the default restore method in cookie_store.rb. To do that, we need to load our change before the application loads, and the easiest way to do this is with a plugin, which we can generate with a script:

$ script/generate plugin catch_cookie_exception

Once we edit a couple of files, the solution is complete:

vendor/plugins/catch_cookie_exception/init.rb

require 'catch_cookie_exception'

vendor/plugins/catch_cookie_exception/lib/catch_cookie_exception.rb

require 'cgi'
require 'cgi/session'
class CGI::Session::CookieStore
  # Restore session data from the cookie.
  # This method overrides the one in
  # actionpack/lib/action_controller/session/cookie_store.rb
  # in order to handle the case of a "tampered" cookie more gracefully.
  # The issue is that changing the 'secret' in config/environment.rb
  # breaks all sessions in such a way that everyone gets an error page
  # the first time they revisit the site.  Catching the exception here
  # prevents this ugly behavior.
  # This is in a plugin so that it loads after Rails but before environment.rb.
  def restore
    @original = read_cookie
    @data = unmarshal(@original) || {}
  rescue CGI::Session::CookieStore::TamperedWithCookie
    logger = Logger.new("#{RAILS_ROOT}/log/#{RAILS_ENV}.log")
    logger.warn "Caught TamperedWithCookie exception on #{Time.now}"
    @data = {}
  end
end

Note that, since the exception could be the result of someone attacking the site by tampering with their cookies, we log the exception for future reference.
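The same rescue-and-fall-back idea can be shown without Rails. The toy store below is an assumption-laden sketch, not the Rails cookie store: the class names, the "--" separator, and the use of an HMAC over a plain string are all invented for the example.

```ruby
require 'openssl'
require 'logger'
require 'stringio'

# Hypothetical names throughout; this is not the Rails cookie store.
class TamperedCookie < StandardError; end

class ToyCookieStore
  def initialize(secret, logger)
    @secret = secret
    @logger = logger
  end

  # Append an HMAC so that later changes to the data can be detected.
  def sign(data)
    "#{data}--#{digest(data)}"
  end

  # Return the session data, or an empty string if verification fails.
  def restore(cookie)
    unmarshal(cookie)
  rescue TamperedCookie
    @logger.warn "Caught TamperedCookie exception on #{Time.now}"
    ''
  end

  private

  def digest(data)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), @secret, data)
  end

  # Raise if the signature doesn't match the data under the current secret.
  def unmarshal(cookie)
    data, signature = cookie.to_s.split('--', 2)
    raise TamperedCookie unless signature == digest(data.to_s)
    data
  end
end

log       = StringIO.new
old_store = ToyCookieStore.new('old secret', Logger.new(log))
new_store = ToyCookieStore.new('new secret', Logger.new(log))

cookie = old_store.sign('user_id=1')
p old_store.restore(cookie)   # "user_id=1"
p new_store.restore(cookie)   # "" (an empty session instead of an exception)
```

A cookie signed under the old secret looks tampered with under the new one, so the store logs a warning and starts a fresh session. (A real implementation would also want a constant-time comparison of the signatures.)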

UPDATE: The catch_cookie_exception plugin is now available at GitHub.

Acknowledgments

Thanks again to Trevor Turk for alerting us to this issue.

2008-07-28

Running Rails tests with autotest (ZenTest) and RSpec

Filed under: autotest, RSpec, Ruby on Rails — mhartl @ 13:12

I recently ran into a problem with autotest (ZenTest) after upgrading to Rails 2.1 and RSpec 1.4.1. Solving it was annoying, so I hope I can save others some trouble. Here’s the problem:

With RSpec, autotest hangs

Before the upgrade, I could run my specs just fine using the plain autotest command, but after the upgrade autotest just hangs:

$ autotest
loading autotest/rails

This is on a system running Mac OS X Tiger (10.4), Rails 2.1.0, RSpec 1.4.1, and ZenTest 3.10.0. Strangely, my friend Long could run autotest fine on a virtually identical system (so you may not run into this problem), but for me this only increased the frustration. After much hand-wringing (and a lot of Google searching), I finally found a Rails Forum post with a solution:

$ RSPEC=true autotest

Then autotest runs normally.

Restoring the old RSpec/autotest behavior

To get the old behavior, you can include the RSPEC variable in your environment rather than putting it explicitly on the command line. For example, on a system running bash, export the RSPEC variable as follows:

file: ~/.bashrc

export RSPEC=true

Then source it:

$ . ~/.bashrc

Now autotest should run as before:

$ autotest

Voilà (I hope)!

UPDATE: Since making this post, I’ve learned that RSpec now ships with a program called autospec that solves the same problem; just run

$ autospec

and the specs should run as expected.

2008-07-17

Searching a Ruby on Rails application with Sphinx and Ultrasphinx

Filed under: Ferret, Insoshi, Ruby on Rails, Sphinx, Ultrasphinx — mhartl @ 16:46

We recently switched the Insoshi social networking platform from a Ferret search engine to Sphinx (and Ultrasphinx), due to the well-known problems encountered with Ferret and due to our own experience of its instability on the Insoshi developer site. (Sphinx is currently running on our demo site, and anyone who wants the Sphinx-enabled source can grab edge Insoshi as described in the Rails 2.1 upgrade post. We’ll merge it into the master branch within a couple weeks.)

The switch did not always go smoothly, and there are several gotchas that I thought might be helpful to discuss in case other people run into them. I’ve also included some material on using Ultrasphinx, since its documentation is a bit sparse. For pedagogical purposes, I’ve simplified the Insoshi source slightly for this discussion; you don’t have to be familiar with the Insoshi codebase to follow this post. (N.B. The actual production code contains a trick for dealing with more advanced filtering requirements, which will probably be the subject of a future post.)

Installing Sphinx

The first step, naturally enough, is to install Sphinx. You can get the latest and greatest version at the Sphinx download page. (This blog post uses version 0.9.8, which was released just a couple of days before this post was written.) Download the source, and then install it as follows:

$ tar zxf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ ./configure --with-pgsql
$ make
$ sudo make install

The configure step ensures that Sphinx gets compiled with PostgreSQL support (MySQL comes for free). We’ve had trouble getting all the Postgres stuff to work properly, but it doesn’t hurt to have it. If you’d rather omit the Postgres support, just use ./configure by itself.

Installing Ultrasphinx

The second step is to install the Ultrasphinx plugin, which has one gem dependency:

$ sudo gem install chronic

The installation itself is trickier than it sounds; although there are plenty of tutorials that tell you how to do it, as far as I can tell they don’t work. I tried a couple of different tacks, both of which failed. First, I tried

$ svn export svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk vendor/plugins/ultrasphinx
Export complete.

The only problem is, this didn’t do anything; there was literally no change to my working copy. I then tried a plugin install:

$ script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
Export complete.

Still nothing. After some time flailing about, I finally found a James on Software Sphinx/Ultrasphinx post, which suggested cloning his GitHub fork of Ultrasphinx. That worked at first, but later on I encountered a clash with the latest version of will_paginate:

WillPaginate: You are using a paginated collection of class
Ultrasphinx::Search which conforms to the old API of WillPaginate::Collection
by using `page_count`, while the current method name is `total_pages`. Please
upgrade yours or 3rd-party code that provides the paginated collection.

Luckily, with some judicious Googling I was able to find a second repository at GitHub, whose most recent commit as of this writing is updating the code to work with the latest will_paginate, which certainly looked promising. And, indeed, it worked beautifully, so I’m happy to recommend it:

$ git clone git://github.com/DrMark/ultrasphinx.git vendor/plugins/ultrasphinx
$ rm -rf vendor/plugins/ultrasphinx/.git

(This is one of the many reasons GitHub rocks; if the “official” version of a plugin is unavailable or out of date, you still might be able to find an updated fork on GitHub.)

Configuring Ultrasphinx

To configure Ultrasphinx, I followed the config instructions at the main Ultrasphinx site:

Next, copy the examples/default.base file to RAILS_ROOT/config/ultrasphinx/default.base.
This file sets up the Sphinx daemon options such as port, host, and index location.

Since many of the Insoshi fields allow HTML, the search results are better if we strip HTML tags first:

config/ultrasphinx/default.base

index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

N.B. This is a replacement for the older strip_html syntax, used inside the source section:

config/ultrasphinx/default.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 5000
  sql_query_post =
  strip_html = 1
}

If you get a warning like

WARNING: key 'strip_html' is deprecated in config/ultrasphinx/development.conf line 24;
use 'html_strip (per-index)' instead.

just remove the strip_html line and put an html_strip line in its place (taking care to put it in the index section of the configuration file).

Bootstrapping Ultrasphinx

Now we’re ready to fire up Ultrasphinx, which uses Sphinx to build up a search index of our database:

$ rake ultrasphinx:bootstrap

There’s just one hitch: many people (including me) get an error at this stage:

dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /usr/local/bin/indexer
  Reason: image not found

I found a solution using the canonical “Google the error message” method. There’s something screwy with the location of the MySQL libraries, but it’s nothing a little symlink couldn’t fix:

$ sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql

Testing Sphinx and Ultrasphinx

In principle, things are working now under the hood; we just need to add in some code to our models and controllers to execute the searches. I prefer test-driven development, though, so the next priority is to get Sphinx and Ultrasphinx working in a test environment.

It’s important to stop the Ultrasphinx daemon, which might be running in development mode if you used rake ultrasphinx:bootstrap above:

$ rake ultrasphinx:daemon:stop

Then make a test-specific configuration file:

config/ultrasphinx/test.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 999999999
  sql_query_post =
}
.
.
.
index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

The line sql_range_step = 999999999 here is key. The sql_range_step variable controls the size of the id range that Sphinx fetches with each query as it indexes; by default, it’s 5000, but Insoshi uses foxy fixtures, which often create objects with huge ids. With a small step, the indexer has to walk through an enormous number of mostly empty id ranges, so the indexing step can take a long time (several minutes), even for a tiny test database. Setting sql_range_step to a larger step size solves the problem.
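A little arithmetic shows the scale of the difference. The sketch below assumes that the indexer covers the ids from the smallest to the largest in chunks of sql_range_step, one query per chunk; the maximum id of one billion is a hypothetical value chosen for illustration, and the exact query count in Sphinx may differ slightly.

```ruby
# Rough number of ranged queries needed to cover ids min_id..max_id.
def range_queries(min_id, max_id, step)
  span = max_id - min_id + 1
  (span + step - 1) / step        # integer ceiling division
end

max_id = 1_000_000_000            # hypothetical fixture id, for illustration only

puts range_queries(1, max_id, 5000)          # 200000
puts range_queries(1, max_id, 999_999_999)   # 2
```

Hundreds of thousands of nearly empty queries versus a couple: that is where the minutes go.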

With that done, we’re ready to fire things up:

$ rake ultrasphinx:bootstrap RAILS_ENV=test

One problem we run into is that the Sphinx test daemon might not always be running, so it would be nice to skip the search tests (or specs) if this is the case. For example, suppose that we have a Searches controller (whose index action will handle searches). Here is a skeleton for the Searches controller specs that runs only when Sphinx is running:

spec/controllers/searches_controller_spec.rb

# Return a list of system processes.
def processes
  process_cmd = case RUBY_PLATFORM
                when /djgpp|(cyg|ms|bcc)win|mingw/ then 'tasklist /v'
                when /solaris/                     then 'ps -ef'
                else
                  'ps aux'
                end
  `#{process_cmd}`
end

# Return true if the search daemon is running.
def testing_search?
  processes.include?('searchd')
end

describe SearchesController do
  .
  .
  .
end if testing_search?

(A blog post on testing with Ultrasphinx proved useful in this context.)

Writing the first tests

OK, now we’re ready to write some concrete tests. Some basic tests (using RSpec) might look like these:

spec/controllers/searches_controller_spec.rb

describe SearchesController do

  describe "Person searches" do

    it "should search by name" do
      get :index, :q => "quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end

    it "should search by description" do
      get :index, :q => "I'm Quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end
  end
end if testing_search?

Here we’ve passed a model parameter in anticipation of using a single action to search multiple models.

The specs fail, of course:

$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 2 failures

Apart from the if testing_search? clause, there’s nothing here beyond vanilla RSpec, so in what follows I won’t bother showing any more specs.

Person: Basic indexing

Now we’re ready for some basic searching. Suppose we have a Person model with name and description fields, which we want to enable for searching. We need the is_indexed method from Ultrasphinx:

app/models/person.rb

class Person < ActiveRecord::Base
  is_indexed :fields => [ 'name', 'description' ]
  .
  .
  .
end

Then a sample Searches controller index might look like this:

app/controllers/searches_controller.rb

def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Note the use of a :page option; Ultrasphinx works with the will_paginate plugin out of the box.

A sample search box partial might look like this:

app/views/searches/_box.html.erb

<% form_tag searches_path, :method => :get do %>
  <fieldset>
    <%= text_field_tag :q, h(params[:q]), :maxlength => 50 %>
    <%= submit_tag "Search" %>
    <%= hidden_field_tag "model", search_model %>
  </fieldset>
<% end %>

where search_model is just a helper that inspects params and returns the name of the model being searched. (For example:

app/helpers/searches_helper.rb

module SearchesHelper

  # Return the model to be searched based on params.
  def search_model
    return "Person"    if params[:controller] =~ /home/
    return "ForumPost" if params[:controller] =~ /forums/
    params[:model] || params[:controller].classify
  end
end

where params[:controller].classify automagically returns the string "Person" inside the People controller and "Message" inside the Messages controller.)
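If you want to play with the helper's decision logic outside of Rails, here is a minimal plain-Ruby sketch. The naive_classify method and its lookup table are hypothetical stand-ins for ActiveSupport's classify (which handles far more cases), and search_model_for takes the params hash as an argument instead of reading it from the controller.

```ruby
# Hypothetical stand-in for ActiveSupport's String#classify; it only knows
# one irregular plural and otherwise strips a trailing "s".
IRREGULAR_SINGULARS = { 'people' => 'Person' }.freeze

def naive_classify(controller_name)
  IRREGULAR_SINGULARS.fetch(controller_name) do
    controller_name.sub(/s\z/, '').split('_').map(&:capitalize).join
  end
end

# Same branching as the search_model helper, with params passed in.
def search_model_for(params)
  return 'Person'    if params[:controller] =~ /home/
  return 'ForumPost' if params[:controller] =~ /forums/
  params[:model] || naive_classify(params[:controller])
end

puts search_model_for(:controller => 'home')
puts search_model_for(:controller => 'messages')
puts search_model_for(:controller => 'searches', :model => 'Message')
```

An explicit model parameter wins over the controller name except on the home and forum pages, which are pinned to their own models.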

As long as the test database contains the appropriate user (in our case, Quentin from restful_authentication), the specs should pass once we reindex:

$ rake ultrasphinx:bootstrap RAILS_ENV=test
$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 0 failures

If they fail, chances are that either (1) there’s some rogue development daemon running or (2) we forgot to reindex the test database after changing a model. If this happens, you can be extra paranoid by recycling everything:

$ rake ultrasphinx:daemon:stop
$ rake ultrasphinx:bootstrap RAILS_ENV=test

Message: Ultrasphinx with conditions and filtering

One common task is to put a condition on a search result. For example, suppose we have a Message model with a subject and content we want to index, but with “trashed” messages we want to exclude. Suppose further that recipients trash messages by setting a recipient_deleted_at attribute in the Message model. Untrashed messages would then have a NULL value for recipient_deleted_at:

app/models/message.rb

class Message < ActiveRecord::Base
  is_indexed :fields => [ 'subject', 'content', 'recipient_id' ],
             :conditions => "recipient_deleted_at IS NULL"
  .
  .
  .
end

Of course, when searching through messages for a particular person, we should only return messages actually sent to that person. This is why we added the recipient_id to the index fields above; this way, we can use an Ultrasphinx filter to restrict the results appropriately in the Searches controller:

app/controllers/searches_controller.rb

def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  if model == "Message"
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person.id
  end
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Of course, this requires an appropriately defined current_person object in line 8, which we assume is taken care of by the application’s authentication scheme.

ForumPost: Ultrasphinx with Single Table Inheritance (STI) and associations

Our final example combines conditions with an include. Insoshi has a ForumPost model that inherits from a Post base class (which is also used for blog posts) using Single Table Inheritance (STI). We want to restrict forum searches to the body of forum posts, excluding blog posts. We also want to include the topic name in searches, so that a post “Lorem ipsum” under topic “Foobar” will show up for both the queries “Lorem” and “Foobar”. We can achieve this by using a conditions clause on the STI type, while using an include for the topic association:

class ForumPost < Post
  is_indexed :fields => [ 'body' ],
             :conditions => "type = 'ForumPost'",
             :include => [{:association_name => 'topic', :field => 'name'}]
  belongs_to :topic
  .
  .
  .
end

(If we leave out the type condition, Ultrasphinx happily indexes all the blog posts as well. Rails then complains when trying to make a new ForumPost using a BlogPost id.)

With that, we’ve covered all our basic search needs. As noted above, there’s one more advanced technique being used at Insoshi (handling searches on boolean attributes such as deactivated), which I’ll probably cover in a later post. It’s also worth noting that, unlike Ferret, Sphinx doesn’t update the search index with every Active Record update; you need to update the index periodically with a cron job. Take a look at the Ultrasphinx deployment notes for more details.

TextMate Footnotes and Ultrasphinx

Finally, there’s a minor incompatibility between Ultrasphinx and the latest (Rails 2.1-compatible) TextMate Footnotes, which gives the following error (at least when using vendored Rails):

activesupport/lib/active_support/dependencies.rb:275:in `load_missing_constant':
uninitialized constant Footnotes::Filter (NameError)

This is because Ultrasphinx is looking for the Rails file initializer.rb, but instead it finds initializer.rb as defined by Footnotes. The fix is to change “initializer” to something else (say, “loader”) everywhere; see my fork of Footnotes at GitHub for an example.

2008-07-3

A Rails 2.1 case study: upgrading the Insoshi social networking platform

Filed under: Git, Insoshi, Ruby on Rails — mhartl @ 10:58

I’m happy to announce the release of a new edge branch of the Insoshi social networking platform, which is fully updated with Rails 2.1 support. (The Insoshi demo site is currently running off this edge branch.) The result is a good example of upgrading a real-life application to Rails 2.1, so I thought a blog post detailing the steps might be useful.

Most of the changes here are quite generic—updates to widely used plugins, for example—and are not specific to Insoshi. I’ve been especially careful to include error messages when applicable, so that search engines can index them; I’m sure I’m not the only programmer who follows the “Google the error message” algorithm when debugging. I’ve also included the Git commands I used, since I know a lot of Rails developers are working to learn Git and I thought some examples might be helpful.

It’s important to note at the outset that you do not need to follow these steps yourself to upgrade Insoshi. Insoshi contributors, and anyone else who wants the edge version of Insoshi, should follow the instructions at the Insoshi wiki (if they haven’t already) and then issue the following commands:

$ git fetch origin
$ git branch --track edge origin/edge
$ git checkout edge

This way, you’ll get all these changes for free. (Once hosting support for Rails 2.1 is more widespread (especially at Heroku), we’ll merge the Rails 2.1 Insoshi edge into the master branch.)  UPDATE: The merge has happened, and now the Insoshi master branch runs Rails 2.1.

And now, on with our show. Here’s what it took to get Insoshi running under Rails 2.1.

Update RubyGems

Upgrading Insoshi to Rails 2.1 involves updating a bunch of plugins, many of which are now hosted at GitHub. Unfortunately, older versions of RubyGems can’t install directly from GitHub sources; for example, trying to install will_paginate with RubyGems 1.1.0 gives you this error message:

$ gem --version
1.1.0
$ sudo gem install mislav-will_paginate -s http://gems.github.com/
ERROR:  could not find mislav-will_paginate locally or in a repository

The solution is to update the system gems (including RubyGems) as follows:

$ sudo gem update --system
$ gem --version
1.2.0

Install Rails 2.1

Installing Rails 2.1 itself is the easiest step in the entire upgrade. (I think when people wonder, “How hard could upgrading to Rails 2.1 possibly be?”, they have mainly this step in mind.)

$ sudo gem update rails

Things now get a little trickier, since we want to freeze Rails 2.1 in vendor/rails to follow the best practice for production Rails apps. The first part goes smoothly:

$ git rm -r vendor/rails
$ git commit -a -m "Cleared out vendor/rails"

There’s a hitch with the second step. If you happen to have only Rails 2.1 (but not 2.0.2) on your machine, there’s a bootstrapping problem due to a variable in environment.rb:

$ rake rails:freeze:gems
Missing the Rails 2.0.2 gem. Please `gem install -v=2.0.2 rails`, update your
RAILS_GEM_VERSION setting in config/environment.rb for the Rails version you
do have installed, or comment out RAILS_GEM_VERSION to use the latest version
installed.

The solution is to update environment.rb with the proper Rails gem version:

# Specifies gem version of Rails to use when vendor/rails is not present
RAILS_GEM_VERSION = '2.1.0' unless defined? RAILS_GEM_VERSION

Then the freeze works fine:

$ rake rails:freeze:gems
$ git add .
$ git commit -a -m "Updated vendor/rails to Rails 2.1"

Finally, following the advice from Akita’s Rolling With Rails 2.1, we add a new file called config/initializers/new_defaults.rb:

# These settings change the behavior of Rails 2 apps and will be defaults
# for Rails 3. You can remove this initializer when Rails 3 is released.

# Only save the attributes that have changed since the record was loaded.
ActiveRecord::Base.partial_updates = true

# Include ActiveRecord class name as root for JSON serialized output.
ActiveRecord::Base.include_root_in_json = true

# Use ISO 8601 format for JSON serialized times and dates
ActiveSupport.use_standard_json_time_format = true

# Don't escape HTML entities in JSON, leave that for the #json_escape helper
# if you're including raw json in an HTML page.
ActiveSupport.escape_html_entities_in_json = false

$ git add config/initializers/new_defaults.rb
$ git commit -m "Added new defaults initializer"

Update RSpec for Rails 2.1

A necessary step in verifying that the Insoshi application is working under Rails 2.1 is to get the test suite to pass. Our tests are written using RSpec, but older versions of RSpec don’t work with Rails 2.1, so we can’t even run the test suite. D’oh!

Since the old plugins don’t work, first we remove them:

$ git rm -r vendor/plugins/rspec*
$ git commit -a -m "removed outdated RSpec plugins"

Then we need to install the most recent versions of the RSpec plugins from GitHub:

$ script/plugin install git://github.com/dchelimsky/rspec.git
$ script/plugin install git://github.com/dchelimsky/rspec-rails.git
$ script/generate rspec

In my case, I was careful not to overwrite spec_helper.rb when running script/generate rspec in the last step, as the current spec helper contains several custom modifications that I didn’t want to lose.

Then we need to add the changes:

$ git add .
$ git commit -a -m "Added latest RSpec plugins from GitHub"

By the way, running specs from the command line won’t work:

$ spec spec/models/person_spec.rb
Your RSpec on Rails plugin is incompatible with your installed RSpec.

RSpec          : 20080526202855
RSpec on Rails : 20080628203842

To fix this, you can use script/spec in place of spec, or you can upgrade to the latest RSpec gem using the RSpec source from GitHub:

$ cd ~/tmp
$ git clone git://github.com/dchelimsky/rspec.git
$ cd rspec
$ rake gem
$ sudo gem install pkg/rspec-1.1.4.gem

(All of Insoshi’s tests use RSpec, but if you use Test::Unit you should know that as of Rails 2.1 default tests can’t be run at the command line using the ruby executable; for tests generated by Rails 2.1, you now have to include the test directory explicitly using the -I test flag. For a hypothetical resource foobar, for example, you would get this:

$ script/generate scaffold foobar baz:string
$ ruby test/functional/foobars_controller_test.rb
test/functional/foobars_controller_test.rb:1:in `require':
no such file to load -- test_helper (LoadError)
        from test/functional/foobars_controller_test.rb:1

It passes if you tell ruby about the test directory:

$ ruby -I test test/functional/foobars_controller_test.rb
Loaded suite test/functional/foobars_controller_test
Started
.......
Finished in 0.24375 seconds.

7 tests, 13 assertions, 0 failures, 0 errors

(N.B. Using rake still works fine.) It’s unclear why the Rails Core team decided to make this change, but it’s a definite gotcha so I thought it deserved mention.)
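The -I flag simply puts a directory on Ruby's load path, which is where a bare require 'test_helper' looks. Here is a small self-contained sketch of that mechanism; it uses a throwaway temp directory and a made-up file name (demo_test_helper) rather than a real Rails app, so treat it as an illustration of the load path and not as a description of anything Rails does internally.

```ruby
require 'tmpdir'

# Try to require `feature` by bare name; report whether Ruby could find it.
def requirable?(feature)
  require feature
  true
rescue LoadError
  false
end

before_and_after = Dir.mktmpdir do |dir|
  # A stand-in for test/test_helper.rb (hypothetical file name).
  File.open(File.join(dir, 'demo_test_helper.rb'), 'w') do |f|
    f.puts 'DEMO_HELPER_LOADED = true'
  end

  before = requirable?('demo_test_helper')  # directory is not on the load path yet
  $LOAD_PATH.unshift(dir)                   # roughly what `ruby -I dir` does
  after = requirable?('demo_test_helper')
  [before, after]
end

puts before_and_after.inspect
```

The first attempt fails with a LoadError (just like the session above), and the second succeeds once the directory is on the load path.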

With the new RSpec installed we can at least run the test suite, but unfortunately there are huge amounts of breakage. This is mainly due to several plugin incompatibilities and a slight Rails 2.1/Insoshi conflict (discussed below). Let’s get started fixing them.

Update obsolete helper specs

One source of breakage is the helper specs (in the spec/helpers/ directory). All these specs have obsolete code, resulting in errors such as

NoMethodError in 'TopicsHelper should include the TopicHelper'
undefined method `metaclass'

By going to a temp directory and running a sample rspec_scaffold with the updated RSpec to get a sample spec template file, you can discover that the line

included_modules = self.metaclass.send :included_modules

needs to be changed to

included_modules = (class << helper; self; end).send :included_modules

in each helper spec. You can use search-and-replace in your text editor to make all the changes, and then commit them:

$ git commit -a -m "Updated outdated spec helpers"
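In case the (class << helper; self; end) idiom looks mysterious: it returns the object's singleton class (its "metaclass"), which is where modules end up when an object is extended with them. Here is a plain-Ruby sketch with no RSpec involved; GreetingHelper and the bare helper object are hypothetical, and I'm assuming the spec's helper object picks up helper modules in a similar way.

```ruby
# A made-up helper module, standing in for something like TopicsHelper.
module GreetingHelper
  def greet(name)
    "Hello, #{name}!"
  end
end

helper = Object.new
helper.extend(GreetingHelper)   # mixes the module into helper's singleton class

# The idiom from the updated spec template: open the singleton class and
# return it.
singleton = (class << helper; self; end)
included_modules = singleton.send :included_modules

puts included_modules.include?(GreetingHelper)      # module is visible on the singleton class
puts helper.class.included_modules.include?(GreetingHelper)  # but not on Object itself
puts helper.greet('Quentin')
```

Only this one object gains the module; other instances of the same class are unaffected, which is why the spec has to ask the singleton class and not the ordinary class.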

This won’t fix everything in the helper specs; there are still warning messages like this:

Modules will no longer be automatically included in RSpec version 1.1.4

This is because the ActivitiesHelper module is not explicitly included in the helper spec. The fix is to add the relevant include:

spec/helpers/activities_helper_spec.rb

require File.dirname(__FILE__) + '/../spec_helper'
include ActivitiesHelper
.
.
.

Update will_paginate for Rails 2.1

The old will_paginate plugin won’t work with Rails 2.1. You get tons of errors like

SystemStackError in 'PeopleController people pages should have a working show page'
stack level too deep

We need to remove the old plugin and install an update from GitHub. The current recommended method is to install it as a gem, and luckily Rails 2.1 has a slick new method for dealing with gem dependencies. In principle, we just need to add the following lines to environment.rb:

Rails::Initializer.run do |config|
  .
  .
  .
  # Custom gem requirements
  config.gem 'mislav-will_paginate', :version => '~> 2.3.2',
                                     :lib => 'will_paginate',
                                     :source => 'http://gems.github.com'
end

This tells Rails that our application requires will_paginate version 2.3.2 or a later 2.3.x release (the ~> operator excludes 2.4 and up), and that it can be found at GitHub. We can do the installation like this:

$ sudo rake gems:install
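As an aside, you can check what a constraint like '~> 2.3.2' accepts by asking RubyGems directly. This sketch uses only the Gem::Requirement and Gem::Version classes that ship with RubyGems; the list of version numbers is arbitrary and chosen just to probe the boundaries.

```ruby
require 'rubygems'

# The same constraint string used in the config.gem line above.
requirement = Gem::Requirement.new('~> 2.3.2')

# Arbitrary sample versions on either side of the allowed range.
%w[2.3.1 2.3.2 2.3.11 2.4.0].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{accepted}"
end
```

Versions from 2.3.2 up to (but not including) 2.4.0 satisfy the requirement; anything older or from the next minor series does not.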

Unfortunately, in our case this won’t work, since will_paginate doesn’t work as a gem if the application has Rails in vendor/rails. (This gotcha is buried in the will_paginate wiki at GitHub.) We have to install will_paginate as a plugin after all:

$ git rm -r vendor/plugins/will_paginate
$ script/plugin install git://github.com/mislav/will_paginate.git
$ git add .
$ git commit -a -m "Updated will_paginate plugin"

(Unfortunately, this still didn’t fix the problem for Insoshi, because there were extra files in the lib/ directory:

$ git rm -r lib/will_paginate*
$ git commit -a -m "Removed will_paginate from lib"

I’m not sure how that happened, but it sure took a while to figure out…)

Update TextMate footnotes for Rails 2.1

Now that will_paginate is fixed (and the corresponding specs pass), you’d think the relevant pages would work. You’d be wrong. We were using the edge version of TextMate Footnotes, and it turns out that the footnotes-edge plugin breaks horribly in Rails 2.1. Basically, every page gives you something like this in the browser:

ActionController::RenderError in HomeController#index

You called render with invalid options : {:layout=>false, :action=>"index"}, nil

To fix this, we need to install an update from GitHub (are you seeing a pattern here?):

$ git rm -r vendor/plugins/footnotes-edge
$ script/plugin install git://github.com/drnic/rails-footnotes.git
$ mv vendor/plugins/rails-footnotes vendor/plugins/footnotes
$ git add .
$ git commit -a -m "Updated TextMate footnotes"

Update attachment_fu for Rails 2.1

We’re almost there. A few photo specs still fail, with messages like

NoMethodError in 'PhotosController when logged in should create photo'
undefined method `callbacks_for' for #<Photo:0x529a61c>

The solution is to update attachment_fu:

$ git rm -r vendor/plugins/attachment_fu/
$ script/plugin install http://svn.techno-weenie.net/projects/plugins/attachment_fu/
$ git add .
$ git commit -a -m "Updated attachment_fu"

Fix the broken verify action

This would seem to complete the update, but unfortunately this isn’t the end; there’s one more (Insoshi-specific) problem. Insoshi includes the option to verify the email addresses of new members, using a custom action called verify inside the People controller. Unfortunately, the specs that test the email verification fail with the error

No action responded to verify

This looked to be a general problem with custom actions in Rails 2.1 tests, but it turns out that the culprit is the word verify. For some reason, an action called verify causes problems in Rails 2.1, but only in tests. (This took a long time to figure out.) The (rather inelegant) fix is to rename the controller action to verify_email, and then add a line in the routes file so that old email verification links still work:

app/controllers/people_controller.rb

def verify_email
  .
  .
  .
end

config/routes.rb

map.resources :people, :member => { :verify_email => :get,
                                    :common_contacts => :get }
map.connect '/people/verify/:id', :controller => 'people',
                                  :action => 'verify_email'

Of course, we also need to change get :verify to get :verify_email in the spec file (spec/controllers/people_controller_spec.rb). Once that is done, all the specs pass, and the upgrade is complete:

$ rake spec
..............................................................................
..............................................................................
..............................................................................
..............................................................................
..............................

Finished in 12.337675 seconds

342 examples, 0 failures

Phew!

Postscript: Rails 2.1 migrations, schema_info, and schema_migrations

One benefit of moving to Rails 2.1 is the new method for handling migrations. We’ve already run into instances with the Insoshi project where we needed to merge updates with conflicting migration numbers, and it’s really not any fun with Rails 2.0. There is a potential gotcha, though, since in order to perform the new migration cleverness Rails 2.1 uses a table called schema_migrations in place of the old schema_info table. The change is supposed to happen automatically when you first migrate after installing Rails 2.1, but we ran into some difficulties…

When making a Rails 2.0 to 2.1 upgrade, you shouldn’t run into any problems if you start from a clean database and run

$ rake db:migrate

If you have an existing database (for instance, our demo site database), the migration should automatically convert the schema_info table used in 2.0 (which stores only a single integer value) to schema_migrations (which has entries for all the migrations that have been performed).
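To see why one row per migration is friendlier to merges than a single integer, here is a deliberately simplified model in plain Ruby. It is not Rails' actual implementation; the method names and the sample version numbers are invented for illustration.

```ruby
# Rails 2.0 style bookkeeping: one integer, and anything at or below it is
# assumed to have been run already.
def pending_with_schema_info(available, current_version)
  available.select { |version| version > current_version }
end

# Rails 2.1 style bookkeeping: one row per migration that has been run.
def pending_with_schema_migrations(available, applied)
  available - applied
end

available = [1, 2, 3, 4]  # migrations on disk after merging a collaborator's branch
applied   = [1, 2, 4]     # migration 3 arrived late and has never been run here

puts pending_with_schema_info(available, 4).inspect
puts pending_with_schema_migrations(available, applied).inspect
```

In this toy model the single-integer scheme silently skips the late-arriving migration, while the per-migration scheme reports it as still pending.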

While that worked for us in development, we ran into an issue on our staging server (where we test everything before installing it on our production servers): the expected table conversion didn’t happen. Instead, the migration tried to recreate the tables from scratch. We got around this by bootstrapping the conversion before the migration, as follows.

  1. Create the schema_migrations table via SQL:
    $ mysql insoshi_production -u root
    mysql> CREATE TABLE schema_migrations (
           version VARCHAR(255) NOT NULL,
           UNIQUE KEY unique_schema_migrations (version)
           );
    
  2. Find out the current migration version number:
    mysql> SELECT version FROM schema_info;
    +---------+
    | version |
    +---------+
    |      26 |
    +---------+
    1 row in set (0.07 sec)
    
  3. Insert values from 1 to the current version as strings (i.e., as a VARCHAR array):
    mysql> INSERT INTO schema_migrations VALUES ('1'),('2'),('3'),('4'),('5'),
    ('6'),('7'),('8'),('9'),('10'),('11'),('12'),('13'),('14'),('15'),
    ('16'),('17'),('18'),('19'),('20'),('21'),('22'),('23'),('24'),
    ('25'),('26');
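Typing out all those values by hand gets old fast. Here is a small Ruby sketch that builds the same INSERT statement for any current version; the method name is made up, and it assumes (as in our case) that the old migrations were numbered consecutively from 1.

```ruby
# Build the INSERT statement for step 3, given the version found in
# schema_info. Versions are quoted because the column is a VARCHAR.
def schema_migrations_insert(current_version)
  values = (1..current_version).map { |version| "('#{version}')" }.join(',')
  "INSERT INTO schema_migrations VALUES #{values};"
end

puts schema_migrations_insert(26)
```

You can paste the output straight into the mysql prompt in place of the hand-written statement.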
    

Of course, you shouldn’t have to do this—things should Just Work™—but then again, upgrading to Rails 2.1 wasn’t supposed to be difficult, either. :-)

2008-07-1

Using Git to pull in a patch from a single commit

Filed under: Git, Insoshi, Ruby on Rails — mhartl @ 13:06

Git is awesome at merging and branching, but what if you want to pull in just one patch from a single commit?

We ran into this recently with Insoshi at GitHub, where piotrj updated the README to be in RDoc format. Why not just merge in his changes? Well, Piotr has also been working on image galleries, but in the meantime billsaysthis has picked up that torch and run with it. As a result, Piotr’s image gallery changes would cause conflicts with the current master branch, and in any case we don’t want those changes just yet; we only want the RDoc-ified README for now. If only there were a way to use Git to cherry-pick just one commit…

Aha, git cherry-pick to the rescue!  Here are the steps for my particular case:

  1. I use
     $ git fetch piotrj

    to fetch Piotr’s changes to my local machine. (I had already connected to his GitHub fork using the steps from the relevant Insoshi Git guide.)

  2. Looking at GitHub for the commit label, I see that it’s 3b4257f0454fc31349a0505c9a883f691fe8889d. (I could also check out Piotr’s branch locally and use git log to see the commits.) So all I need to do is switch to the master branch and cherry-pick the change:
    $ git checkout master
    $ git cherry-pick 3b4257f0454fc31349a0505c9a883f691fe8889d

    Note that I don’t have to reference Piotr’s branch explicitly; Git figures out the right branch to use from the commit label.

That’s it! Amazingly, I don’t even have to do a commit; Git adds Piotr’s message to my log automatically. After pushing the updated master branch to GitHub, the Insoshi README is noticeably improved.  Thanks to piotrj—and to git cherry-pick!
