blog.mhartl | Michael Hartl's tech blog

2008-07-17

Searching a Ruby on Rails application with Sphinx and Ultrasphinx

Filed under: Ferret, Insoshi, Ruby on Rails, Sphinx, Ultrasphinx — mhartl @ 16:46

We recently switched the Insoshi social networking platform from the Ferret search engine to Sphinx (via the Ultrasphinx plugin), both because of the well-known problems with Ferret and because of our own experience of its instability on the Insoshi developer site. (Sphinx is currently running on our demo site, and anyone who wants the Sphinx-enabled source can grab edge Insoshi as described in the Rails 2.1 upgrade post. We’ll merge it into the master branch within a couple of weeks.)

The switch did not always go smoothly, and there are several gotchas that I thought might be helpful to discuss in case other people run into them. I’ve also included some material on using Ultrasphinx, since its documentation is a bit sparse. For pedagogical purposes, I’ve simplified the Insoshi source slightly for this discussion; you don’t have to be familiar with the Insoshi codebase to follow this post. (N.B. The actual production code contains a trick for dealing with more advanced filtering requirements, which will probably be the subject of a future post.)

Installing Sphinx

The first step, naturally enough, is to install Sphinx. You can get the latest and greatest version at the Sphinx download page. (This blog post uses version 0.9.8, which was released just a couple of days before this post was written.) Download the source, and then install it as follows:

$ tar zxf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ ./configure --with-pgsql
$ make
$ sudo make install

The configure step ensures that Sphinx gets compiled with PostgreSQL support (MySQL comes for free). We’ve had trouble getting all the Postgres stuff to work properly, but it doesn’t hurt to have it. If you’d rather omit the Postgres support, just use ./configure by itself.

Installing Ultrasphinx

The second step is to install the Ultrasphinx plugin, which has one gem dependency:

$ sudo gem install chronic

The installation itself is trickier than it sounds; although there are plenty of tutorials that tell you how to do it, as far as I can tell they don’t work. I tried a couple of different tacks, both of which failed. First, I tried

$ svn export svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk vendor/plugins/ultrasphinx
Export complete.

The only problem is, this didn’t do anything; there was literally no change to my working copy. I then tried a plugin install:

$ script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
Export complete.

Still nothing. After some time flailing about, I finally found a James on Software Sphinx/Ultrasphinx post, which suggested cloning his GitHub fork of Ultrasphinx. That worked at first, but later on I encountered a clash with the latest version of will_paginate:

WillPaginate: You are using a paginated collection of class
Ultrasphinx::Search which conforms to the old API of WillPaginate::Collection
by using `page_count`, while the current method name is `total_pages`. Please
upgrade yours or 3rd-party code that provides the paginated collection.

Luckily, with some judicious Googling I was able to find a second repository at GitHub, whose most recent commit as of this writing is updating the code to work with the latest will_paginate, which certainly looked promising. And, indeed, it worked beautifully, so I’m happy to recommend it:

$ git clone git://github.com/DrMark/ultrasphinx.git vendor/plugins/ultrasphinx
$ rm -rf vendor/plugins/ultrasphinx/.git

(This is one of the many reasons GitHub rocks; if the “official” version of a plugin is unavailable or out of date, you still might be able to find an updated fork on GitHub.)
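The will_paginate warning above boils down to a renamed method: the old collection API exposed page_count, while newer versions expect total_pages. As a minimal sketch of what a compatible collection looks like (the PagedResults class here is hypothetical and is not Ultrasphinx’s actual code), an object can define the new name and keep the old one as an alias:

```ruby
# Hypothetical paginated collection; not taken from Ultrasphinx or will_paginate.
class PagedResults
  attr_reader :total_entries, :per_page

  def initialize(total_entries, per_page)
    @total_entries = total_entries
    @per_page      = per_page
  end

  # Name expected by newer will_paginate versions.
  def total_pages
    (total_entries.to_f / per_page).ceil
  end

  # Old name, kept as an alias for code that still calls it.
  alias_method :page_count, :total_pages
end
```

With 45 entries at 20 per page, both total_pages and page_count return 3.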

Configuring Ultrasphinx

To configure Ultrasphinx, I followed the config instructions at the main Ultrasphinx site:

Next, copy the examples/default.base file to RAILS_ROOT/config/ultrasphinx/default.base.
This file sets up the Sphinx daemon options such as port, host, and index location.

Since many of the Insoshi fields allow HTML, the search results are better if we strip HTML tags first:

config/ultrasphinx/default.base

index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

N.B. This is a replacement for the older strip_html syntax, used inside the source section:

config/ultrasphinx/default.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 5000
  sql_query_post =
  strip_html = 1
}

If you get a warning like

WARNING: key 'strip_html' is deprecated in config/ultrasphinx/development.conf line 24;
use 'html_strip (per-index)' instead.

just remove the strip_html line and put an html_strip line in its place (taking care to put it in the index section of the configuration file).

Bootstrapping Ultrasphinx

Now we’re ready to fire up Ultrasphinx, which uses Sphinx to build up a search index of our database:

$ rake ultrasphinx:bootstrap

There’s just one hitch: many people (including me) get an error at this stage:

dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /usr/local/bin/indexer
  Reason: image not found

I found a solution using the canonical “Google the error message” method. There’s something screwy with the location of the MySQL libraries, but it’s nothing a little symlink couldn’t fix:

$ sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql

Testing Sphinx and Ultrasphinx

In principle, things are working now under the hood; we just need to add in some code to our models and controllers to execute the searches. I prefer test-driven development, though, so the next priority is to get Sphinx and Ultrasphinx working in a test environment.

It’s important to stop the Ultrasphinx daemon, which might be running in development mode if you used rake ultrasphinx:bootstrap above:

$ rake ultrasphinx:daemon:stop

Then make a test-specific configuration file:

config/ultrasphinx/test.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 999999999
  sql_query_post =
}
.
.
.
index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

The line sql_range_step = 999999999 here is key. The sql_range_step variable controls the size of the id range that Sphinx fetches in each indexing query; in the default configuration it’s 5000, but Insoshi uses foxy fixtures, which often create objects with huge ids. With a small step, the indexer has to walk through an enormous number of mostly empty id ranges, so the indexing step can take a long time (several minutes), even for a tiny test database. Setting sql_range_step to a much larger value solves the problem.
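To get a feel for the numbers, here is a back-of-the-envelope sketch of how many ranged queries an indexer issues when it walks an id range in fixed steps. The range_query_count helper and the maximum id of 2**30 - 1 are assumptions for illustration; the ids you actually get depend on how your Rails version hashes fixture labels.

```ruby
# Hypothetical helper: number of ranged queries needed to cover
# min_id..max_id when each query spans `step` ids.
def range_query_count(min_id, max_id, step)
  span = max_id - min_id + 1
  (span + step - 1) / step  # integer ceiling division
end

max_id = 2**30 - 1  # assumed upper bound on fixture ids

range_query_count(1, max_id, 5_000)        # => 214749
range_query_count(1, max_id, 999_999_999)  # => 2
```

Under these assumptions the small step means a couple of hundred thousand queries, nearly all of which return no rows, while the large step needs only two.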

With that done, we’re ready to fire things up:

$ rake ultrasphinx:bootstrap RAILS_ENV=test

One problem we run into is that the Sphinx test daemon might not always be running, so it would be nice to skip the search tests (or specs) if this is the case. For example, suppose that we have a Searches controller (whose index action will handle searches). Here is a skeleton for the Searches controller specs that runs only when Sphinx is running:

spec/controllers/searches_controller_spec.rb

# Return a list of system processes.
def processes
  process_cmd = case RUBY_PLATFORM
                when /djgpp|(cyg|ms|bcc)win|mingw/ then 'tasklist /v'
                when /solaris/                     then 'ps -ef'
                else
                  'ps aux'
                end
  `#{process_cmd}`
end

# Return true if the search daemon is running.
def testing_search?
  processes.include?('searchd')
end

describe SearchesController do
  .
  .
  .
end if testing_search?

(A blog post on testing with Ultrasphinx proved useful in this context.)

Writing the first tests

OK, now we’re ready to write some concrete tests. Some basic tests (using RSpec) might look like these:

spec/controllers/searches_controller_spec.rb

describe SearchesController do

  describe "Person searches" do

    it "should search by name" do
      get :index, :q => "quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end

    it "should search by description" do
      get :index, :q => "I'm Quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end
  end
end if testing_search?

Here we’ve passed a model parameter in anticipation of using a single action to search multiple models.

The specs fail, of course:

$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 2 failures

Apart from the if testing_search? clause, there’s nothing here beyond vanilla RSpec, so in what follows I won’t bother showing any more specs.

Person: Basic indexing

Now we’re ready for some basic searching. Suppose we have a Person model with name and description fields, which we want to enable for searching. We need the is_indexed method from Ultrasphinx:

app/models/person.rb

class Person < ActiveRecord::Base
  is_indexed :fields => [ 'name', 'description' ]
  .
  .
  .
end

Then a sample Searches controller index might look like this:

app/controllers/searches_controller.rb

def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Note the use of a :page option; Ultrasphinx works with the will_paginate plugin out of the box.
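One gotcha in the action above is that params[:q].strip raises a NoMethodError when the q parameter is missing. Here is a minimal sketch of a more defensive approach, assuming a hypothetical search_options helper (not part of Ultrasphinx) that builds the options hash handed to Ultrasphinx::Search.new:

```ruby
# Hypothetical helper: normalize request params into search options.
# Uses a plain Hash with symbol keys to stand in for the Rails params object.
def search_options(params, filters = {})
  {
    :query       => params[:q].to_s.strip,      # nil becomes ""
    :page        => (params[:page] || 1).to_i,  # default to the first page
    :class_names => params[:model],
    :filters     => filters
  }
end

search_options(:q => "  quentin ", :model => "Person")
# => {:query=>"quentin", :page=>1, :class_names=>"Person", :filters=>{}}
```

The controller could then call Ultrasphinx::Search.new(search_options(params, filters)), and the helper can be tested without a running search daemon.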

A sample search box partial might look like this:

app/views/searches/_box.html.erb

<% form_tag searches_path, :method => :get do %>
  <fieldset>
    <%= text_field_tag :q, h(params[:q]), :maxlength => 50 %>
    <%= submit_tag "Search" %>
    <%= hidden_field_tag "model", search_model %>
  </fieldset>
<% end %>

where search_model is just a helper that inspects params and returns the name of the model being searched. (For example:

app/helpers/searches_helper.rb

module SearchesHelper

  # Return the model to be searched based on params.
  def search_model
    return "Person"    if params[:controller] =~ /home/
    return "ForumPost" if params[:controller] =~ /forums/
    params[:model] || params[:controller].classify
  end
end

where params[:controller].classify automagically returns the string "Person" inside the People controller and "Message" inside the Messages controller.)
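To make the decision order concrete, here is a plain-Ruby approximation of the helper that runs outside Rails. The naive_classify method is a hypothetical stand-in for ActiveSupport’s classify that covers only the cases shown; it is not a general inflector.

```ruby
# Hypothetical stand-in for ActiveSupport's String#classify; it handles
# only the cases needed here.
IRREGULAR_SINGULARS = { "people" => "person" }

def naive_classify(name)
  singular = IRREGULAR_SINGULARS.fetch(name) { name.sub(/s\z/, "") }
  singular.split("_").map { |word| word.capitalize }.join
end

# Same decision order as the helper above, with params passed in explicitly.
def search_model(params)
  return "Person"    if params[:controller] =~ /home/
  return "ForumPost" if params[:controller] =~ /forums/
  params[:model] || naive_classify(params[:controller])
end

search_model(:controller => "people")    # => "Person"
search_model(:controller => "messages")  # => "Message"
search_model(:controller => "forums")    # => "ForumPost"
```

Note that the home and forums checks win even when a model parameter is present; an explicit params[:model] only takes precedence over the classify fallback.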

As long as the test database contains the appropriate user (in our case, Quentin from restful_authentication), the specs should pass once we reindex:

$ rake ultrasphinx:bootstrap RAILS_ENV=test
$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 0 failures

If they fail, chances are that either (1) there’s some rogue development daemon running or (2) we forgot to reindex the test database after changing a model. If this happens, you can be extra paranoid by recycling everything:

$ rake ultrasphinx:daemon:stop
$ rake ultrasphinx:bootstrap RAILS_ENV=test

Message: Ultrasphinx with conditions and filtering

One common task is to put a condition on a search result. For example, suppose we have a Message model with a subject and content we want to index, but with “trashed” messages we want to exclude. Suppose further that recipients trash messages by setting a recipient_deleted_at attribute in the Message model. Untrashed messages would then have a NULL value for recipient_deleted_at:

app/models/message.rb

class Message < ActiveRecord::Base
  is_indexed :fields => [ 'subject', 'content', 'recipient_id' ],
             :conditions => "recipient_deleted_at IS NULL"
  .
  .
  .
end

Of course, when searching through messages for a particular person, we should only return messages actually sent to that person. This is why we added the recipient_id to the index fields above; this way, we can use an Ultrasphinx filter to restrict the results appropriately in the Searches controller:

app/controllers/searches_controller.rb

def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  if model == "Message"
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person.id
  end
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

This requires an appropriately defined current_person object for the filter assignment, which we assume is taken care of by the application’s authentication scheme.
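The filter logic can also be pulled out into a small function, which makes it easy to test without a running daemon. This is a sketch with a hypothetical search_filters helper; the only behavior taken from the code above is that Message searches are restricted by recipient_id.

```ruby
# Hypothetical helper: build the Ultrasphinx filters hash for a model.
def search_filters(model, person_id)
  filters = {}
  # Restrict message results to those sent to the given person.
  filters['recipient_id'] = person_id if model == "Message"
  filters
end

search_filters("Message", 42)  # => {"recipient_id"=>42}
search_filters("Person", 42)   # => {}
```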

ForumPost: Ultrasphinx with Single Table Inheritance (STI) and associations

Our final example combines conditions with an include. Insoshi has a ForumPost model that inherits from a Post base class (which is also used for blog posts) using Single Table Inheritance (STI). We want to restrict forum searches to the body of forum posts, excluding blog posts. We also want to include the topic name in searches, so that a post “Lorem ipsum” under topic “Foobar” will show up for both the queries “Lorem” and “Foobar”. We can achieve this by using a conditions clause on the STI type, while using an include for the topic association:

class ForumPost < Post
  is_indexed :fields => [ 'body' ],
             :conditions => "type = 'ForumPost'",
             :include => [{:association_name => 'topic', :field => 'name'}]
  belongs_to :topic
  .
  .
  .
end

(If we leave out the type condition, Ultrasphinx happily indexes all the blog posts as well. Rails then complains when trying to make a new ForumPost using a BlogPost id.)
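To make the role of the type condition concrete, here is a plain-Ruby simulation of a shared posts table. The rows and the indexable_forum_posts helper are made up for illustration; in the real application the filtering happens in the SQL that Ultrasphinx generates.

```ruby
# Made-up rows standing in for the shared posts table used by STI.
POSTS = [
  { :id => 1, :type => "ForumPost", :body => "Lorem ipsum" },
  { :id => 2, :type => "BlogPost",  :body => "Dolor sit amet" },
  { :id => 3, :type => "ForumPost", :body => "Consectetur" }
]

# In-memory equivalent of the condition "type = 'ForumPost'".
def indexable_forum_posts(rows)
  rows.select { |row| row[:type] == "ForumPost" }
end

indexable_forum_posts(POSTS).map { |row| row[:id] }  # => [1, 3]
```

Without the condition, row 2 would land in the ForumPost index even though it belongs to a blog post.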

With that, we’ve covered all our basic search needs. As noted above, there’s one more advanced technique being used at Insoshi (handling searches on boolean attributes such as deactivated), which I’ll probably cover in a later post. It’s also worth noting that, unlike Ferret, Sphinx doesn’t update the search index with every Active Record update; you need to update the index periodically with a cron job. Take a look at the Ultrasphinx deployment notes for more details.

TextMate Footnotes and Ultrasphinx

Finally, there’s a minor incompatibility between Ultrasphinx and the latest (Rails 2.1-compatible) TextMate Footnotes, which gives the following error (at least when using vendored Rails):

activesupport/lib/active_support/dependencies.rb:275:in `load_missing_constant':
uninitialized constant Footnotes::Filter (NameError)

This is because Ultrasphinx is looking for the Rails file initializer.rb, but instead it finds initializer.rb as defined by Footnotes. The fix is to change “initializer” to something else (say, “loader”) everywhere; see my fork of Footnotes at GitHub for an example.

29 Comments

  1. Thanks for the write-up. Very helpful.

    I’m currently using Sphinx and Thinking_Sphinx in one of my apps and it’s running great.

    Comment by MikeInAZ — 2008-07-17 @ 17:15

  2. @MikeInAZ: Glad it helped! I looked at thinking_sphinx, but only briefly; if you know of any big advantages over Ultrasphinx, please let me know.

    Comment by mhartl — 2008-07-17 @ 18:17

  3. For me, it was easier to understand and setup.

    This blog post goes more in-depth:

    http://reinh.com/blog/2008/07/14/a-thinking-mans-sphinx.html

    Comment by MikeInAZ — 2008-07-17 @ 19:30

  4. Nice post, Mike.

    Busy week for searching in Rails. In addition to Rein’s post, I posted earlier in the week about us implementing UltraSphinx on MindBites on Thursday and then replacing the entire thing with Xapian and deploying that following Monday.

    http://locomotivation.com/2008/07/15/mulling-over-our-ruby-on-rails-full-text-search-options

    I’m still working on a post describing our Xapian thoughts but so far so good. Much simpler than UltraSphinx for us. We never did look at ThinkingSphinx.

    Comment by Jim — 2008-07-17 @ 21:11

  5. One thing that’s still not clear to me is how you get ultrasphinx working simultaneously in test and development. (assuming you want to do autotest-style BDD).

    Do you start two daemons, one for each environment? Does that require them to be manually set for different ports? Or can one searchd instance handle both? What command-line commands do you execute to make this happen?

    Comment by Evan Dorn — 2008-07-22 @ 13:06

  6. @Evan: As far as I know, there’s no way to run development and test daemons simultaneously. Yes, that kinda bites. Let me know if you find a workaround.

    Comment by mhartl — 2008-07-22 @ 13:32

    • It is easy to run development/test/production simultaneously. Replace default.base with three .base configuration files (production.base, development.base, and test.base). Then in each base configuration change the path and searchd port. Now when you configure and start a server, it will generate different .conf files for each environment that can be run simultaneously.

      The configurations are not as dry now….but you gotta develop and test on the same box!

      Comment by Rama McIntosh — 2010-04-5 @ 01:52

  7. @mhartl: this is an interesting, if S L O W solution to the problem:

    http://tadatoshi.blogspot.com/2008/05/ultrasphinx-setup-part2.html

    Of course, it’s hard for me to test any of these approaches, since I’m having trouble getting sphinx to work at all (as per the email I sent you a while back).

    Comment by idahoev — 2008-07-22 @ 14:59

  8. @Evan: You’ll need to run an instance of the searchd daemon for each environment, on different ports. Personally I just stub the search calls though, saves having to worry about the test instance.

    Comment by Pat Allan — 2008-07-23 @ 18:53

  9. I just put up a blog post detailing the fix to two-daemon problem, along with two other big issues I faced while implementing the insoshi sphinx upgrade in an Insoshi fork I’m working on.

    There’s a lot of good stuff in there that I’m hoping will help insoshi upgraders and other ultrasphinx users.

    Comment by Evan Dorn — 2008-07-24 @ 10:46

  10. Thanks for the writeup! Very helpful and practical! – Cheri

    Comment by Cheri — 2008-07-24 @ 15:00

  11. […] with this post if you’re just starting out with sphinx. Instead, go read this much better introductory tutorial from the guys over at Insoshi. Then if you have problems, come back here and you may find […]

    Pingback by LRBlog » Blog Archive » Fixing problems with sphinx search — 2008-07-25 @ 18:45

  12. @Cheri: You’re welcome!

    Comment by mhartl — 2008-07-27 @ 17:28

  13. Thanks a lot! However, I am still struggling with the following error after trying to run rake ultrasphinx:index:

    FATAL: no sources found in config file.

    Any ideas? I followed all the instructions up to that point. (ultrasphinx:configure gets me this: Rebuilding configurations for development environment
    Available models are
    Generating SQL)

    Thanks again!
    Justus

    Comment by Justus — 2008-07-30 @ 01:09

  14. […] fix is detailed at the bottom of Searching a Ruby on Rails application with Sphinx and Ultrasphinx with the specific implementation details available via this github […]

    Pingback by Binary Code » Collision Between Ultrasphinx and TextMate Footnotes — 2008-08-7 @ 11:18

  15. […] installing Ultraspinx (perhaps per these instructions from Inoshi, which are the best I’ve found thus far) and you run into this error when time […]

    Pingback by Binary Code » Ultrasphinx Bootstrap Error — 2008-08-7 @ 14:29

  16. I got the search to work, however will_paginate does NOT work. I used the example above for my website, but it always complains about total_pages not found. page_count is also undefined method. What’s wrong?

    Comment by Jens — 2008-09-9 @ 02:46

  17. Hi,

    I am a Rails beginner. I have installed the Ultrasphinx plugin. I am getting the error
    “Errno::ECONNABORTED in SearchController#index ”
    “An established connection was aborted by the software in your host machine. – recvfrom(2)”

    and in my application mysql is running on port – 3306. what change i have to make. Can anyone please help me in this.

    Comment by vinay — 2008-09-12 @ 05:39

  18. Great work.

    Comment by Zita — 2008-10-27 @ 12:00

  19. […] the way Sphinx indexing works, foxy fixtures will often slow down the indexing process drastically. This article explains how to overcome this limitation. Related Posts Fix for fixture_replacement2 when using […]

    Pingback by Pelargir - Musings on software and life from Matthew Bass. » Fast Sphinx indexing with foxy fixtures — 2008-11-19 @ 08:24

  20. Evan’s original code has piece like if ENV[‘USER’] == ‘eweaver’… It also has some obvious bugs like having ‘//’ in the index file paths instead of putting them let’s say in a ‘data’ subfolder. It doesn’t seem to be an active project. GitHub forking is nice, but can you rely on a fork that’s not widely adopted and tested?

    Comment by Nikolay Kolev — 2008-11-25 @ 10:36

  21. At the time I wrote the post, Ultrasphinx was both active and practically the default choice for Sphinx in Rails. The Rails world moves (absurdly) fast, though, and since then things have shifted in the direction of Thinking Sphinx. We just haven’t yet felt enough pain with our current solution to justify making a switch.

    Comment by Michael Hartl — 2008-11-25 @ 13:11

    Can you help me? I use Ultrasphinx too, and I don’t want to use pagination for search results, but the default value of per_page is 20. Does Ultrasphinx have an additional parameter to show all results?

    Comment by Vitaly — 2009-01-9 @ 05:08

  23. […] Searching a Ruby on Rails application with Sphinx and Ultrasphinx « blog.mhartl | Michael Hartl’s… (tags: rails sphinx search reference) […]

    Pingback by links for 2009-06-17 « Amy G. Dala — 2009-06-17 @ 07:04

  24. […] I searched around and finally encountered this solution from Michael Hartl: […]

    Pingback by Mindtonic » Ultrasphinx MySql Location Error — 2009-06-24 @ 11:44

  25. great post. Quick question, but does anyone who uses ultrasphinx have problems with partial names not showing? For instance, if i search for CORN, i’ll get nothing, but if i search for CORNHUSKERS my results appear?

    I want to be able for a user to just type corn and they get all available permutations of the word corn.

    I tried setting enable_star = 1 and min_prefix_len = 4 in the default.base file. I then ran rake ultrasphinx:configure again, reloaded the index, and restarted the daemon, but I still don’t get that functionality.

    can anyone confirm that i am doing it wrong, i hope so! anyone have a solution?

    Comment by pjammer — 2009-08-19 @ 14:14

    • i’m having the same problem re: partials and i can’t figure it out either. so far, my searches are only working with exact searches.

      Comment by JR — 2009-10-15 @ 18:03

  26. I have tried to install sphinx many time but it didn’t work but now i think i can install it from the help of your post thanks.

    Comment by Hire developer — 2009-09-20 @ 23:41

  27. That was a really good article, thanks for taking the time to put it together! I’ve been using Sphinx in Rails for a while now and I haven’t been more satisfied.

    Comment by Filters for fridges — 2011-11-6 @ 02:16

