Working with Sphinx fulltext search engine using ultrasphinx

EventsInIndia.com currently used ferret as the text search engine. There is one big problem I am running into with Ferret. If for some reason index is corrupted then not only search functionality goes down but the whole app is non-functional. That’s very intrusive.

I have switched from using ferret to sphinx is another full text search engine but it’s superfast when it comes to indexing the documents. It directly talks to MySql and other database.

I am using ultrasphinx as the plugin to make it easier to work with sphinx.

In this article I am going to discuss installation of sphinx, using ultrasphinx, filtering, sorting, pagination and some other tips.

Installation

Installation process is very straight forward.

 
sudo gem install chronic

wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
tar xvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
./configure && make && sudo make install

script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk

Creating a test case

Searching and sorting goes hand in hand. Many times sorting needs to be done on fields like created_on which is stored in MySQL as datetime. That is a problem. In order to help sphix do the sorting on datetime field another column should be created and the date should be stored as an integer in YYYYMMDD format.

We will create a model and we will discuss the case.

 
ruby script/generate model Mysphinx name:string body:text start_date:integer city_id:integer

# In the migration we will create three records
    Mysphinx.create(:name => 'neeraj', :body => 'about me', 
                    :start_date => 20080401 , :city_id => 2)
    Mysphinx.create(:name => 'ror', :body => 'about ruby on rails', 
                    :start_date => 20090401 , :city_id => 20)    
    Mysphinx.create(:name => 'java', :body => 'java and neeraj are clumsy', 
                    :start_date => 20090402 , :city_id => 21)

configuration files and deamons

 
#create the directory
mkdir config/ultrasphinx

 # copy the base configuration file
cp vendor/plugins/ultrasphinx/examples/default.base  Ö“config/ultrasphinx/default.base
# At the top of the staging.base change the path to indicate the path to sphinx. 
# Mine looks like this
<% path = "/home/neeraj/sources/sphinx-0.9.8-rc2/" %>

# I am going to execute everything in "staging" environment. So I need staging.base file
cd config/ultrasphinx
cp default.base staging.base

# add filelds to the model
class Mysphinx < ActiveRecord::Base

  is_indexed :fields =>[
  {:field => :name},
  {:field => :body},
  {:field => :start_date, :sortable => true},
  {:field => :city_id}
  ]

end

#configure the file
rake ultrasphinx:configure RAILS_ENV=staging

#index the documents
rake ultrasphinx:index RAILS_ENV=staging

#start the daemon
rake ultrasphinx:daemon:start RAILS_ENV=staging

make sure that the daemon is started when mongrel is already up and running. I had put sphins daemon as the first process to run after stopping all processes and sphinx was failing. So I changed the order and started mongrel first and then I started sphinx daemon.

Searching

Let’s get started. Let’s do a simple search.

 
ruby script/console staging
>> sphinx_options = {:query => ''}
>> Ultrasphinx::Search.new(sphinx_options).run.size 
=> 20

I had 20 records and I got all the records. It means sphinx and ultrasphinx are playing well. If you run into any issue at this stage then go back and try to resolve it.

Filtering

Let’s say that I want to filter search result by start date range and we are searching for the keyword ‘music’.

 
filter_options = {:filters => {:start_date => Range.new(20080101,20081231}}
sphinx_options = {:query => 'music'}
sphinx_options.merge!(filter_options)

>> Ultrasphinx::Search.new(sphinx_options).run.size

Sorting

Let’s get the search result sorted by start_date in ascending order.

 
sort_options = :sort_by => 'start_date_sphinx', :sort_mode => 'ascending'
sphinx_options.merge!(filter_options)

>> Ultrasphinx::Search.new(sphinx_options).run.size

Pagination

Pagination is important and it is built right into ultrasphinx.

 
sphinx_options.merge!(:per_page => '20', :page => params['page'])

#process in controller
@search = Ultrasphinx::Search.new(sphinx_options)
@search.run
@mysphinxs = @search.results      

# in the view
<%= will_paginate @search %>

Watch out for

If you open development.base ( in this case staging.base) file you will notice the port number as 3313. If you decide to go to production then ensure that in production.base you need to change the port number at two places:under searchd and under client section.

Make sure to add a cron job to periodically index your documents. More instructions on setting up cron job is available here

My name is Neeraj. You can contact me at neerajdotname@gmail.com . I am also on twitter , facebook and Linkedin .

Checkout my github account for admin_data and other projects.