Working with Sphinx fulltext search engine using ultrasphinx
EventsInIndia.com has been using Ferret as its text search engine. There is one big problem I keep running into with Ferret: if the index gets corrupted for some reason, then not only does search go down, the whole app becomes non-functional. That’s very intrusive.
I have switched from Ferret to Sphinx. Sphinx is another full-text search engine, but it is superfast when it comes to indexing documents. It talks directly to MySQL and other databases.
I am using ultrasphinx as the plugin to make it easier to work with sphinx.
In this article I am going to discuss installation of sphinx, using ultrasphinx, filtering, sorting, pagination and some other tips.
Installation
The installation process is very straightforward.
sudo gem install chronic
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
tar xvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
./configure && make && sudo make install
script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
Creating a test case
Searching and sorting go hand in hand. Many times sorting needs to be done on a field like created_on, which is stored in MySQL as a datetime. That is a problem. To help Sphinx sort on a datetime field, another column should be created and the date should be stored in it as an integer in YYYYMMDD format.
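Here is a minimal sketch of that conversion in plain Ruby. The helper names `to_sphinx_date` and `from_sphinx_date` are made up for this example and are not part of ultrasphinx.

```ruby
require 'date'

# Convert a Time (or anything that responds to strftime) into an
# integer in YYYYMMDD form, which sorts in the same order as the date.
def to_sphinx_date(time)
  time.strftime('%Y%m%d').to_i
end

# Reverse conversion, handy when a stored value has to be displayed.
def from_sphinx_date(int)
  Date.strptime(int.to_s, '%Y%m%d')
end

puts to_sphinx_date(Time.utc(2008, 4, 1, 10, 30))
```

In a Rails model, something like this could be called from a before_save callback to keep the integer column in sync with the datetime column.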
We will create a model and we will discuss the case.
ruby script/generate model Mysphinx name:string body:text start_date:integer city_id:integer
# In the migration we will create three records
Mysphinx.create(:name => 'neeraj', :body => 'about me',
:start_date => 20080401 , :city_id => 2)
Mysphinx.create(:name => 'ror', :body => 'about ruby on rails',
:start_date => 20090401 , :city_id => 20)
Mysphinx.create(:name => 'java', :body => 'java and neeraj are clumsy',
:start_date => 20090402 , :city_id => 21)
Configuration files and daemons
#create the directory
mkdir config/ultrasphinx
# copy the base configuration file
cp vendor/plugins/ultrasphinx/examples/default.base config/ultrasphinx/default.base
# At the top of the staging.base change the path to indicate the path to sphinx.
# Mine looks like this
<% path = "/home/neeraj/sources/sphinx-0.9.8-rc2/" %>
# I am going to execute everything in "staging" environment. So I need staging.base file
cd config/ultrasphinx
cp default.base staging.base
# add fields to the model
class Mysphinx < ActiveRecord::Base
is_indexed :fields =>[
{:field => :name},
{:field => :body},
{:field => :start_date, :sortable => true},
{:field => :city_id}
]
end
#configure the file
rake ultrasphinx:configure RAILS_ENV=staging
#index the documents
rake ultrasphinx:index RAILS_ENV=staging
#start the daemon
rake ultrasphinx:daemon:start RAILS_ENV=staging
Make sure that the daemon is started when mongrel is already up and running. I had put the sphinx daemon as the first process to run after stopping all processes, and sphinx was failing. So I changed the order: I started mongrel first and then started the sphinx daemon.
Searching
Let’s get started. Let’s do a simple search.
ruby script/console staging
>> sphinx_options = {:query => ''}
>> Ultrasphinx::Search.new(sphinx_options).run.size
=> 20
I had 20 records and I got all the records. It means sphinx and ultrasphinx are playing well. If you run into any issue at this stage then go back and try to resolve it.
Filtering
Let’s say that I want to filter search result by start date range and we are searching for the keyword ‘music’.
filter_options = {:filters => {:start_date => Range.new(20080101, 20081231)}}
sphinx_options = {:query => 'music'}
sphinx_options.merge!(filter_options)
>> Ultrasphinx::Search.new(sphinx_options).run.size
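Since the dates are plain integers, a range for a whole year can be computed instead of typed by hand. This is only a sketch. The helpers `year_range` and `search_options_for_year` are hypothetical names, and the code only builds the options hash without talking to sphinx.

```ruby
# Build the integer range that covers a whole calendar year
# in YYYYMMDD form, e.g. 2008 gives 20080101..20081231.
def year_range(year)
  Range.new(year * 10_000 + 101, year * 10_000 + 1231)
end

# Assemble the options hash for a keyword search limited to one year.
def search_options_for_year(query, year)
  { :query => query, :filters => { :start_date => year_range(year) } }
end

sphinx_options = search_options_for_year('music', 2008)
p sphinx_options
```

The resulting hash has the same shape as the one built with merge! above, so it can be handed to Ultrasphinx::Search.new in the same way.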
Sorting
Let’s get the search result sorted by start_date in ascending order.
sort_options = {:sort_by => 'start_date_sphinx', :sort_mode => 'ascending'}
sphinx_options.merge!(sort_options)
>> Ultrasphinx::Search.new(sphinx_options).run.size
Pagination
Pagination is important and it is built right into ultrasphinx.
sphinx_options.merge!(:per_page => '20', :page => params['page'])
# process in controller
@search = Ultrasphinx::Search.new(sphinx_options)
@search.run
@mysphinxs = @search.results
# in the view
<%= will_paginate @search %>
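The params['page'] value comes straight from the request, so it can be missing or garbage. Below is a small sketch of cleaning it up before merging it into the options. `normalize_page` and `pagination_options` are hypothetical helpers, and a plain hash stands in for the Rails params object.

```ruby
# Turn a raw request parameter into a usable page number.
# nil, blank, non-numeric and non-positive values all fall back to page 1.
def normalize_page(raw)
  page = raw.to_s.to_i
  page < 1 ? 1 : page
end

# Build the pagination part of the search options.
def pagination_options(params, per_page = 20)
  { :per_page => per_page, :page => normalize_page(params['page']) }
end

p pagination_options({ 'page' => '3' })
p pagination_options({})
```

With this in place the controller could call sphinx_options.merge!(pagination_options(params)) instead of passing params['page'] through untouched.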
Watch out for
If you open the development.base file (in this case staging.base), you will notice that the port number is 3313. If you decide to go to production, make sure to change the port number in production.base in two places: under the searchd section and under the client section.
Make sure to add a cron job to periodically index your documents. More instructions on setting up a cron job are available here
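A quick way to catch a mismatch is to pull every port setting out of the file and compare them. This is only a rough sketch: it assumes the settings look like `port = 3313` and `server_port = 3313`, and the sample text below is a made-up fragment, so check the key names against your own .base file. `port_settings` and `ports_consistent?` are hypothetical helpers.

```ruby
# Collect every "...port = NNNN" setting found in the text of a .base file.
# Returns pairs like ["port", 3313].
def port_settings(config_text)
  config_text.scan(/^\s*(\w*port)\s*=\s*(\d+)/).map { |key, value| [key, value.to_i] }
end

# True when all port settings found carry the same number
# (also true when none are found at all).
def ports_consistent?(config_text)
  port_settings(config_text).map { |_, port| port }.uniq.size <= 1
end

# Assumed sample, not copied from a real ultrasphinx file.
sample = <<-CONF
searchd
{
  port = 3313
}
client
{
  server_port = 3313
}
CONF

p port_settings(sample)
puts ports_consistent?(sample)
```

To check a real file, the text could be read with File.read('config/ultrasphinx/production.base') and passed to ports_consistent?.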