Mar 25 | Filed in Django, How-To's

A while back I posted a quick and very simplistic guide to setting up django-sphinx within your project. Since then, the platform has seen a lot more use. It came to my attention today that the django-nyc group is going to give a presentation on how to set up Sphinx within a Django project. Given that, and the many, many questions I’ve received since, I’ve decided to rewrite my guide to setting up Sphinx with your project.

Install Sphinx

So to get started we’re going to need to download and install Sphinx. django-sphinx supports Sphinx 0.97 and newer, so go ahead and grab the latest version from http://sphinxsearch.com/downloads.html. An install guide for Sphinx can be found under Documentation.

Install django-sphinx

Once you’ve gotten past the initial setup of Sphinx, you’re going to need to install the django-sphinx package. This is fairly simple, as it’s just a Python package:

svn checkout http://django-sphinx.googlecode.com/svn/trunk/ django-sphinx
cd django-sphinx
sudo python setup.py install

Please note that while django-sphinx can also be installed with easy_install, that release isn’t updated very often.

Configure Your Models

Next up we need to identify any models which we’ll be using django-sphinx on directly, and add a manager to them:

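from django.db import models
from djangosphinx import SphinxSearch

# A sample model from iBegin. The custom field classes (SeparatedValuesField,
# CreatedDateTimeField, ModifiedDateTimeField) and the Country and State models
# are assumed to be defined elsewhere in the project.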
class City(models.Model):
    name            = models.CharField(max_length=32)
    aliases         = SeparatedValuesField(blank=True, null=True)
    slug            = models.SlugField(blank=True)
    country         = models.ForeignKey(Country)
    state           = models.ForeignKey(State, blank=True, null=True)
    listings        = models.PositiveIntegerField(editable=False, default=0)
 
    latitude        = models.DecimalField(max_digits=9, decimal_places=6, editable=False, default=0, blank=True)
    longitude       = models.DecimalField(max_digits=9, decimal_places=6, editable=False, default=0, blank=True)
 
    date_added      = CreatedDateTimeField(editable=False)
    date_changed    = ModifiedDateTimeField(editable=False)
 
    class Meta:
        unique_together = (('country', 'state', 'slug'), ('country', 'state', 'name'))
        db_table = 'cities'
 
    search = SphinxSearch(
        index='cities', # defaults to cities either way
        weights={ # individual field weighting, this is optional
            'name': 100,
            'aliases': 90,
        }
    )

Building Indexes

You’ve now installed both Sphinx and django-sphinx, and need to configure your sources. While django-sphinx can’t do all of the configuration for you, it can help with some of it. The fastest way to get started is the generate_sphinx_config command.

So, let’s try it:

[dcramer@local] ./manage.py generate_sphinx_config cities >> sphinx.conf
[dcramer@local] cat sphinx.conf

[Clipped]

source cities
{
    type                = mysql
    strip_html          = 0
    index_html_attrs    =
    sql_host            = localhost
    sql_user            = root
    sql_pass            = ***********
    sql_db              = ibegin_iplatform
    sql_port            =
    log                 = /var/log/sphinx/searchd.log

    sql_query_pre       =
    sql_query_post      =
    sql_query           = \
        SELECT id, name, aliases, slug, country_id, state_id, listings, latitude, longitude, date_added, date_changed \
        FROM cities
    sql_query_info      = SELECT * FROM `cities` WHERE `id` = $id

    # ForeignKey's
    sql_group_column    = country_id
    sql_group_column    = state_id
    sql_group_column    = listings

    # DateField's and DateTimeField's
    sql_date_column     = date_added
    sql_date_column     = date_changed
}

index cities
{
    source          = cities
    path            = /var/data/cities
    docinfo         = extern
    morphology      = none
    stopwords       =
    min_word_len    = 2
    charset_type    = sbcs
    min_prefix_len  = 0
    min_infix_len   = 0
}

This generates a fairly basic, standard source and index for any models in the requested app that have a SphinxSearch manager attached. Please note that this is not optimized, and you will most likely want to clean up the configuration to remove anything you don’t need included in the search. You will also need to update the paths for your log and data files.
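If you would rather script the path changes than edit the file by hand, something along these lines could work. This is only a minimal sketch: the set_option helper and the /srv/sphinx/... directories are hypothetical names made up for illustration and are not part of django-sphinx or Sphinx. It simply rewrites the value of matching `option = value` lines in the generated text.

```python
import re


def set_option(conf_text, option, value):
    """Replace the value on every `option = ...` line of a config string.

    Hypothetical helper for illustration; it is not part of django-sphinx.
    """
    # [ \t]* (rather than \s*) keeps the match on a single line, so an
    # option with an empty value cannot swallow the line that follows it.
    pattern = re.compile(
        r'^([ \t]*%s[ \t]*=[ \t]*).*$' % re.escape(option), re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + value, conf_text)


# A trimmed copy of the generated config shown above.
generated = """\
source cities
{
    log                 = /var/log/sphinx/searchd.log
}

index cities
{
    source          = cities
    path            = /var/data/cities
}
"""

# Assumed locations; substitute the directories used on your own server.
updated = set_option(generated, 'path', '/srv/sphinx/data/cities')
updated = set_option(updated, 'log', '/srv/sphinx/log/searchd.log')
print(updated)
```

Because the pattern is anchored to the start of the line, changing `path` leaves the `source = cities` line untouched.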

Whenever you add a new index, you will also need to index it:

indexer cities --config=sphinx.conf

Or if you already had the index and are simply updating it:

indexer cities --rotate --config=sphinx.conf
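If you want to trigger indexing from a script (a cron job or a deploy task, for example), the two commands above can be wrapped in a few lines of Python. This is just a sketch: indexer_command and reindex are hypothetical helper names, and the only flags used are the --rotate and --config ones already shown above.

```python
import subprocess


def indexer_command(index, config='sphinx.conf', rotate=False):
    """Build the argument list for the Sphinx `indexer` binary."""
    cmd = ['indexer', index]
    if rotate:
        # Used when the index already exists and is simply being updated.
        cmd.append('--rotate')
    cmd.append('--config=' + config)
    return cmd


def reindex(index, config='sphinx.conf', rotate=False):
    """Run the indexer; raises CalledProcessError on a non-zero exit code.

    Assumes the `indexer` binary is on the PATH.
    """
    return subprocess.check_call(indexer_command(index, config, rotate))


print(' '.join(indexer_command('cities')))
# indexer cities --config=sphinx.conf
print(' '.join(indexer_command('cities', rotate=True)))
# indexer cities --rotate --config=sphinx.conf
```

Keeping the command builder separate from the call that runs it makes it easy to check what will be executed before actually touching the index.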

Using the Data

The rest is simply querying your model, similar to how you’d do it with a normal QuerySet manager:

results = City.search.query('new york')

# Unlike a Django QuerySet, a SphinxQuerySet is not evaluated when
# it is printed; the query only runs once the results are sliced or iterated.
print results
>> <SphinxQuerySet instance>

print list(results)
>> [<City: New York Mills>,
 <City: West New York>,
 <City: New York>,
 <City: New York Mills>,
 <City: New York>,
 <City: New York City>,
 <City: New York>,
 <City: New Jersey and New York City>,
 <City: New york>,
 <City: new york>,
 <City: York New Salem>]

# We have a meta attribute on the QuerySet to give additional data
print results._sphinx
>> {'total': 11,
 'total_found': 11,
 'words': [{'docs': 341, 'hits': 342, 'word': 'new'},
           {'docs': 40, 'hits': 40, 'word': 'york'}]}

# As well as on each individual instance
print results[0]._sphinx
>> {'id': u'5246', 'weight': 200, 'attrs': {'state_id': 3, 'country_id': 0}}

# You can also access this via .sphinx if you don't have a
# sphinx attribute on your model already.
print results[0].sphinx
>> {'id': u'5246', 'weight': 200, 'attrs': {'state_id': 3, 'country_id': 0}}
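The metadata is a plain dictionary, so it is easy to turn into something you can show to a user or write to a log. Here is a small sketch: describe_search is a hypothetical helper, and the dictionary layout is assumed from the sample output above rather than from any documented contract.

```python
def describe_search(meta):
    """Build a one-line summary from a result-set metadata dict.

    The expected keys ('total', 'total_found', 'words') are assumed from
    the sample output shown above.
    """
    words = ', '.join(
        '%s (%d docs)' % (w['word'], w['docs']) for w in meta.get('words', [])
    )
    return '%d of %d matches; terms: %s' % (
        meta['total'], meta['total_found'], words
    )


# Sample data copied from the output above.
meta = {
    'total': 11,
    'total_found': 11,
    'words': [
        {'docs': 341, 'hits': 342, 'word': 'new'},
        {'docs': 40, 'hits': 40, 'word': 'york'},
    ],
}

print(describe_search(meta))
# 11 of 11 matches; terms: new (341 docs), york (40 docs)
```

In a real project you would presumably pass results._sphinx instead of the hand-written dictionary.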

For additional information, please be sure to check out the following resources:

Responses to "Setting up Django and Sphinx Full-text Search (django-sphinx)"

kevin (Mar 25th):

Thanks David for the updated tutorial. The Sphinx presentation at django-nyc from Kevin Howerton went over really well. Actually, PeterH’s presentation on Solango and his comparisons to Sphinx made for great conversation. Keep up the great work.

AhmedF (Mar 25th):

It’s too bad I heard about it too late Kevin – I would have loved to have been there.

rates (Mar 26th):

Another great script to use. Thanks a lot.

Pete K (Mar 26th):

Excellent article. I was just about to dig into django/sphinx when a friend linked me over to it.

How did you handle searchd? Did you set up a conf file for searchd to run off of, then use a different one in your Django project?

David (Mar 26th):

You typically use the same config file for searchd as you do your indexer. There’s just a bit more configuration than the tools will generate for you.

Oto Brglez (Mar 27th):

We use Sphinx in our projects. It’s just awesome! Scalable, customizable, ultra fast and has a great API. We use it with PHP and there is nothing bad that I can say about Sphinx.

I hope that soon I’ll get a chance to implement Sphinx with Django or vice versa ;)

Oto Brglez

binnyabraham (Jan 18th):

How to search on a foreignkey field???
