A while back I posted a quick and very simplistic guide to setting up django-sphinx within your project. Since then we’ve gotten a lot of use out of the platform. It came to my attention today that the django-nyc group is planning a presentation on how to set up Sphinx within your Django project. Because of this, and after many, many questions, I’ve decided to rewrite my guide to setting up Sphinx with your project.
Install Sphinx
So to get started we’re going to need to download and install Sphinx. django-sphinx supports Sphinx 0.97 and newer, so go ahead and grab the latest version from http://sphinxsearch.com/downloads.html. An install guide for Sphinx can be found under Documentation.
Install django-sphinx
Once you’ve gotten past the initial setup of Sphinx, you’re going to need to install the django-sphinx package. This is fairly simple, as it’s just a Python package:
svn checkout http://django-sphinx.googlecode.com/svn/trunk/ django-sphinx
cd django-sphinx
sudo python setup.py install
Please note that while you can grab django-sphinx via easy_install, that package isn’t updated very often.
Configure Your Models
Next up we need to identify any models which we’ll be using django-sphinx on directly, and add a manager to them:
from django.db import models
from djangosphinx import SphinxSearch

# A sample model from iBegin.
# SeparatedValuesField, CreatedDateTimeField and ModifiedDateTimeField are
# custom field classes (not part of Django); their imports, along with the
# Country and State models, are omitted here.
class City(models.Model):
    name = models.CharField(max_length=32)
    aliases = SeparatedValuesField(blank=True, null=True)
    slug = models.SlugField(blank=True)
    country = models.ForeignKey(Country)
    state = models.ForeignKey(State, blank=True, null=True)
    listings = models.PositiveIntegerField(editable=False, default=0)
    latitude = models.DecimalField(max_digits=9, decimal_places=6, editable=False, default=0, blank=True)
    longitude = models.DecimalField(max_digits=9, decimal_places=6, editable=False, default=0, blank=True)
    date_added = CreatedDateTimeField(editable=False)
    date_changed = ModifiedDateTimeField(editable=False)

    class Meta:
        unique_together = (('country', 'state', 'slug'), ('country', 'state', 'name'))
        db_table = 'cities'

    search = SphinxSearch(
        index='cities',  # defaults to cities either way
        weights={  # individual field weighting, this is optional
            'name': 100,
            'aliases': 90,
        }
    )
Building Indexes
You’ve now installed both Sphinx and django-sphinx, and need to configure your sources. While we can’t do all of your configuration for you, we can help you with some of it. The fastest way to do this is with the generate_sphinx_config command.
So, let’s try it:
[dcramer@local] ./manage.py generate_sphinx_config cities >> sphinx.conf
[dcramer@local] cat sphinx.conf
[Clipped]
source cities
{
    type = mysql
    strip_html = 0
    index_html_attrs =
    sql_host = localhost
    sql_user = root
    sql_pass = ***********
    sql_db = ibegin_iplatform
    sql_port =
    log = /var/log/sphinx/searchd.log
    sql_query_pre =
    sql_query_post =
    sql_query = \
        SELECT id, name, aliases, slug, country_id, state_id, listings, latitude, longitude, date_added, date_changed \
        FROM cities
    sql_query_info = SELECT * FROM `cities` WHERE `id` = $id
    # ForeignKey's
    sql_group_column = country_id
    sql_group_column = state_id
    sql_group_column = listings
    # DateField's and DateTimeField's
    sql_date_column = date_added
    sql_date_column = date_changed
}
index cities
{
    source = cities
    path = /var/data/cities
    docinfo = extern
    morphology = none
    stopwords =
    min_word_len = 2
    charset_type = sbcs
    min_prefix_len = 0
    min_infix_len = 0
}
This generates a fairly basic, standard source and index for every model in the requested app that has a SphinxSearch manager attached to it. Please note that this is not optimized, and you will most likely want to clean up the configuration to remove anything that you don’t need included in the search. You will also need to update the paths for your log and data files.
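Since the generated file is plain text, the path clean-up can be scripted if you have many indexes. The following is only a rough sketch and not part of django-sphinx: set_option is a hypothetical helper name, and the replacement path is a made-up example. It rewrites the value of a `key = value` line in the config text.

```python
import re


def set_option(conf_text, key, value):
    """Replace the value of every `key = ...` line in a sphinx.conf string.

    Hypothetical helper for illustration; not part of django-sphinx or Sphinx.
    """
    # Match the key only at the start of a line (after optional indentation),
    # so that e.g. 'path' does not also match some other '*_path' option.
    pattern = re.compile(r'^([ \t]*%s[ \t]*=).*$' % re.escape(key), re.MULTILINE)
    # A function is used as the replacement so that backslashes in `value`
    # are inserted literally.
    return pattern.sub(lambda m: '%s %s' % (m.group(1), value), conf_text)


sample = """index cities
{
source = cities
path = /var/data/cities
docinfo = extern
}
"""

# Example path only; use whatever location exists on your server.
updated = set_option(sample, 'path', '/srv/sphinx/data/cities')
```

The same call with 'log' as the key would handle the log location. For a one-off setup, editing the file by hand is just as reasonable.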
Whenever you add a new index, you will also need to index it:
indexer --config sphinx.conf cities
Or if you already had the index and are simply updating it:
indexer --config sphinx.conf --rotate cities
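If you would rather trigger indexing from a script (a cron job or deploy step, for instance), the two cases above can be wrapped in a small helper. This is a sketch under assumptions: indexer_command and run_indexer are hypothetical names, the indexer binary is assumed to be on your PATH, and the config file is passed as a separate --config argument, which is the form Sphinx's own documentation uses.

```python
import subprocess


def indexer_command(index, config='sphinx.conf', rotate=False):
    """Build the argument list for Sphinx's `indexer` command line tool."""
    cmd = ['indexer', '--config', config]
    if rotate:
        # --rotate is for an index that already exists and is being updated
        cmd.append('--rotate')
    cmd.append(index)
    return cmd


def run_indexer(index, config='sphinx.conf', rotate=False):
    """Run the indexer; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(indexer_command(index, config, rotate))
```

Calling run_indexer('cities') builds a new index, and run_indexer('cities', rotate=True) updates an existing one.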
Using the Data
The rest is simply querying your model, similar to how you’d do it with a normal QuerySet manager:
results = City.search.query('new york')
# Unlike Django QuerySets, Sphinx will never evaluate
# the results until it's sliced or iterated over.
print results
>> <SphinxQuerySet instance>
print list(results)
>> [<City: New York Mills>,
<City: West New York>,
<City: New York>,
<City: New York Mills>,
<City: New York>,
<City: New York City>,
<City: New York>,
<City: New Jersey and New York City>,
<City: New york>,
<City: new york>,
<City: York New Salem>]
# We have a meta attribute on the QuerySet to give additional data
print results._sphinx
>> {'total': 11,
'total_found': 11,
'words': [{'docs': 341, 'hits': 342, 'word': 'new'},
{'docs': 40, 'hits': 40, 'word': 'york'}]}
# As well as on each individual instance
print results[0]._sphinx
>> {'id': u'5246', 'weight': 200, 'attrs': {'state_id': 3, 'country_id': 0}}
# You can also access this via .sphinx if you don't have a
# sphinx attribute on your model already.
print results[0].sphinx
>> {'id': u'5246', 'weight': 200, 'attrs': {'state_id': 3, 'country_id': 0}}
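The per-instance metadata is handy when you want to see why results came back in a given order. Here is a minimal sketch that relies only on the _sphinx dictionary shape shown above; weight_report is a hypothetical helper, and FakeCity is a stand-in so the snippet runs without a Sphinx server. With real data you would pass in something like City.search.query('new york') instead.

```python
def weight_report(results):
    """Return (weight, repr) pairs for search results, highest weight first.

    Hypothetical helper; it only reads the per-instance `_sphinx` dict.
    """
    rows = [(obj._sphinx['weight'], repr(obj)) for obj in results]
    return sorted(rows, reverse=True)


class FakeCity(object):
    """Stand-in for a model instance returned by a search query."""

    def __init__(self, name, weight):
        self.name = name
        self._sphinx = {'id': u'0', 'weight': weight, 'attrs': {}}

    def __repr__(self):
        return '<City: %s>' % self.name


report = weight_report([
    FakeCity('West New York', 150),
    FakeCity('New York', 200),
])
```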
For additional information, please be sure to check out the following resources:
