24 Feb

Filed in Code, Django, How-To's, Python

Again, I still suck at documentation, and my “tutorials” aren’t in-depth enough. So hopefully this covers all of the questions regarding using the django-sphinx module.

The first thing you’re going to need to do is install the Sphinx search software. You will be able to get this through http://www.sphinxsearch.com/, or probably even port or aptitude.

Configure Sphinx

Once you have successfully installed Sphinx you need to configure it. Follow the directions on their website for the basic configuration, but most importantly, you need to configure a search index which relates to one of your models.

Here is an example of an index from Curse’s File model, which lets you search a file by name, description, and tags. Please note that “base” is a base source definition we created with a few defaults that we use; it is not required for your own source definition.

source files_file_en : base
{
	sql_query		= \
		SELECT files_file.id, files_file.name, files_data.description, files_file.tags as tag \
		FROM files_file JOIN files_data \
		ON files_file.id = files_data.file_id \
		AND files_data.lang = 'en' \
		AND files_file.visible = 1 \
		GROUP BY files_file.id
	sql_query_info		= SELECT * FROM files_file WHERE id=$id
}

Now that you have your source defined, you need to build an index which uses this source. I do recommend keeping all of your Sphinx data in its own location, maybe /var/sphinx/data.

index files_file_en
{
	source			= files_file_en
	path			= /var/data/files_file_en
	docinfo			= extern
	morphology			= none
	stopwords			=
	min_word_len		= 2
	charset_type		= sbcs
	min_prefix_len		= 0
	min_infix_len		= 0
}
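Defining the source and index doesn't build anything by itself; Sphinx's indexer tool has to be run against the config before there is anything to search. Here is a rough sketch of driving it from Python. The helper names and the config path in the usage note are hypothetical (they are not part of Sphinx or django-sphinx), and it assumes the indexer binary is on your PATH.

```python
import subprocess


def build_indexer_command(config_path, indexes=None, rotate=False):
    """Build the argument list for Sphinx's `indexer` tool (hypothetical helper)."""
    command = ["indexer", "--config", config_path]
    if rotate:
        # --rotate lets an already running searchd pick up the rebuilt index
        command.append("--rotate")
    # Rebuild only the named indexes, or every index in the config with --all
    command.extend(indexes if indexes else ["--all"])
    return command


def reindex(config_path, indexes=None, rotate=False):
    """Run the indexer; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(build_indexer_command(config_path, indexes, rotate))
```

Something like reindex("/path/to/sphinx.conf", ["files_file_en"], rotate=True) could then be run from cron or a management command so that new rows show up in search without running the indexer by hand.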

Configure Django

Now that you’ve configured your search index you need to set up the configuration for Django. The first step is to install the django-sphinx wrapper. First things first, download the zip archive, or check out the source from http://code.google.com/p/django-sphinx/.

Once you have the files on your local computer or server, you can simply run sudo python setup.py install to install the library.

After installation you need to add a few settings to settings.py, which, again, being that I suck at documentation, aren’t posted on the website.

The two settings you need to add are these:

SPHINX_SERVER = 'localhost'
SPHINX_PORT = 3312
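These two settings have to match wherever searchd is actually listening, and the port can differ between Sphinx versions (one commenter below reports 9312). If you hit a "connection refused" error, a quick check like the one below can tell you whether anything is listening at all. This is only a sketch: searchd_is_reachable is a hypothetical helper, not part of django-sphinx, and a successful connection only proves that something accepts TCP connections there, not that it is Sphinx.

```python
import socket


def searchd_is_reachable(host="localhost", port=3312, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        connection = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: searchd is probably not running here
        return False
    connection.close()
    return True
```

Pass it the same values you put in SPHINX_SERVER and SPHINX_PORT.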

Setup Your Model

Now you are fully able to utilize Sphinx within Django. The next step is to actually attach your search index to a model. To do this, you will need to import djangosphinx and then attach the manager to a model. See the example below:

from django.db import models
import djangosphinx

class File(models.Model):
    # CharField requires max_length; 200 is an arbitrary choice here
    name = models.CharField(max_length=200)
    # We actually store tags for efficiency in tag,tag,tag format here
    tags = models.CharField(max_length=200)

    objects = models.Manager()
    search  = djangosphinx.SphinxSearch(index="files_file_en")

The index argument is optional, and there are several other parameters you can pass, but you’ll have to look in the code (or pydoc if I did it right, but probably not).

Once we’ve defined the search manager on our model, we can access it via Model.manager_name and pass it many things like we could with a normal object manager in Django. The typical usage is Model.search.query('my fulltext query') which would then query sphinx, grab a list of IDs, and then do a Model.objects.filter(pk__in=[list of ids]) and return this result set.
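To make that flow concrete, here is a small stand-alone sketch of the "IDs first, objects second" idea. It does not use Django or Sphinx: fake_filter_pk_in stands in for Model.objects.filter(pk__in=ids), and every name in it is made up for illustration. Whether django-sphinx restores the ranking in exactly this way is an assumption on my part; the point is only that a pk__in lookup returns rows in database order, so something has to put them back in search order.

```python
def fetch_in_rank_order(ranked_ids, fetch_by_ids):
    """Return the objects for ranked_ids in the order the search engine ranked them.

    fetch_by_ids stands in for Model.objects.filter(pk__in=ids): it may
    return rows in any order and silently skip ids that no longer exist.
    """
    objects_by_id = dict((obj["id"], obj) for obj in fetch_by_ids(ranked_ids))
    return [objects_by_id[pk] for pk in ranked_ids if pk in objects_by_id]


# Stand-in data: a tiny "table" and a fake ORM lookup that returns rows by id order.
TABLE = {
    1: {"id": 1, "name": "Bolt"},
    2: {"id": 2, "name": "Atlas"},
    3: {"id": 3, "name": "Cart"},
}


def fake_filter_pk_in(ids):
    return [TABLE[pk] for pk in sorted(ids) if pk in TABLE]


# The search engine ranked id 3 first, then 1; id 99 was deleted since indexing.
results = fetch_in_rank_order([3, 1, 99], fake_filter_pk_in)
print([row["name"] for row in results])  # prints ['Cart', 'Bolt']
```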

Search Methods

There are a few additional methods you can use on your search queryset besides the default query method: order_by, filter, count, and exclude, to name a few. These don’t *quite* work the same as Django’s, as they’re used directly within the search wrapper. So here’s a brief rundown of these:

  • query
    This is your basic full-text search query. It works exactly the same as passing your query to the full-text engine. Its search type will be based on the search mode, which, by default, is SPH_MATCH_EXTENDED.
  • filter/exclude
    The filter and exclude methods follow the same idea as the normal queryset methods, except that they are applied directly in Sphinx. This means you can only filter on attribute fields that are present in your search index.
  • order_by
    The order_by method also passes its parameters to Sphinx, with one exception. There are four reserved keywords: @id, @weight, @rank, and @relevance. These are detailed in the Sphinx documentation.
  • select_related
    This method is directly passed onto the Django queryset and holds no value to Sphinx.
  • index_on
    Allows you to specify which index(es) you are querying for. To query for multiple indexes you need to include a “content_type” name in your fields.
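The methods above chain together much like a normal queryset. Below is a rough sketch of a paging helper built on query, filter, and slicing. search_page is a hypothetical helper and FakeSearch is a stand-in that only records calls so the sketch runs without Sphinx; with a real model you would pass something like File.search instead. Any filter keyword you pass is assumed to be an attribute defined in your index.

```python
def search_page(search_manager, text, page=1, per_page=20, **attribute_filters):
    """Return one page of results from a django-sphinx style search manager."""
    results = search_manager.query(text)
    if attribute_filters:
        # Filters run inside Sphinx, so the keys must be attributes in the index
        results = results.filter(**attribute_filters)
    start = (page - 1) * per_page
    # Slicing is what limits how many results are fetched
    return results[start:start + per_page]


class FakeSearch(object):
    """Stand-in for Model.search that records calls instead of querying Sphinx."""

    def __init__(self):
        self.calls = []

    def query(self, text):
        self.calls.append(("query", text))
        return self

    def filter(self, **kwargs):
        self.calls.append(("filter", kwargs))
        return self

    def __getitem__(self, item):
        self.calls.append(("slice", item.start, item.stop))
        return self.calls


calls = search_page(FakeSearch(), "django", page=2, per_page=10, category_id=3)
print(calls)
```

The recorded calls show the order in which the real manager would be asked to query, filter, and slice.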
Comments

  • http://manlynsw.com/ Rich

    Brilliant – thank you!
    I now have Sphinx querying a table and returning ORM models.

    One question – could this work for models that have other models as many-many attributes?

    Oh – wait a minute… it doesn’t matter does it? because your code gets the models ‘proper’ from the ORM based on the ID sphinx returns… is that right?

    thanks very much for the guide!!!!!

  • AndrewSK

    Hi, thanks for an article. Very helpful.
    Where should I put my sphinx.conf file?

  • john

    Is this method any better/worse/different than using the fulltext search in mysql?

  • http://www.paradondevamos.com mamcx

    Hi, any info on this vs. Lucene?

    Also, is it truly easy to install? I had a lot of problems with PyLucene & Xapian.

    Also, does it work on Solaris? I host on Joyent, so that is a must…

  • David

    I believe Sphinx is MySQL only.

  • http://ryochan7.wordpress.com/ Ryochan7

    Thanks for writing this post. I have been wanting to try Sphinx lately but I didn’t know how to set everything up. This post helped me out a lot.

  • http://www.yeago.net/works/ Yeago

    El Excellente. =)

  • http://www.yeago.net/works/ Yeago

    http://www.sphinxsearch.com/doc.html#quick-tour <– got it running in 15 minutes with this.

    Those with simpler searching needs might check out http://www.mercurytide.co.uk/whitepapers/django-full-text-search/

  • http://www.tmmarket.com matt

    Sphinx works with MySQL and Postgres, just remember to run configure with the --with-pgsql option.

    Seems the search command line tool doesn’t like Postgres though..

  • wingedsubmariner

    Won’t “Model.objects.filter(pk__in=[list of ids])” turn pathological with a really huge list of IDs? what if it returns several million, and you only want the first ten…it might be good to add a way to query for one id at a time, though I admit this would require rewriting all the Manager methods to filter individual objects.

  • David

    wingedsubmariner,

    That query only gets executed when you slice the search queryset. So when you do mymodel.search.query('q')[0:10] that list of ids is only 10 long.

  • jacob

    That was awesome.

  • ricardo

    Hi!

    I will try it later. Right now, I’m trying to understand how it works and how its API works. I simply cannot work out what I have to do to show the contents of a query. Right now, I can only print this kind of result:

    [{'docs': 1, 'hits': 1, 'word': 'b'}]

    and ‘b’ should be ‘Bolt’

  • Handy!

    I was playing around with this… is there any way to get the results past the first 20?

    I guess I could just edit the code, but if you have something built in…

  • http://www.eshoptipps.de/ marc

    oh my god, this is just awesome. you saved me a lot of time! great piece of software. thanks david.

    cheers // marc

  • Bartek

    Hi,

    Great django app, just running into one small issue that I can’t quite figure out after reading the documentation

    How can I order my results by a field within my model?

    Assuming ‘category’ is a ForeignKey field in my model, I can’t seem to do:

    Product.search.query(search).order_by('category')

    Any tips on how to do this? Thanks David.

  • force

    How do I use “BuildExcerpts”?

    I want to highlight the results.

  • Joe

    I’m seeing some weird results that hopefully someone else has seen as well.

    Everything appears to be OK, except I can’t get my hands on the actual search results.

    Running “search -i my_index django” will return the results I expect. In my case, 82 results.

    If I do results = MyModel.search.query("django") and then type results.count(), I get the number 82 that I expected. If I type list(results), however, I get an empty list: []

    Any ideas?

    Thanks.

    Joe

  • Daniel

    Hi, Does anyone know if django-sphinx support faceted search (drill-down filters)? I know that sphinx itself does not provide it out-of-the-box, but there is a Rails plugin (Thinking Sphinx) that implements an easy way to do that… I’m developing my website with django, and I need some help to implement this feature. Can someone help me?

  • http://ofirpicazo.com Ofir

    If I add a new record to my MySQL table using the admin or any other way, how can I add it to the Sphinx index so that it appears when I run a query()? Is there a way to do this automatically, so I don’t have to run the indexer manually?

  • http://www.darkcoding.net Graham King

    @Handy

    To get more than 20 results, use a slice:

    MyModel.search.query(query)[0:1000]

  • vicky82_davim

    I recently implemented django-sphinx search on my website.
    It works fine for each separate model.
    But now my client’s requirements have changed. He wants results displayed with
    “Title” matches first, then description matches, and so on.
    But I think Sphinx gives results specific to each model and does not give a combined
    result across all models.

    So can anyone help me display “Title” matches first,
    then description matches, and so on, from the results of all models?

    Please help me.

  • binnyabraham

    Is filtering possible on character fields in django-sphinx?

  • binnyabraham

    results = Holiday.search.query("swimming").filter(name="Oberoi")

    Is this kind of filtering possible in django-sphinx? If so, what do I need to set in the sphinx.conf file? Right now I am getting this error:
    ValueError: invalid literal for int() with base 10: 'Oberoi'
    name is a character field in MySQL.

    I searched for this but didn’t find a proper answer.
    Please reply as soon as you can.

  • http://www.anubavam.com/django-developer Django developer

    Excellent post on django modules, very useful for us to develop the software.

  • http://hiDavid zjm1126

    I am a Chinese boy, and my English is not very good. I can’t understand this: ‘The typical usage is Model.search.query('my fulltext query') which would then query sphinx, grab a list of IDs, and then do a Model.objects.filter(pk__in=[list of ids]) and return this result set.’

    Here are my model and view:
    class File(models.Model):
        name = models.CharField(max_length=200)
        tags = models.CharField(max_length=200) # We actually store tags for efficiency in tag,tag,tag format here

        objects = models.Manager()
        search = djangosphinx.SphinxSearch(index="test1")

    def xx(request):
        queryset = File.search.query('test')
        return HttpResponse(queryset)

    and it raises an error:
    SearchError: connection to localhost;3312 failed ((10061, 'Connection refused'))

    Why?

    And can you send me an example?

  • mlissner

    Handy, I'm also having this problem. I'd love to hear if you sorted it out.

  • Janina

    Hiya, just a quick comment – Sphinx seems to run by default on port 9312, not 3312, so it would be helpful to update the docs here to say SPHINX_PORT = 9312. Thanks!

  • http://www.davidcramer.net David Cramer

    The default port is 3312, which is what django-sphinx defaults to.
