Full-text search across multiple Django models using Djapian/Xapian

2009 May 22

I recently implemented full-text search in my Django web application. I am a big fan of the Xapian search engine library, so I was thrilled to find the Djapian project. Djapian is a Django app that enables indexing and searching of your Django models.

One of the major requirements I had for my search component was to allow searching for terms across multiple models at once. After some digging around the Djapian code, I discovered that this can be done with the CompositeIndexer class.

Enough talk. On to the steps required to get full text search working across multiple models:

Step 1. Install Xapian Python bindings

For Ubuntu/Debian:

sudo apt-get install python-xapian xapian-tools

For Windows:

Download and install pre-compiled bindings for your installed Python version from: http://www.flax.co.uk/xapian_windows.shtml
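
Before moving on, it can be worth confirming that the bindings are importable from the Python interpreter your project actually uses. The following is only a minimal sketch: the helper name xapian_version is my own, and the only thing it assumes about the bindings is that they expose xapian.version_string().

```python
def xapian_version():
    """Return the Xapian version string, or None if the bindings are missing."""
    try:
        import xapian
    except ImportError:
        return None
    return xapian.version_string()


version = xapian_version()
if version is None:
    print("Xapian bindings not found for this interpreter")
else:
    print("Xapian bindings found, version %s" % version)
```

If the first message is printed, the bindings were most likely installed for a different Python version than the one running your project.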

Step 2. Install Djapian

There is a small bug with the CompositeIndexer class in the current release version of Djapian. I opened a ticket about it; it has already been resolved, and the fix should be in the next stable release. In the meantime, download the copy in trunk from:

http://djapian.googlecode.com/svn/trunk/

Add djapian to INSTALLED_APPS in settings.py.

Set options in settings.py:

DJAPIAN_DATABASE_PATH = '/path/to/my/project/data/djapian/'
DJAPIAN_STEMMING_LANG = 'en'

Run syncdb:

python manage.py syncdb

Step 3. Set up indexing for your models

For my quick experiment, I created the following models:

from django.db import models
import djapian

class Article( models.Model ):
    title = models.CharField( max_length = 64 )
    article_body = models.TextField()

class Note( models.Model ):
    title = models.CharField( max_length = 64 )
    note_body = models.TextField()

class ArticleIndexer( djapian.Indexer ):
    fields = [ 'title', 'article_body' ]
    tags = [
        ( 'title', 'title' ),
        ( 'content', 'article_body' )
    ]

class NoteIndexer( djapian.Indexer ):
    fields = [ 'title', 'note_body' ]
    tags = [
        ( 'title', 'title' ),
        ( 'content', 'note_body' )
    ]

djapian.add_index( Article, ArticleIndexer, attach_as = 'indexer' )
djapian.add_index( Note, NoteIndexer, attach_as = 'indexer' )

Notice that I am using the same search tags (title and content) for both indexed models. I use title for titles, headings and short names of an object. I use content for text fields and content bodies.

By ensuring that I use the same two tags, I can provide a consistent search interface regardless of the content types being searched. Here is an example search query that will search both Note and Article objects:

title:Django OR content:Django
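
If users type a bare word rather than a tagged query, a query like the one above can be assembled for them. This is a rough sketch, not part of Djapian: build_query is a hypothetical helper of my own, and it only handles a single word (multi-word input would need quoting or per-word handling that I have not covered here).

```python
def build_query(term, tags=("title", "content")):
    """Build a query that looks for `term` under each of the shared tags.

    Hypothetical helper for illustration; assumes `term` is a single word.
    """
    term = term.strip()
    if not term:
        return ""
    # One "tag:term" clause per tag, joined with OR.
    return " OR ".join("%s:%s" % (tag, term) for tag in tags)


print(build_query("Django"))
```

Because every indexer uses the same tag names, the same helper works no matter which models end up in the search.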

Step 4. Index the data

Assuming that you have added a bunch of content to your models, you can run the indexer as follows:

python manage.py index --rebuild

Step 5. The search view

Searching across multiple models is dead simple with Djapian:

indexers = [ Article.indexer, Note.indexer ]
comp = CompositeIndexer( *indexers )
results = comp.search( q ).prefetch()

The trick is to pass the Indexer objects as positional arguments (here by unpacking a list with *) when creating a new CompositeIndexer. Once this is done, you can use the search method as you normally would.

UPDATE (May 26, 2009 @ 10:07AM) Alex Koshelev provided us with a great tip. He recommends calling prefetch() after the search() call when fetching results. This avoids hitting the database every time the instance property is accessed later on (e.g. from the template).
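
To see why this matters, here is a toy model of the difference between loading each hit lazily and loading them in bulk. It is a conceptual sketch only and not Djapian's implementation: FakeTable, Hit and the prefetch function below are stand-ins I made up, with FakeTable counting "queries" in place of a real database.

```python
class FakeTable(object):
    """Stands in for a database table and counts the queries it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def get(self, pk):
        self.queries += 1
        return self.rows[pk]

    def in_bulk(self, pks):
        self.queries += 1
        return dict((pk, self.rows[pk]) for pk in pks)


class Hit(object):
    """A search hit that remembers its table and primary key."""

    def __init__(self, table, pk):
        self.table = table
        self.pk = pk
        self._instance = None

    @property
    def instance(self):
        # Lazy path: one query per hit, on first access.
        if self._instance is None:
            self._instance = self.table.get(self.pk)
        return self._instance


def prefetch(hits):
    """Load the objects for all hits with one query per table."""
    by_table = {}
    for hit in hits:
        by_table.setdefault(id(hit.table), (hit.table, []))[1].append(hit)
    for table, group in by_table.values():
        objects = table.in_bulk([hit.pk for hit in group])
        for hit in group:
            hit._instance = objects[hit.pk]
    return hits


articles = FakeTable({1: "Django tips", 2: "Xapian notes"})
notes = FakeTable({1: "Todo"})

# Lazy access: every hit triggers its own lookup.
lazy_hits = [Hit(articles, 1), Hit(articles, 2), Hit(notes, 1)]
lazy_titles = [hit.instance for hit in lazy_hits]
lazy_queries = articles.queries + notes.queries

# Prefetched access: one lookup per table, however many hits there are.
articles.queries = notes.queries = 0
eager_hits = prefetch([Hit(articles, 1), Hit(articles, 2), Hit(notes, 1)])
eager_titles = [hit.instance for hit in eager_hits]
eager_queries = articles.queries + notes.queries

print("lazy: %d queries, prefetched: %d queries" % (lazy_queries, eager_queries))
```

With three hits across two tables the gap is small, but with a full page of results the lazy path grows with the number of hits while the bulk path grows only with the number of models.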

Here is the complete search view code:

from django.shortcuts import render_to_response
from django.template.context import RequestContext
from content.models import Article, Note
from djapian.indexer import CompositeIndexer

def search( request ):
    q = request.GET.get( 'q' ) or request.POST.get( 'q' )
    if q is not None:
        # list of indexers for searching across multiple models
        indexers = [ Article.indexer, Note.indexer ]
        comp = CompositeIndexer( *indexers )
        results = comp.search( q ).prefetch()
    else:
        results = None

    template = 'content/search.html'
    data = {
        'results': results,
    }
    return render_to_response( template, data,
                               context_instance = RequestContext( request ) )
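
One thing to watch in the view above: submitting the form with an empty box yields an empty string rather than None, so the search still runs. A possible way to guard against that is to normalise the term first. This is a sketch under my own assumptions: extract_query is a hypothetical helper, and plain dicts stand in for request.GET and request.POST.

```python
def extract_query(get_params, post_params, key="q"):
    """Return the stripped search term, or None when nothing usable was sent.

    Hypothetical helper; any objects with a dict-style .get() will do.
    """
    raw = get_params.get(key) or post_params.get(key) or ""
    term = raw.strip()
    return term or None


print(extract_query({}, {"q": "  django "}))
print(extract_query({}, {"q": "   "}))
```

In the view, the result could replace the direct lookups, leaving the existing "is not None" check unchanged.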

Here is the search results page template:

<div>
	<form method="POST" action="{% url search %}">
		<input id="id_q" type="text" name="q"/>
		<input type="submit" value="Go"/>
	</form>
</div>
<br />
{% if results %}
<table>
	{% for entry in results %}
	<tr>
		<td>
		<h1>{{ entry.instance.title }}</h1>
		<p>Match: {{ entry.percent }}% | Rank: {{ entry.rank }}</p>
		</td>
	</tr>
	{% endfor %}
</table>
{% endif %}

If all goes well, we should now be able to search across multiple models! I would like to personally thank the Djapian developers for bridging the gap between Django and Xapian. This opens up some really, really great possibilities.

Some things to think about

In my example, both models have a common title field which I use for displaying entries in the search results. This is not realistic, as your models may have totally different field names and structures. An easy solution is to implement a common method on each model that renders the object as the HTML for its own search result row.

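from django.template.loader import render_to_string

# Method defined on a model (e.g. Note); each model points at its own template.
def get_search_result_html( self ):
	template = 'note/search_result.html'
	data = { 'object': self }
	return render_to_string( template, data )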

And from the search results template, you can display each row as follows:

{% for entry in results %}
<tr>
	<td>
	{{ entry.instance.get_search_result_html }}
	<p>Match: {{ entry.percent }}% | Rank: {{ entry.rank }}</p>
	</td>
</tr>
{% endfor %}

The nice thing about this approach is that it gives you the flexibility to display customized search results depending on the content type.
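
The pattern itself does not depend on Django. Here is a stripped-down sketch using plain classes and string.Template from the standard library in place of models and Django templates. The class bodies, the template strings and render_results are all made up for illustration, and unlike Django templates this sketch does no HTML escaping.

```python
from string import Template


class Article(object):
    result_template = Template("<h1>$title</h1><p>$summary</p>")

    def __init__(self, title, article_body):
        self.title = title
        self.article_body = article_body

    def get_search_result_html(self):
        # Articles show a title plus the start of the body.
        return self.result_template.substitute(
            title=self.title, summary=self.article_body[:40])


class Note(object):
    result_template = Template("<h2>Note: $title</h2>")

    def __init__(self, title, note_body):
        self.title = title
        self.note_body = note_body

    def get_search_result_html(self):
        # Notes only show their title.
        return self.result_template.substitute(title=self.title)


def render_results(objects):
    """Render mixed objects without checking what type each one is."""
    return "\n".join(obj.get_search_result_html() for obj in objects)


print(render_results([
    Article("Django", "Full-text search"),
    Note("Todo", "Index the notes"),
]))
```

The calling code never inspects the type of each object; it only relies on every searchable class providing the same method name.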

3 Responses
  1. 2009 May 25

    It is better to add prefetch() method call after search:


    results = comp.search(q).prefetch()

    helps not to query database each time instance attribute accessed.

  2. 2009 June 15
    Salvador Menendez permalink

    I would just like to say thanks for pointing this out. I never knew that Djapian had the ability to merge multiple indexers.
