Splitting up Django models

2009 November 26

Sometimes it makes sense to split up your Django models for a specific application across multiple files instead of having all of them in one models.py file. This allows for easier and simpler code organization and maintenance.

Splitting up your models in Django is fairly simple, although it requires a few extra steps. In our example, we’ll create two Django models, Foo and Bar, each defined in its own file within an app called myapp.

Let’s also assume that Bar has a ForeignKey to Foo.

Create a models Python module within the app. The application directory structure may look something like:

  • /myapp
    • /models
      • __init__.py
      • bar.py
      • foo.py

Here are the contents of foo.py:

from django.db import models

class Foo( models.Model ):
    foo_text = models.CharField( max_length = 64 )

    class Meta:
        app_label = 'myapp'

And bar.py:

from django.db import models
from myapp.models.foo import Foo

class Bar( models.Model ):
    foo = models.ForeignKey( Foo )
    bar_text = models.CharField( max_length = 64 )

    class Meta:
        app_label = 'myapp'

Notice the definition of the app_label property in the inner Meta class of each model. It is required so that Django’s syncdb command knows that these split-up model classes belong to the same application.

We’re not done yet. You’ll also need to explicitly import each model class in the model module’s __init__.py file:

from myapp.models.foo import Foo
from myapp.models.bar import Bar

And that’s it. Run syncdb and you should be all set.

NOTE: With ForeignKey relationships, the import order in __init__.py is very important. Since Bar has a ForeignKey to Foo, Foo must be imported before Bar.

NOTE: If you are splitting up the contents of an existing models.py file, make sure to delete the original models.py file when you are done; otherwise syncdb may get confused.

UPDATE (November 28, 2009 @ 3:30PM) Pedro Costa pointed out an existing issue with Django (ticket #6961) and loading fixtures for split up models. One solution is to just put the fixtures under the new /models sub-dir since Django is already looking for them there. However, this may break once they do fix the bug in the Django code base. Or you could try Justin Lilly’s approach which was mentioned on his blog.

UPDATE (November 29, 2009 @ 10:04AM) I was asked to illustrate how to use a ForeignKey between these models. I have modified my post to show an example of how to do that.

Setting up your own Hadoop cluster with Cloudera distro for EC2

2009 November 7

Installing Prerequisites

Before you begin, figure out what distro you are on (if you don’t already know) by issuing this command from the shell:

lsb_release -c

For my set up, I used Ubuntu 8.04.3 LTS Hardy Heron.

Add Cloudera repositories for your distro to apt sources. Create a file called /etc/apt/sources.list.d/cloudera.list and add the following two lines in it:

deb http://archive.cloudera.com/debian [distro]-stable contrib
deb-src http://archive.cloudera.com/debian [distro]-stable contrib

Add repository GPG key to your keyring:

curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -

Update apt:

sudo apt-get update

Install Hadoop:

sudo apt-get install hadoop

Install boto and simplejson for Python (I used pip but you could use easy_install as well):

sudo pip install boto simplejson

Setting Up Cloudera Client Scripts

Download the Cloudera client scripts:

wget http://cloudera-packages.s3.amazonaws.com/cloudera-for-hadoop-on-ec2-py-0.3.0-beta.tar.gz

The location of the scripts may change and the only link to the scripts can be found on this page:

http://archive.cloudera.com/docs/_getting_started.html

Once downloaded, extract the archive’s contents to /opt/cloudera. You could put it anywhere else you like.

Modify your environment variables and add the following to your .bash_profile:

export AWS_ACCESS_KEY_ID=[your AWS access key ID]
export AWS_SECRET_ACCESS_KEY=[your AWS secret access key]
export PATH=$PATH:/opt/cloudera

The Cloudera client scripts need to have access to your AWS credentials from the system environment in order to work properly. Adding the path to where the scripts reside will allow you to call the hadoop-ec2 command that comes bundled with the package.

Source your .bash_profile to update your environment with the changes.

source ~/.bash_profile

Create your .hadoop-ec2 config directory in your home directory:

mkdir ~/.hadoop-ec2

Create a file called ec2-clusters.cfg in your .hadoop-ec2 directory with the following contents:

[my-hadoop-cluster]
ami=ami-6159bf08
instance_type=c1.medium
key_name=[your EC2 key name]
availability_zone=us-east-1b
private_key=/path/to/keypair.pem
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no

You could have multiple blocks in this file, one for each of your clusters. We just created one block for a cluster named my-hadoop-cluster.
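The %(private_key)s placeholder in ssh_options uses the same interpolation syntax as Python's standard configparser module. Here is a minimal sketch of how that substitution resolves, assuming the file is read with configparser-style interpolation (an assumption about the client scripts, not something confirmed here). It is written for Python 3, and the key name and key path are placeholders:

```python
import configparser

# Same layout as ec2-clusters.cfg above; key_name and private_key are placeholders.
SAMPLE = """
[my-hadoop-cluster]
ami=ami-6159bf08
instance_type=c1.medium
key_name=my-key
availability_zone=us-east-1b
private_key=/path/to/keypair.pem
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Each [block] becomes a section; values in the same section can reference each other.
cluster = parser["my-hadoop-cluster"]
ssh_options = cluster["ssh_options"]
print(ssh_options)
```

Reading the file this way is also a quick sanity check that every cluster block parses before you try to launch anything.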

Test that the Cloudera client scripts are working:

hadoop-ec2 list

If everything is configured properly up to this point, you should get output that says:

No running clusters

Launch your cluster (in this case with a master and one slave node):

hadoop-ec2 launch-cluster my-hadoop-cluster 1

Set up a proxy to the cluster:

hadoop-ec2 proxy my-hadoop-cluster

Setting Up Pig

The version of Pig that is a part of the Cloudera apt repositories was a little out of date for me so I decided to install a more updated version manually. The version of Hadoop that runs on the cluster is 0.18.3 and the latest version of Pig that is compatible with that Hadoop version is 0.4.0. You can download it from here:

wget http://apache.cs.utah.edu/hadoop/pig/pig-0.4.0/pig-0.4.0.tar.gz

If the link above doesn’t work, then try any of the mirrors for Pig listed on this page:

http://www.apache.org/dyn/closer.cgi/hadoop/pig

Extract the contents of the archive to /opt/pig. Again, you could put it anywhere you desire.

Add the following to your .bash_profile:

export HADOOP_CONF_DIR=~/.hadoop-ec2/my-hadoop-cluster
export HADOOP_HOME=/usr/lib/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export PIG_CLASSPATH=/opt/pig/pig-0.4.0-core.jar:$HADOOP_CONF_DIR
export PIG_HADOOP_VERSION=0.18.3

The HADOOP_CONF_DIR environment variable should point to the location of your cluster’s hadoop-site.xml file. If you are going to be running multiple clusters, you may want to create some shell scripts that set the appropriate environment variables for each cluster before running any commands like Pig, which depend on those values.
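If you would rather not maintain one shell script per cluster, the same idea can be sketched in Python (written for Python 3). The helper name cluster_env and its parameters are made up for this illustration; it only builds the environment and assumes the directory layout and jar path used in this post:

```python
import os

def cluster_env(cluster_name, base_env=None, config_root="~/.hadoop-ec2"):
    """Return a copy of the environment with the per-cluster variables set."""
    env = dict(os.environ if base_env is None else base_env)
    conf_dir = os.path.join(os.path.expanduser(config_root), cluster_name)
    env["HADOOP_CONF_DIR"] = conf_dir
    # Mirrors the PIG_CLASSPATH export above; the jar path is the one used in this post.
    env["PIG_CLASSPATH"] = "/opt/pig/pig-0.4.0-core.jar:" + conf_dir
    return env

env = cluster_env("my-hadoop-cluster", base_env={})
print(env["HADOOP_CONF_DIR"])
```

The resulting dictionary can be passed to subprocess.call( [ 'pig' ], env = env ) so that each invocation talks to the cluster you name, without touching your .bash_profile.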

Don’t forget to add the path to the Pig bin directory to your path as well:

export PATH=$PATH:/opt/cloudera:/opt/pig/bin

Source your .bash_profile to update your environment with the changes and then run the pig command:

pig

It should launch Pig in mapreduce mode by default. In this mode, the grunt shell will be connected to your EC2 Hadoop cluster vs. your local Hadoop installation. You should see something like the following:

2009-11-07 15:02:15,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ec2-xx-xxx-xx-xx.compute-1.amazonaws.com:8020/
2009-11-07 15:02:16,653 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ec2-xx-xxx-xx-xx.compute-1.amazonaws.com:8021

Congratulations, you now have a fully functioning Hadoop cluster running on EC2 with Pig ready to do your bidding.

rxvt – A better console for Cygwin

2009 November 2
by Nizam

I have been using xterm on Cygwin/X for a long time now. I love the fact that you can have a native X Windows server running on Windows with xterm and all the X11 goodies. However, since upgrading to Windows 7, I have had a lot of problems with getting both X and xterm to work properly.

I was somehow able to cobble some stuff together to make it all work but then… disaster! I found out that copying text from Windows and pasting it into xterm (specifically when I am connected to a remote host via SSH) kills my network connection for a few seconds! It happens every time and I haven’t found a single solution to this problem on the web after Googling for it for months.

After a lot of hair pulling, I discovered that Cygwin has a port of rxvt. So I installed it and it is working perfectly for me. I created a shortcut on my desktop with this as the target:

C:\Cygwin\bin\rxvt.exe -display :0 -fn "Consolas-20" -tn rxvt -e /usr/bin/bash --login -i

Copy/paste is working again and everything is as it should be.

Designing user interfaces with Balsamiq Mockups For Desktop

2009 October 31

As a web developer, it can be quite time-consuming to meticulously code complex pages with HTML and CSS. What’s worse is to invest a lot of time doing that and then realize that your precious layout just doesn’t work from a UX standpoint. Sometimes you have multiple ideas for a layout and are torn between which one to go with.

There are many software tools out there which allow you to create wireframe mockups/sketches of user interfaces. Some of these tools can be quite expensive and for a startup that is bootstrapping, totally out of reach. Well, not to worry. I recently got my hands on a very inexpensive tool called Balsamiq Mockups For Desktop which really increased my productivity as a UX designer (one of the many hats I have to wear currently).

Sample mockup

Mockups For Desktop is based on the Adobe AIR platform. Once you install the AIR runtime on your PC, installing AIR-based applications is a breeze.

Mockups For Desktop is an absolute bargain at $79 a license. The user interface is really intuitive and I was able to dive in and create my first mockup in about an hour. The interface sports a toolbar at the top with all your UI elements. Balsamiq has done a superb job of creating a toolset which is comprised of the most common UI elements you are likely to need in designing your mockups. You can simply drag and drop any of these elements onto the grid-based canvas below.

The elements are very simple to position with intelligent snap-to-grid alignment. Double-clicking an element allows you to edit its text content. You can also modify and fine-tune the properties of each element by using the hovering property editor that shows up when an element has focus.

Overall, I am very impressed with Mockups For Desktop. I highly recommend it for anyone looking for a cost-effective way to design and lay out user interfaces. It has become an invaluable part of my software tool set.

Disclaimer: I was given a free license to the application for trying it out and reviewing it on my blog.

Go to top of page using jQuery

2009 September 18
by Nizam

This one-liner jQuery snippet will force your browser to go to the top of your page:

$( 'html, body' ).animate( { scrollTop: 0 }, 0 );

If you want to add some smooth scrolling:

$( 'html, body' ).animate( { scrollTop: 0 }, 'slow' );

Windows 7, 64-bit Python and easy_install

2009 August 14

I recently downloaded and installed Windows 7 RTM on my laptop. I upgraded from 32-bit XP to a 64-bit flavor of Windows 7. I decided to install a 64-bit version of Python to take advantage of the 6GB of memory installed on my laptop. All well and good.

I proceeded to grab and set up easy_install which installed without any issues. Things started to go awry when I tried to actually install a package using easy_install. I started getting the following error:

Cannot find Python executable C:\Python25\python.exe

It turns out that this has been an issue with setuptools and 64-bit Python for a while:

http://aspn.activestate.com/ASPN/Mail/Message/ActivePython/3343098

There is an active ticket for setuptools as well (with the last update dated August 11, 2009 at the time of writing):

http://bugs.python.org/setuptools/issue2

Hopefully, it will be resolved soon. My solution was to fall back to 32-bit Python in the interim.
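If you are not sure which build of Python you ended up with, a quick check from the interpreter tells you. This is a small sketch using only the standard library; it reports the pointer size of the running interpreter and says nothing about the setuptools bug itself:

```python
import struct
import sys

# Size of a C pointer in bytes, times 8, gives the interpreter's word size.
bits = struct.calcsize("P") * 8
print("%d-bit Python at %s" % (bits, sys.executable))
```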

Full-text search across multiple Django models using Djapian/Xapian

2009 May 22

I recently implemented full text search in my Django web application. I am a big fan of the Xapian search engine library so I was extremely thrilled to find the Djapian project. Djapian is a Django app that enables indexing and searching of your Django models.

One of the major requirements I had for my search component was to allow searching for terms across multiple models at once. After some digging around the Djapian code, I discovered that it may be possible to do this using the CompositeIndexer class.

Enough talk. On to the steps required to get full text search working across multiple models:

Step 1. Install Xapian Python bindings

For Ubuntu/Debian:

sudo apt-get install python-xapian xapian-tools

For Windows:

Download and install pre-compiled bindings for your installed Python version from: http://www.flax.co.uk/xapian_windows.shtml

Step 2. Install Djapian

There is a small bug with the CompositeIndexer class in the current release version of Djapian. I opened a ticket concerning it here and it has already been resolved and should be in the next stable release. In the meantime, you should download the copy in trunk from:

http://djapian.googlecode.com/svn/trunk/

Add djapian to INSTALLED_APPS in settings.py.

Set options in settings.py:

DJAPIAN_DATABASE_PATH = '/path/to/my/project/data/djapian/'
DJAPIAN_STEMMING_LANG = 'en'

Run syncdb:

python manage.py syncdb

Step 3. Set up indexing for your models

For my quick experiment, I created the following models:

from django.db import models
import djapian

class Article( models.Model ):
    title = models.CharField( max_length = 64 )
    article_body = models.TextField()

class Note( models.Model ):
    title = models.CharField( max_length = 64 )
    note_body = models.TextField()

class ArticleIndexer( djapian.Indexer ):
    fields = [ 'title', 'article_body' ]
    tags = [
        ( 'title', 'title' ),
        ( 'content', 'article_body' )
    ]

class NoteIndexer( djapian.Indexer ):
    fields = [ 'title', 'note_body' ]
    tags = [
        ( 'title', 'title' ),
        ( 'content', 'note_body' )
    ]

djapian.add_index( Article, ArticleIndexer, attach_as = 'indexer' )
djapian.add_index( Note, NoteIndexer, attach_as = 'indexer' )

Notice that I am using the same search tags (title and content) for both indexed models. I use title for titles, headings and short names of an object. I use content for text fields and content bodies.

By ensuring that I use the same two tags, I can provide a consistent search interface regardless of the content types being searched. Here is an example search query that will search both Note and Article objects:

title:Django OR content:Django
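Because every indexer shares the same tags, a query like the one above can be assembled from a single search term. Here is a minimal sketch; build_query is a hypothetical helper, not part of Djapian, and it assumes a single-word term with no characters that need escaping:

```python
def build_query(term, tags=("title", "content")):
    """Join one 'tag:term' clause per tag with OR."""
    return " OR ".join("%s:%s" % (tag, term) for tag in tags)

query = build_query("Django")
print(query)  # title:Django OR content:Django
```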

Step 4. Index the data

Assuming that you have added a bunch of content to your models, you can run the indexer as follows:

python manage.py index --rebuild

Step 5. The search view

Searching across multiple models is dead simple with Djapian:

indexers = [ Article.indexer, Note.indexer ]
comp = CompositeIndexer( *indexers )
results = comp.search( q ).prefetch()

The trick is to pass in a list of Indexer objects when creating a new CompositeIndexer. Once this is done, then you can use the search method as you normally would.

UPDATE (May 26, 2009 @ 10:07AM) Alex Koshelev provided us with a great tip. He recommends calling prefetch() after the search() call for fetching results. It will prevent hitting the database every time the instance property is accessed later on (i.e. from the template).

Here is the complete search view code:

from django.http import HttpResponse
from django.shortcuts import render_to_response
from django.template.context import RequestContext
from content.models import Article, Note
from djapian.indexer import CompositeIndexer

def search( request ):
    q = request.GET.get( 'q' ) or request.POST.get( 'q' )
    if q is not None:
        # list of indexers for searching across multiple models
        indexers = [ Article.indexer, Note.indexer ]
        comp = CompositeIndexer( *indexers )
        results = comp.search( q ).prefetch()
    else:
        results = None

    template = 'content/search.html'
    data = {
        'results': results,
    }
    return render_to_response( template, data,
                               context_instance = RequestContext( request ) )

Here is the search results page template:

<div>
	<form method="POST" action="{% url search %}">
		<input id="id_q" type="text" name="q"/>
		<input type="submit" value="Go"/>
	</form>
</div>
<br />
{% if results %}
<table>
	{% for entry in results %}
	<tr>
		<td>
		<h1>{{ entry.instance.title }}</h1>
		<p>Match: {{ entry.percent }}% | Rank: {{ entry.rank }}</p>
		</td>
	</tr>
	{% endfor %}
</table>
{% endif %}

If all goes well, we should now be able to search across multiple models! I would like to personally thank the Djapian developers for bridging the gap between Django and Xapian. This opens up some really, really great possibilities.

Some things to think about

In my example, both models have a common title field which I use for displaying entries in the search results. This is not realistic as you may have models with totally different field names and structures. An easy solution would be to implement a common method in each model that renders a search result row’s HTML representation of itself.

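from django.template.loader import render_to_string

# defined as a method on each searchable model (here: Note)
def get_search_result_html( self ):
    template = 'note/search_result.html'
    data = { 'object': self }
    return render_to_string( template, data )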

And from the search results template, you can display each row as follows:

{% for entry in results %}
<tr>
	<td>
	{{ entry.instance.get_search_result_html|safe }}
	<p>Match: {{ entry.percent }}% | Rank: {{ entry.rank }}</p>
	</td>
</tr>
{% endfor %}

The nice thing about this approach is that it gives you the flexibility to display customized search results depending on the content type.

Customized comment notifications from Django

2009 May 8

I recently had to implement a way to send notifications (using the excellent django-notification app developed by James Tauber) to users whose content is commented on in my Django web app. However, I wanted the owner/creator of the original content to get a more customized notification message. For example, if the model instance being commented on was an idea from an idea sharing app, I would like the notifications to look something like:

John Smith commented on your idea – “Switching to a better version control system”:

That’s a brilliant idea. I totally agree with you. Let’s make this happen.

Looks a lot better than just: “John Smith commented on something.”

Okay, so the first thing we need to do is to define the notice type for comments. Here are the values I used for my notice type:

  • label: comment_posted
  • display: Comment Posted
  • description: someone posted a comment for your content

from django.db.models import signals, get_app
from django.utils.translation import ugettext_noop as _
from django.core.exceptions import ImproperlyConfigured

try:
    notification = get_app( 'notification' )

    def create_notice_types( app, created_models, verbosity, **kwargs ):
        notification.create_notice_type( 'comment_posted',
            _( 'Comment Posted' ),
            _( 'someone posted a comment for your content' ) )

    signals.post_syncdb.connect( create_notice_types, sender = notification )
except ImproperlyConfigured:
    print 'Skipping creation of NoticeTypes as notification app not found'

The next step is to create a signal handler and wire it to the post_save signal for the Comment model.

from django.db.models import signals
from django.contrib.comments.models import Comment
from django.contrib.contenttypes.models import ContentType
from django.utils.translation import ugettext as _
from django.core.exceptions import ImproperlyConfigured
from django.db.models import get_app

try:
    notification = get_app( 'notification' )
except ImproperlyConfigured:
    notification = None

# valid content types
VALID_CTYPES = [ 'link', 'note', 'update' ]

def send_comment_notification( sender, **kwargs ):
    # no point in proceeding if notification is not available
    if not notification:
        return

    if 'created' in kwargs:
        if kwargs[ 'created' ] == True:

            # get comment instance
            instance = kwargs[ 'instance' ]

            # get comment's content object and its ctype
            obj = instance.content_object
            ctype = ContentType.objects.get_for_model( obj )

            # check for valid content types
            if ctype.name not in VALID_CTYPES:
                return

            # for customized notification message
            if ctype.name == 'link':
                type = _( 'your link' )
                descr = obj.link
            elif ctype.name == 'note':
                type = _( 'your note' )
                descr = obj.title
            elif ctype.name == 'update':
                type = _( 'your status update' )
                descr = obj.update

            # send notification to content owner
            if notification:
                data = {
                    'comment': instance.comment,
                    'user': instance.user,
                    'type': type,
                    'descr': descr,
                }

                # notification is sent to the original content object
                # owner/creator
                notification.send( [ obj.user ], 'comment_posted', data  )

# connect signal
signals.post_save.connect( send_comment_notification, sender = Comment )

Nothing fancy or complicated here. Some things to note:

  • I created a list of content type names called VALID_CTYPES which I use to control which content types trigger the creation of a custom notification.
  • Based on the content type’s name, I define some text that will be used in the notification message (type).
  • I also assign a descriptive name of the item in question (descr). The descriptive name should map to whatever field in the model instance that best describes that instance. You can also use __unicode__() if you wish. You just have to make sure that it is defined in the model class and is suited for display in the notification message.
  • The notification is only sent to the owner/creator of the content being commented on.
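As the number of supported content types grows, the if/elif chain in the handler can be replaced with a lookup table. Below is a rough sketch of that idea in plain Python; CTYPE_INFO, describe_target and FakeNote are names invented for this illustration, and the translation calls are left out to keep it self-contained:

```python
# Maps a content type name to (text for the message, field that describes the object).
CTYPE_INFO = {
    "link": ("your link", "link"),
    "note": ("your note", "title"),
    "update": ("your status update", "update"),
}

def describe_target(ctype_name, obj):
    """Return (type_text, descr) for a supported content type, else None."""
    info = CTYPE_INFO.get(ctype_name)
    if info is None:
        return None
    type_text, field_name = info
    return type_text, getattr(obj, field_name)

class FakeNote:
    # Stand-in for a model instance, only for this example.
    title = "Switching to a better version control system"

print(describe_target("note", FakeNote()))
print(describe_target("poll", FakeNote()))
```

With this layout the keys of the table double as the list of valid content types, so supporting a new type means adding one entry.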

Finally, we create a simple template for the notification e-mail and put it in templates/notification/comment_posted/:

{% load i18n %}
{% blocktrans with comment as comment and type as type and descr as descr and user.get_full_name as user %}
{{ user }} commented on {{ type }} - "{{ descr }}":

{{ comment }}
{% endblocktrans %}

There you have it. Some food for thought:

  • In the code above, I only send the notification to the owner/creator of the content. However, you could easily add some code to grab a list of all the users who previously commented on the same content object and spam them as well (à la Facebook)!
  • If you have permalinks defined for your models, you could pass the actual content object to the notification template and use get_absolute_url() to display a direct link to the object’s view.

Dynamic Django queries (or why kwargs is your friend)

2009 April 27
by Nizam

A very easy way to dynamically build queries in Django is to use Python kwargs (keyword arguments).

Let’s say we have a model that looks something like this:

class Entry( models.Model ):
    user = models.ForeignKey( User, related_name = 'user_relation' )
    category = models.ForeignKey( Category, related_name = 'category_relation' )
    title = models.CharField( max_length = 64 )
    entry_text = models.TextField()
    deleted_datetime = models.DateTimeField( null = True, blank = True )

Our goal is to dynamically build a query as we go in a view. Using kwargs, we can easily do something like this in our view code:

kwargs = {
    # you can set common filter params here
}

# will return entries which don't have a deleted_datetime
if exclude_deleted:
    kwargs[ 'deleted_datetime__isnull' ] = True

# will return entries in a specific category
if category is not None:
    kwargs[ 'category' ] = category

# will return entries for current user
if current_user_only:
    kwargs[ 'user' ] = request.user

# will return entries where titles match some search query
if title_search_query != '':
    kwargs[ 'title__icontains' ] = title_search_query

# apply all filters and fetch entries that match all criteria
entries = Entry.objects.filter( **kwargs )

It’s that simple. This approach seems quite pedestrian when you think about it. However, I didn’t find any examples online that actually show it in use. It may be useful for someone new to Django.
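The dictionary-building part can also be pulled out of the view into a plain function, which makes it easy to try without a database. This is only a sketch of that idea: build_entry_filters and its parameters are hypothetical names, and the last step (passing the result to the model manager) is shown as a comment because it needs a Django project:

```python
def build_entry_filters(exclude_deleted=False, category=None,
                        user=None, title_search_query=""):
    """Collect Django-style lookup kwargs; only active options add a key."""
    filters = {}
    if exclude_deleted:
        filters["deleted_datetime__isnull"] = True
    if category is not None:
        filters["category"] = category
    if user is not None:
        filters["user"] = user
    if title_search_query:
        filters["title__icontains"] = title_search_query
    return filters

filters = build_entry_filters(exclude_deleted=True, title_search_query="django")
print(filters)
# In a view this would be used as: Entry.objects.filter(**filters)
```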

UPDATE (Apr 30, 2009 @ 9:39AM) Based on the comments I received, I wanted to update this post a little bit. It was mentioned that this approach may not work if you use Q objects for complex lookups. It turns out that QuerySet filter() accepts both args and kwargs. So you could actually do something like:

from django.db.models import Q

kwargs = { 'deleted_datetime__isnull': True }
# note the trailing comma: args must be a tuple of Q objects
args = ( Q( title__icontains = 'Foo' ) | Q( title__icontains = 'Bar' ), )
entries = Entry.objects.filter( *args, **kwargs )

Very cool indeed. Django never ceases to amaze me.

Database independent Django queries: COALESCE vs. NVL, IFNULL, ISNULL

2009 April 26

Most of the time, it is not necessary to write raw SQL from Django. However, there are cases where it can’t be avoided.

One common pattern in SQL that always comes up is to check two fields and get the value of the first non-NULL field. In Oracle I’ve used NVL and in MySQL I’ve used IFNULL to do this. In the MS-SQL world, the equivalent command is ISNULL. E.g., I could write the following query for Oracle:

SELECT NVL( ratings_score.score, 0 )
FROM ratings_score
WHERE ratings_score.content_type = 1
AND ratings_score.object_id = 5

Or its equivalent for MySQL:

SELECT IFNULL( ratings_score.score, 0 )
FROM ratings_score
WHERE ratings_score.content_type = 1
AND ratings_score.object_id = 5

However, I really don’t feel like writing database dependent SQL queries and have them embedded in my Django code. We’re using MySQL right now but we want to have the flexibility to move to PostgreSQL or Oracle in the future, if needed. So I did some digging around and found out that all the major database platforms support another command which essentially does the same thing as NVL, IFNULL and ISNULL (and more!). The command is called COALESCE and it is supported by:

So using COALESCE seems to be the safest way to check two (or more) fields and return the first non-NULL field from a raw SQL query in Django. I can now sleep easy and have peace of mind that if we move to a different RDBMS in the future, our code will not break. Hopefully.
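To see COALESCE in action without a MySQL or Oracle server at hand, here is a small sketch that runs the same kind of query against an in-memory SQLite database. SQLite is used only because it ships with Python; the rows are made up for the example, and this is not how the query would be issued through Django's own database connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings_score (content_type INTEGER, object_id INTEGER, score INTEGER)"
)
# One row with a NULL score and one row with a real score.
conn.executemany(
    "INSERT INTO ratings_score VALUES (?, ?, ?)",
    [(1, 5, None), (1, 6, 4)],
)

query = (
    "SELECT COALESCE(ratings_score.score, 0) "
    "FROM ratings_score "
    "WHERE ratings_score.content_type = ? AND ratings_score.object_id = ?"
)
null_score = conn.execute(query, (1, 5)).fetchone()[0]  # NULL falls back to 0
real_score = conn.execute(query, (1, 6)).fetchone()[0]  # stored value is returned
print(null_score, real_score)
conn.close()
```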