Dec 1 | Filed in Code, Django, Python

One optimization (if you want to call it that) for reducing a web page's load time is removing excess whitespace. On many of our pages at iBegin this saved as much as 100kb. Not every site has 300kb pages like some of ours, but the savings can still add up very quickly.

Django provides a {% spaceless %} tag for its templates that achieves this effect, but we don’t use Django’s template engine (Hooray, Jinja!). The tag approach also seemed inappropriate, since we want this applied across the entire site, no matter what. Instead, we moved it into a middleware, which strips the whitespace between tags (using the same function as the built-in tag) from any HTML page.

While this makes the source of some pages unreadable, the time it saves the average user when downloading the page (see Steve Souders’ tips) is well worth the trouble it causes a few people.

# Aliasing it for the sake of page size.
from django.utils.html import strip_spaces_between_tags as short

class SpacelessMiddleware(object):
    def process_response(self, request, response):
        # Only touch HTML responses. The header may carry a charset suffix
        # (e.g. 'text/html; charset=utf-8') or be missing altogether.
        if 'text/html' in response.get('Content-Type', ''):
            response.content = short(response.content)
        return response
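
For readers who want to see what the stripping amounts to without a Django install, here is a minimal stand-alone sketch. It assumes the helper boils down to a single regular expression that removes whitespace between a closing `>` and the next `<`. The name `strip_between_tags` is made up for this example, and the real Django helper may differ in details.

```python
import re

# Hypothetical stand-in for the Django helper: drop any run of whitespace
# that sits between the end of one tag and the start of the next.
_BETWEEN_TAGS = re.compile(r'>\s+<')

def strip_between_tags(html):
    return _BETWEEN_TAGS.sub('><', html.strip())

page = """
<ul>
    <li>one</li>
    <li>two words</li>
</ul>
"""
stripped = strip_between_tags(page)
print(stripped)
print(len(page), '->', len(stripped))

# Plain text inside <pre> is left alone: its whitespace is not between tags.
pre_plain = "<pre>  a\n    b</pre>"
# Markup inside <pre> (e.g. highlighter spans) loses whitespace between tags.
pre_spans = "<pre><span>a</span>\n    <span>b</span></pre>"
print(strip_between_tags(pre_plain))
print(strip_between_tags(pre_spans))
```

Under that assumption, indentation made of nested tags disappears and whitespace inside text nodes stays. Plain preformatted text is untouched, but tag-heavy content inside pre or textarea can still lose its layout.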

Note that if you use page caching middleware, placing this middleware in the appropriate position will also let you save space in your cache.
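
As a rough illustration of "the appropriate position": Django runs process_response hooks in reverse order of the middleware list, so the spaceless middleware has to sit below the cache-updating middleware for the cache to store the stripped page. The sketch below assumes the old-style MIDDLEWARE_CLASSES setting and the split UpdateCacheMiddleware / FetchFromCacheMiddleware classes. The path myproject.middleware.SpacelessMiddleware is a placeholder, not something from our codebase.

```python
# settings.py (sketch; 'myproject.middleware' is a placeholder module path)
MIDDLEWARE_CLASSES = (
    # Response hooks run bottom-to-top, so this one runs last and caches
    # whatever the middleware listed below it has produced.
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.common.CommonMiddleware',
    # Strips whitespace before the response reaches UpdateCacheMiddleware.
    'myproject.middleware.SpacelessMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
)
```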

Enjoy!

Responses to "Spaceless HTML in Django"

miracle2k (Dec 1st):

I've considered this before, but was always unsure about the CPU cost involved.

Yeago (Dec 1st):

Pretty hot yet simple. Thanks Mistar Cramar.

Sam McDonald (Dec 1st):

I like it a lot. I have been wondering if there was an easy way to do this, and there is. I will definitely have use for this soon.

Honza (Dec 1st):

Is that really worth it?

taking a file:
index.html – 91565 bytes

code:
>>> f = open('index.short', 'w')
>>> f.write( short( file( 'index.html' ).read() ).encode('utf-8') )
>>> f.close()

produces:
index.short – 69915

and my favorite – gzip:
index.gz – 14709
index.short.gz – 13319

examples from zena.centrum.cz where we have some nasty whitespace in our HTML…

Knowing this, I would take mod_deflate in (insert your favorite web server) over any python space stripping anytime…

Dan (Dec 1st):

I've considered this before, but wondered if there was much benefit over gzip (which I would recommend using even with whitespace stripping). Does a whitespace-removed, gzipped response save enough space to justify the (admittedly reasonable) CPU cost of whitespace-removal?

wheaties (Dec 1st):

Very interesting thoughts. I'll have to play around with it sometime. Thanks!

David Cramer (Dec 1st):

Does mod_deflate allow removing the extra whitespace? In your example you are correct, that it's not a huge savings for that individual request, and gzip I would highly recommend, but the CPU time is negligible.

Tom (Dec 2nd):

So… every third byte was whitespace? That's a bit hard to believe, even as a worst-case. Still, if you saw improvements, then my congratulations.

I would suggest, though (as I see others have), that people facing the same problem have a look at simply gzipping their text-based output at the webserver level. Since the action's probably happening on the webhead either way, it's unlikely to be much more costly in CPU terms, and it's even simpler to implement than your solution (it typically just takes a line or two in .htaccess if you're using Apache). Gzipping will certainly remove any overhead that whitespace represents.

David Cramer (Dec 2nd):

It's mostly indentation that actually contributed to the whitespace :)

Andreas (Dec 2nd):

Clever trick! Debuggers could always use Firebug if they want it nicely indented. Everyone should do this + gzip + expires headers where plausible.

Mikkel Høgh (Dec 2nd):

Note that Content-Type will usually be 'text/html; charset=utf-8', not just 'text/html' (since Content-Type without a charset is unsafe).

Martin (Dec 2nd):

The drawback of this is, that it also steals the whitespace in pre and textarea tags.

David Cramer (Dec 2nd):

Have you confirmed this? I believe the whitespace tag is designed to avoid that, but I haven't fully tested it.

Martin (Dec 2nd):

Oops, I wasn't aware that there is a real “strip_spaces_between_tags” function. Sorry for the noise :D

Martin (Dec 2nd):

I had problems with highlighted (Pygments) text inside pre tags, which ended up as unindented code *cry*. But that's a very special condition.

Roger (Dec 2nd):

The appropriate place for cached content to be affected would be at the start of the end? :-/

Roger (Dec 2nd):

or the end*

Peter Bengtsson (Dec 2nd):

I wrote this snippet:
http://www.djangosnippets.org/snippets/1055/
to (very) efficiently whitespace optimize inline CSS which might be interesting for you and your Jinja too.

truebosko (Dec 2nd):

Very nice and simple trick. Thanks

Florent V. (Dec 4th):

Hello,

I'm curious: why output whitespace-less HTML when you could just use gzip compression on the server for HTML, XML, CSS and JavaScript, and get better results (like going from 300 kB, not kb ;), to, say, 120 kB)? Did you do some performance testing regarding that?

Unless gzipping your text output has a clear performance cost, I would keep my HTML code with whitespace (if not perfect, then at least decently readable code). Makes it easier to debug when you need to check the real thing (Firebug only shows what Firefox understands, not what it gets from the server).

Do you happen to use this method, then gzip (through mod_deflate for instance)? If so, how much do you save compared to a gzipped version with all whitespace intact?

Martin (Dec 5th):

solving a non-issue with a sub-optimal technique and posting a blag about it… this feels like the type of publicity the rails community is so famous for…

David Cramer (Dec 5th):

Uneducated trolls.. sound like something that every community is famous for.

David Cramer (Dec 5th):

We GZIP as well. Sadly, everyone's still not on broadband today, so even shaving off 5 or 10k from the request can be quite useful (especially when the amount of time it takes to do that is immeasurable).

Chris Kelly (Dec 9th):

The whitespace tag is really simple: it just finds the end of one tag and the beginning of another with whitespace in between, and removes said whitespace. It doesn't appear to have any special cases, so it unfortunately affects textarea and pre tag content.

see: http://code.djangoproject.com/browser/django/tr...

Dougal Matthews (Dec 20th):

Shame it messes up the pre and textarea tags :( was gonna use it otherwise.

Interesting idea though!

Martyn Clement (Jan 8th):

I suggest this should only take effect when DEBUG=False.
Debugging HTML by viewing the page source is quite common.
That’s what I did ;)

lericson (Feb 23rd):

There are bigger fish to fry.

Uninstall Program (Jun 21st):

Thanks for your code!
