The amount of bandwidth the NZDWFC site's been using has been steadily increasing recently, so I've been looking at what I can do to reduce the amount of data sent, hopefully without impacting on anyone's browsing experience. The top ten pages in terms of hits last month were:
- the index page
- the new series message board
- the forums index page
- the general message board
- the page of series 2 images
- a piece of Cyberman artwork
- The Traders' Corner message board
- the Andrew Cartmel interview
- the artwork from the cover of TSV 72
- Pr1me Computers
I suspect that the main culprit is the fifth item there, since it's basically a page of thumbnails, but aside from that, seven of the pages there have something in common: they're dynamically generated. When someone hits the forum index, a script grabs the last ten posts on each message board and constructs an HTML page which is sent to the browser.
When a user visits a static page, which is stored as a .html file on the server, the web server sends a "Last-Modified" header telling their browser when the file was last changed. The next time they visit it, the browser sends an "If-Modified-Since" header to the web server to say "send me the page if it's been updated since X date/time". The web server checks against the .html file and will only send it to the browser if it has been changed. This saves a bit of bandwidth by not sending unnecessary data.
If a web page is generated dynamically by a perl script (or a script in any other programming language, for that matter), the web server has no way of knowing whether the contents of the page have changed since the user last looked at it, so it sends the whole page again. Support for "Last-Modified" and "If-Modified-Since" has to be implemented in the script itself. So last night I implemented it in the script which generates the forum index page.
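To give a rough idea of what that involves, here's a minimal sketch. It's in Python rather than perl and it isn't the actual forum script: the function name `respond`, the `render_page` callback and the idea of using the time of the newest post as the modification date are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_post_time, if_modified_since, render_page):
    """Return (status, headers, body) for a dynamically generated page.

    last_post_time    -- timezone-aware datetime of the newest post (assumed
                         to be the only thing that changes the page)
    if_modified_since -- raw If-Modified-Since header value, or None
    render_page       -- hypothetical callback that builds the HTML as bytes
    """
    # HTTP dates are in GMT with one-second resolution.
    last_post_time = last_post_time.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_post_time, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: just send the page
        # Only trust dates that carry a timezone, so the comparison is valid.
        if since is not None and since.tzinfo is not None:
            if last_post_time <= since:
                return 304, headers, b""  # Not Modified: no body sent

    return 200, headers, render_page()
```

The saving comes from the 304 branch: the script still runs, but it never builds or sends the page body.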
The problem with this, as I discovered, was that the forum index also has controls on it to expand and shrink the lists for each message board. These affect the way that the script generates the HTML page, so if the script is only checking for changes to the message boards and not changes to these controls, the controls stop being persistent between visits. I probably would have found this out last night if Xtra's broadband wasn't so crap - at one point it completely dropped my connection for about ten minutes...
So I think the answer is to use an ETag header instead. ETags work in a similar way, but you're not limited to a date/time value, so the tag can include whatever other settings affect the generated page as well. One question I haven't been able to find an answer to: the If-None-Match header which a browser sends can contain more than one entity-tag value, so how does the browser know when an entity-tag value is no longer valid? The RFC doesn't make it clear what the client should do. Does that mean browsers could eventually be sending hundreds of entity-tag values?
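Here's a rough sketch of how the ETag approach might look, again in Python rather than perl. Everything in it is an assumption for illustration: the names `make_etag`, `matches` and `respond`, the choice of a SHA-1 hash, and the idea that the newest post time plus the list of expanded boards is everything that affects the page.

```python
import hashlib


def make_etag(last_post_time, expanded_boards):
    """Build an ETag from everything assumed to affect the generated page."""
    # Sort so the same set of expanded boards always gives the same tag.
    key = last_post_time + "|" + ",".join(sorted(expanded_boards))
    return '"' + hashlib.sha1(key.encode("utf-8")).hexdigest() + '"'


def matches(if_none_match, etag):
    """True if the If-None-Match header value lists this ETag (or is *)."""
    if not if_none_match:
        return False
    # A plain split is enough here because our tags never contain commas.
    candidates = [c.strip() for c in if_none_match.split(",")]
    # If-None-Match uses weak comparison, so ignore any W/ prefix.
    candidates = [c[2:] if c.startswith("W/") else c for c in candidates]
    return "*" in candidates or etag in candidates


def respond(if_none_match, last_post_time, expanded_boards, render_page):
    """Return (status, headers, body); render_page is a hypothetical callback."""
    etag = make_etag(last_post_time, expanded_boards)
    headers = {"ETag": etag}
    if matches(if_none_match, etag):
        return 304, headers, b""  # Not Modified: no body sent
    return 200, headers, render_page()
```

Because the expand/shrink settings are part of the tag, changing a control produces a different ETag, so the browser gets a fresh page instead of a 304.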