
Bug of the moment 2006-04-13

Something rather peculiar caught my eye just as I was closing a browser tab, something odd written in a banner advert. Too late, the tab was gone and a different advert (or rather, none at all as it was filtered) appeared after undoing the tab close. Wondering if I could pull it from the cache, I asked Firefox for about:cache and lo, like iCab, it has a cache browser (albeit not a very good one). The image was not in the recorded cache nor in the cache folder on disc. I did find this though, the usage figures for the memory cache:

Number of entries: 1001
Maximum storage size: 21504 KiB
Storage in use: 67957 KiB
Inactive storage: 0 KiB

I presume the “maximum storage size” only counts when you have fewer pages open than the maximum amount of cache? Then again, Firefox is using 149 MB of memory, which is a lot more than is in its memory cache, and I don’t have that much open really.

Back to the image

Anyhow, copious use of the Refresh button rewarded me with the “image” that I was looking for:

Notice the Â£? What is a “Â£”? Those of you in Britain or of British persuasion will recognise it as what happens when the pound symbol is stored as UTF-8 (it’s outside of 7-bit ASCII, so it takes two bytes, 0xC2 0xA3) and then displayed as if it were ISO 8859-1 or Windows Latin-1: the leading byte 0xC2 shows up as A-circumflex, followed by the pound sign itself. This, along with some other affected characters, is a common problem with remote RSS feed displays on sites and with all those people who thieve everyone else’s blog posts. (Incidentally I realised some time ago now that 7-bit ASCII is a subset of UTF-8, and thus my whole site is now sent as UTF-8 regardless of how the pages were saved. That is only safe for pages that stick to ASCII: ISO 8859-1 characters above 0x7F are encoded differently in UTF-8.)
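As a rough sketch of the mix-up, here is the same thing reproduced in Python. Python is purely my choice for illustration and the variable names are made up; I have no idea what the advert's site actually runs on.

```python
# The pound sign is code point U+00A3; UTF-8 encodes it as two bytes.
pound = "\u00a3"
utf8_bytes = pound.encode("utf-8")  # the bytes 0xC2 0xA3

# A reader that assumes ISO 8859-1 turns every byte into one character,
# so the lead byte 0xC2 becomes A-circumflex and 0xA3 becomes the pound sign.
misread = utf8_bytes.decode("iso-8859-1")

# Nothing has been lost, so reversing the wrong step restores the original.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(utf8_bytes.hex(), misread, repaired)
```

The round trip at the end only works because ISO 8859-1 maps every byte value to a character; a reader using Windows Latin-1 would produce the same two characters here, but not every byte sequence survives that route intact.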

So how would it affect an image? Who would be stupid enough to generate an image with an obvious mistake in it, and how would the mistake happen? The answer is no-one, because what you see is not an image.

The “banner adverts” on the page are actually composed of a standard background image with the specific text overlaid. You can demonstrate this using Find to select the text of the advert:

So, the text in question is sent separately from the image, and that is where the encoding error occurs. The page in question presumably was not declared as UTF-8 while the advert text was stored as UTF-8, and thus the final text was mis-rendered.

In theory, if you blocked just the background image you would have just the text left, but AdBlock does not seem to be able to block background images! D’oh.

Posted 13th April 2006