27 Jul

pre-parsing HTML for incorrectly-sized images

Every now and then, I get a call from a client who is puzzled about why their site is running slowly. I look at their page and see an innocuous image inserted into a paragraph. When I examine the image, though, I see that the client has artificially resized it using HTML.

One recent example showed on-screen as a 300px-wide image. When I examined it, it was actually about 3000px wide. As I explained to the client, this forced the browser to use about 100 times more RAM (10 times the width and 10 times the height, not counting the overhead of scaling it down to 300px wide), and the download was slower as well.
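To put rough numbers on that, here is a small sketch. It assumes a decoded image takes about 4 bytes per pixel (browsers differ) and that the original has the same aspect ratio as the on-screen version. The function name and the 3:2 heights are made up for illustration.

```php
<?php
// Rough estimate of the memory a decoded bitmap needs.
// Assumption: 4 bytes per pixel (RGBA); real browsers may differ.
function decoded_bytes(int $width, int $height, int $bytesPerPixel = 4): int {
    return $width * $height * $bytesPerPixel;
}

// Hypothetical sizes: the 3:2 aspect ratio is made up for the example.
$displayed = decoded_bytes(300, 200);   // what the page actually needs
$original  = decoded_bytes(3000, 2000); // what the browser has to decode

echo $original / $displayed, "\n"; // 100
```

Ten times wider and ten times taller means one hundred times the pixels, whatever the bytes-per-pixel figure turns out to be.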

One solution to all this is to teach all clients how to resize images before they upload them. I did that in this case. But it’s not the easiest solution, and people forget how to do things.

Another solution was proposed by Ken, and that is to parse any submitted HTML for images and check that the size they claim to be is actually correct. He said that he’d had the idea ages ago but never implemented it. I think its time has come, so let’s do it.

There are four ways that images can get resized: through HTML parameters, inline CSS, selector-based CSS and JavaScript. We will address the first two, as the others would be too complex to solve in a small application.

How this will work is that resized images, if detected, will be adjusted in the HTML so their ‘src’ parameter points to a pre-created resized version of the image. The entire script is run when the HTML is submitted into a CMS, before the HTML is placed in the database or published to a file.

First, we need to detect image sources and their assigned sizes.

Here is some sample HTML with images from this site.

<p><img src="http://verens.com/wp-content/themes/mandigo-14/images/green/head.jpg" width="76" height="24" /></p>
<p><img src="/wp-content/themes/mandigo-14/images/green/head.jpg" style="width:76px;height:24px" /></p>
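The detection step on its own could be sketched like this. It is a minimal sketch, not the full function: the name img_declared_size is made up, and it only understands sizes given as HTML attributes or as px values in an inline style.

```php
<?php
// Returns array(width, height) as declared on an <img> tag, or null if
// no usable size is declared. Checks HTML attributes first, then inline
// CSS. Assumption: inline CSS sizes are always given in px.
function img_declared_size(string $tag): ?array {
    if (preg_match('/\swidth="(\d+)"/i', $tag, $w)
        && preg_match('/\sheight="(\d+)"/i', $tag, $h)) {
        return array((int)$w[1], (int)$h[1]);
    }
    // the lookbehind stops "max-width" or "line-height" from matching
    if (preg_match('/style="[^"]*?(?<![-\w])width:\s*(\d+)px/i', $tag, $w)
        && preg_match('/style="[^"]*?(?<![-\w])height:\s*(\d+)px/i', $tag, $h)) {
        return array((int)$w[1], (int)$h[1]);
    }
    return null;
}

// Both sample tags above declare the same size.
print_r(img_declared_size('<img src="/head.jpg" width="76" height="24" />'));
print_r(img_declared_size('<img src="/head.jpg" style="width:76px;height:24px" />'));
```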

What we want is a function which, when fed that HTML, returns HTML which is modified such that images with incorrect widths and heights have their srcs modified to point to a pre-resized version, which is created using ImageMagick.

Here it is:

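function html_fixImageResizes($src){
	// checks for image resizes done with HTML parameters or inline CSS
	//   and redirects those images to pre-resized versions held elsewhere
	// note: lines marked (assumed) fill in parts missing from this listing

	preg_match_all('/<img [^>]*>/im',$src,$matches);
	if(!count($matches[0]))return $src;
	foreach($matches[0] as $match){
		$width=0;
		$height=0;
		if(preg_match('/width="[0-9]*"/i',$match) && preg_match('/height="[0-9]*"/i',$match)){
			$width=preg_replace('/.*width="([0-9]*)".*/i','\1',$match); // (assumed)
			$height=preg_replace('/.*height="([0-9]*)".*/i','\1',$match); // (assumed)
		}
		else if(preg_match('/style="[^"]*width: *[0-9]*px/i',$match) && preg_match('/style="[^"]*height: *[0-9]*px/i',$match)){
			$width=preg_replace('/.*style="[^"]*width: *([0-9]*)px.*/i','\1',$match);
			$height=preg_replace('/.*style="[^"]*height: *([0-9]*)px.*/i','\1',$match);
		}
		if(!$width || !$height)continue;
		$imgsrc=preg_replace('/.*src="([^"]*)".*/i','\1',$match); // (assumed)

		// get absolute address of img (naive, but will work for most cases)
		if(!preg_match('/^https?:/i',$imgsrc))$imgsrc='http://'.$_SERVER['HTTP_HOST'].'/'.ltrim($imgsrc,'/'); // (assumed)
		$size=@getimagesize($imgsrc); // (assumed)
		if(!$size)continue;
		list($x,$y)=$size;

		if(!$x || !$y || ($x==$width && $y==$height))continue;

		// create address of resized image and update HTML
		// (assumed: the cache lives in a web-readable /f/ directory)
		$newsrc='/f/'.md5($imgsrc).'/'.$width.'x'.$height.'.png';
		$imgfile=$_SERVER['DOCUMENT_ROOT'].$newsrc;
		$src=str_replace($match,preg_replace('/src="[^"]*"/i','src="'.$newsrc.'"',$match),$src);

		// create cached image
		if(file_exists($imgfile))continue;
		@mkdir(dirname($imgfile),0755,true);
		$str='convert '.escapeshellarg($imgsrc).' -geometry '.$width.'x'.$height.' '.escapeshellarg($imgfile);
		exec($str);
	}
	return $src;
}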

The return string from calling that function with the above HTML is this:

<p><img src="/demos/html_imageresizer/f/6bf7dd2b8232448e85d7fa9cd1009b44/76x24.png" width="76" height="24" /></p>

<p><img src="/demos/html_imageresizer/f/6bf7dd2b8232448e85d7fa9cd1009b44/76x24.png" style="width:76px;height:24px" /></p>

Here is an example of it running, and here is the source of that demo.

09 Apr

javascript cache problem, solved

Almost every time someone says to me that something I was working on is broken, the answer is “clear your cache”, and that automagically fixes everything. However, that’s not an ideal solution – ideally, the problem would never happen in the first place.

So, why does the problem happen?

Let’s say that there is a HTML file which calls a JS function like this: showImage(); – the JS function is included from the external file /j/images.js.

Browsers are usually set to cache .js files, and that’s the correct behaviour for the most part. Unfortunately, when a file needs to be fixed, it can cause problems.

For example, let’s say that I’ve corrected the function name to match my usual naming scheme – images_show();. I change the reference in both places. The browser reads the new HTML file from the net, but loads the JavaScript from the cache – suddenly there’s a mismatch which causes a problem.

So, how to get around this?

The solution I’m using at the moment involves a little bit of mod_rewrite and PHP.

Sticking with the contrived example, let’s rewrite /j/images.js so it is accessible from /j/images (using /.htaccess):

ExpiresActive On
ExpiresDefault A259200
RewriteEngine on
RewriteRule ^j/images$ /j/images.js [L]

Now, we add a little magic. We want to change the URL if the file has changed. The only way to know this is to look at the modified date of the file.

In your PHP, you could do it like this:

<script type="text/javascript" src="/j/images/<?php echo md5(`ls -l j/images.js`); ?>"></script>

and then change the .htaccess file to allow that:

ExpiresActive On
ExpiresDefault A259200
RewriteEngine on
RewriteRule ^j/images/(.*)$ /j/images.js [L]

Now, if the file has not changed, the MD5 hash (and therefore the URL) stays the same, so the browser can keep using its cached copy. If the file changes, the URL automatically changes as well.
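If you would rather not shell out to ls, the same idea can be sketched with PHP's own file functions. This is a swap-in, not what I run above: it hashes the modification time and size instead of the `ls -l` output, and the function name versioned_url is made up.

```php
<?php
// Builds a cache-busting URL for a static file.
// Assumption: mtime and size stand in for the `ls -l` output.
function versioned_url(string $file, string $urlBase): string {
    clearstatcache(true, $file); // make sure we see fresh file info
    $stamp = filemtime($file) . ':' . filesize($file);
    return $urlBase . '/' . md5($stamp);
}

// Usage with a throwaway file (a stand-in for j/images.js).
$tmp = tempnam(sys_get_temp_dir(), 'js');
file_put_contents($tmp, 'function images_show(){}');
touch($tmp, 1000000000);
$before = versioned_url($tmp, '/j/images');
touch($tmp, 1000000600); // simulate a later edit
$after = versioned_url($tmp, '/j/images');
unlink($tmp);

echo $before === $after ? "same URL\n" : "URL changed\n"; // URL changed
```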

…and that’s not all!

I like to aggregate my JavaScript files to reduce the network pain felt by the browser. In my CMS, it’s done with a /j/js.php file. Here’s a short excerpt:

/* more files */

header('Cache-Control: max-age=2592000');
header('Expires: Fri, 1 Jan 2500 01:01:01 GMT');
header('Content-type: text/javascript; charset=utf-8');

echo $js;

That’s then pointed to with this line in my .htaccess:

RewriteRule ^js/(.*)$ /j/js.php [L]

And it’s referenced in the browser like this:

echo '<script type="text/javascript" src="/js/'.md5(`ls -l j`).'"></script>';

Simple, innit! That simple trick now keeps track of a number of files, and the browser knows immediately if there are any changes.
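The part of js.php that the excerpt skips over is just gluing files together. A minimal sketch of that step, assuming nothing about the real file list (the function name build_js_bundle and the temporary files are made up):

```php
<?php
// Concatenates several JavaScript files into one response body.
// Assumption: a ";\n" separator guards against files that lack a
// trailing semicolon or newline.
function build_js_bundle(array $files): string {
    $js = '';
    foreach ($files as $file) {
        if (!is_readable($file)) continue; // skip missing files quietly
        $js .= file_get_contents($file) . ";\n";
    }
    return $js;
}

// Usage with two throwaway files standing in for real scripts.
$a = tempnam(sys_get_temp_dir(), 'js');
$b = tempnam(sys_get_temp_dir(), 'js');
file_put_contents($a, 'var a=1');
file_put_contents($b, 'var b=2');
$js = build_js_bundle(array($a, $b, '/no/such/file.js'));
unlink($a);
unlink($b);

echo $js;
```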

BTW: The same trick can be used with images, css, and any number of other “static” objects.

03 Apr

Translation in PHP

As some of you know, my KFM project is available in 17 different languages. I did that using a home-brewed translation method.

Recently, I’ve been working on translating our CMS, WebME. I took a more “official” approach this time, and looked through the PHP documentation.

This article is not about localisation, in that I don’t care at the moment about the difference between en_GB and en_US (other than to point out that Americans spell everything wrong). It’s about translation itself.

The “official” way to do translations is to use the compiled-in gettext support. An unofficial but very popular alternative is to use the PHP-gettext project, which is used by WordPress.

I chose to use the compiled-in gettext support. As I control the server that our CMS is run on, there is no support problem, so I can guarantee that required software will be there.

Another reason for using the compiled-in version vs the PHP library is that, in the words of PHP-gettext’s developer, “I’m not very fond of PHP the language, there’ll be a lot to fix”. In other words, there is a chance that the PHP-gettext library may blow up, and the developer may just shrug his shoulders.

So, how does translation work? To start off, you need at least one language other than the main language of your site (presumably it’s English). For testing purposes, let’s assume it’s Irish.

Translation is managed in “domains” (think “namespaces”). That’s not extremely important if you are only translating a few hundred strings, as you can just use one called “default”.

Language strings are recorded in .mo files. You don’t need a file for your main language.

The files are saved in a directory on your server, using this structure:

locales/ga/LC_MESSAGES/default.mo
locales/de/LC_MESSAGES/default.mo

The “locales” directory can be named anything you want, and the “default” bits are named after your domain (namespace).

The ga/de directories here should properly be ga_IE and de_DE, but we’re not interested in the locale part – we’re only interested in the language part.

To create your .mo files, first create a file in your server called test.php:

header('Content-type: text/html; Charset=utf-8');
setlocale(LC_ALL,'ga_IE.utf8');
bindtextdomain('default','/path/to/locales'); // change this to your locales directory
textdomain('default');

echo _('Pages');

Note that I’ve used ‘ga_IE.utf8’ here instead of ‘ga’. I’ll explain that later – it’s important for now.

When you run that, it should output “Pages”. The next step is to write the translation for it.

To create a .mo file, I recommend poedit – it’s cross-platform and works well. I won’t get into too much detail – here is an excellent tutorial – just replace ‘__’ with ‘_’. btw, the Irish for “Pages” is “Leathanaí”.

When the file is created, save it as “/path/to/locales/ga/LC_MESSAGES/default.mo”, and restart your webserver (Apache caches gettext strings, so when you change them, you may need to restart the webserver to clear the cache – YMMV).

Now, when you try your script, it should output “Leathanaí”. Simple, innit?

Actually, no. You see, that’s a very contrived version – we deliberately chose a working locale. However, you will want to grab the locale from the browser’s Accept header, and that is not guaranteed to work.

Try it yourself – replace ‘ga_IE.utf8’ with ‘ga’ – the output is now “Pages”. btw, that’s why you don’t need a translation for your main language – gettext will output its input if there is no existing translation.

So, how do we get a working locale from the browser?

First, you need a list of your existing languages:

  $available_languages = array();
  if ($handle = opendir('/path/to/locales')) {
      while(false!==($file = readdir($handle))){
          // skip "." and "..", keep only the language directories
          if ($file[0]!='.' && is_dir('/path/to/locales/'.$file))$available_languages[] = $file;
      }
      closedir($handle);
  } else {
      echo 'error: missing language files';
  }

Next, parse the browser’s Accept header for a locale which matches what we have.

  $selected_language = 'en';
  $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
  foreach($langs as $lang)if (in_array(preg_replace('/;.*/','',trim($lang)), $available_languages)) {
    $selected_language= preg_replace('/;.*/','',trim($lang));
    break;
  }

Note that we default to ‘en’.
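The same matching can be wrapped up as a function that is easy to test without a browser. A minimal sketch: the name pick_language is made up, q-values are ignored (the header order is trusted), and the fallback from "ga-IE" to "ga" is an extra that the code above does not do.

```php
<?php
// Picks the first language in an Accept-Language header that we have
// translations for; falls back to $default.
// Assumptions: q-values are ignored, and the directory names in
// $available are lowercase language codes such as "ga" or "de".
function pick_language(string $header, array $available, string $default = 'en'): string {
    foreach (explode(',', $header) as $part) {
        $code = strtolower(trim(preg_replace('/;.*/', '', $part)));
        if ($code === '') continue;
        if (in_array($code, $available, true)) return $code;
        // also try the primary subtag, so "ga-IE" can match "ga"
        $primary = preg_replace('/-.*/', '', $code);
        if (in_array($primary, $available, true)) return $primary;
    }
    return $default;
}

echo pick_language('ga-IE,ga;q=0.8,en;q=0.5', array('ga', 'de')), "\n"; // ga
echo pick_language('fr', array('ga', 'de')), "\n";                      // en
```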

And now, we properly set up the locale.

    preg_match_all('/^'.preg_quote($selected_language,'/').'.*/m',`locale -a`,$matches);
    if(!count($matches[0]))die('no locale info for "'.$selected_language.'"');
    foreach($matches[0] as $m)if(preg_match('/utf8/',$m)){
        setlocale(LC_ALL,trim($m));
        break;
    }

What’s happening here is that we are scanning the webserver’s list of compiled locales (not your list), as gettext will not work unless it has a properly defined locale which is already compiled on the server. The above code will, when given ‘ga’, find ‘ga_IE.utf8’ in the server’s locales list and use that.
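To try the lookup without touching a real server, the search can be run against a pasted copy of the `locale -a` output. A minimal sketch, with a made-up function name and a made-up sample list:

```php
<?php
// Finds a UTF-8 locale for a language code in `locale -a` style output.
// Returns the locale name, or null if the server has none compiled.
function find_utf8_locale(string $lang, string $localeList): ?string {
    $pattern = '/^' . preg_quote($lang, '/') . '(_[A-Za-z]+)?\.utf-?8$/im';
    if (preg_match($pattern, $localeList, $m)) return $m[0];
    return null;
}

// Hypothetical sample of what `locale -a` might print.
$sample = "C\nPOSIX\nde_DE.utf8\nga_IE\nga_IE.utf8\n";

echo find_utf8_locale('ga', $sample), "\n"; // ga_IE.utf8
var_dump(find_utf8_locale('fr', $sample));  // NULL
```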

After that, it’s just the normal domain bind, as described in the simple example.

header('Content-type: text/html; Charset=utf-8');
bindtextdomain('default','/path/to/locales'); // change this to your locales directory
textdomain('default');

echo _('Pages');

So now you can translate in just about any language, whether the browser supplies a proper locale string or not.

But wait – there’s more! What if you want to translate the string “Welcome to the ‘$1’ page”, where $1 is a variable?

Unfortunately, PHP’s built-in gettext function can’t do that. But we can hack it together easily. Add this function definition to your script:

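function __($string){
    $str = gettext($string);
    // substitute $1, $2, ... with the extra arguments, highest number first
    for($i = func_num_args()-1 ; $i ; --$i){
        $str = str_replace('$'.$i, func_get_arg($i), $str);
    }
    return $str;
}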

Then replace your _('Pages') with __("Welcome to the '$1' page","Kaetastic"). You will need to change the “keywords” section of poedit to use ‘__’ as well as ‘_’, then rescan the PHP files, update your .mo and restart the webserver.
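The substitution itself can be tried without gettext installed. A minimal sketch with a made-up function name; it only shows the placeholder step, not the translation lookup:

```php
<?php
// Replaces $1, $2, ... in $str with the given arguments.
// Counting down means "$10" is replaced before "$1" can clobber it.
function fill_placeholders(string $str, string ...$args): string {
    for ($i = count($args); $i > 0; --$i) {
        $str = str_replace('$' . $i, $args[$i - 1], $str);
    }
    return $str;
}

echo fill_placeholders('Welcome to the \'$1\' page', 'Kaetastic'), "\n";
// Welcome to the 'Kaetastic' page
```

The countdown matters once a string has ten or more parameters: going upwards, the "$1" pass would eat the front of "$10".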

Brilliant! Now you have a proper multi-lingual site. I expect throngs of readers to browse your site in Klingon and Leet.

Note that for proper optimisation you should use __() instead of _() only when the string has parameters to substitute – otherwise it’s quicker just to use _().

06 Mar

vm's and bridges and proxies

oh my!

We bought a rack server for the office, to help replace our aging systems with something a little more civilised. I spent a lot of time this week trying to figure out how to configure it best.

I wanted to install services and servers on the machine in such a way that I could easily move them onto a new machine if things get too busy. For this, I chose to use the QEMU virtual machine emulator. Some people might think that VMWare would be a better choice, but I did some research on it and couldn’t find any compelling reason why I should choose VMWare over QEMU.

To have the system networked properly in the LAN, I wanted to be able to address each vm using a separate IP address. To do this, I had to set up QEMU to use eth0 as a bridge. So, I had this in the host’s /etc/rc.local:

echo 1024 > /proc/sys/dev/rtc/max-user-freq
modprobe kqemu
modprobe tun
/etc/init/iptables down

/sbin/ifdown eth0
/sbin/ifconfig eth0 up
/usr/sbin/brctl addbr br0
/usr/sbin/brctl addif br0 eth0
/usr/sbin/brctl stp br0 off
/sbin/ifconfig br0 netmask up
/sbin/route add default gw

and this was in /etc/qemu-ifup:

/sbin/ifdown eth0
/sbin/ifconfig eth0 up
/sbin/ifconfig $1 promisc up
/usr/sbin/brctl addif br0 $1
/sbin/route del default
/sbin/route add default gw

Note that I’ve used $1 instead of tap0 (which is shown in some examples) – this is because when you start up your QEMU instances, each one should use a different tap device.

When loading the QEMU instance, be sure to give each one a different MAC address. Otherwise strange stuff will happen.

xhost +local:root
su -c "qemu -boot c -hda vmServices.img -localtime -net nic,macaddr=52:54:00:00:00:01 -net tap -m 192 -usb -soundhw sb16 &"

In the above case, I’m loading a QEMU instance saved as “vmServices.img”, and have given it a MAC address 52:54:00:00:00:01. The default address is 52:54:00:12:34:56. Be sure to override that.
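If you have more than a couple of VMs, it is easy to lose track of which MAC address went where. One way to keep them distinct is to derive each one from the VM's number. A minimal sketch, assuming you keep the 52:54:00:00:00 prefix from the example above and only vary the last byte (the function name is made up):

```php
<?php
// Generates a distinct MAC address per VM index (1..255).
// Assumption: the 52:54:00:00:00 prefix is kept and only the last
// byte varies, so this covers at most 255 VMs.
function qemu_mac(int $index): string {
    if ($index < 1 || $index > 255) {
        throw new InvalidArgumentException('index must be between 1 and 255');
    }
    return sprintf('52:54:00:00:00:%02x', $index);
}

echo qemu_mac(1), "\n";  // 52:54:00:00:00:01
echo qemu_mac(26), "\n"; // 52:54:00:00:00:1a
```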

You’ll have noticed that I turned off iptables in the host’s /etc/rc.local. I’m not an expert at that stuff so that was the simplest solution to enable networking without problems. Be sure to do the same in the clients’ /etc/rc.local files – otherwise you may have problems accessing hosted web servers, for example.

When the client is loaded up, assign a static IP address to it. I choose static IPs for these servers because they’re not client machines, and I need to be able to consistently access the right one from an external request.

Now you have your network up and running properly, with separate IP addresses for each vm.

The next step is to route incoming web traffic to the right machines.

Let’s say that you want a worker outside the office to access dotproject.youroffice.com, and you want a client to see his test server using blah.com.test.youroffice.com. The problem is that you are using a standard DSL connection, only have one static IP, and the dotproject and test web servers are held in separate VMs on the machine.

In this case, the solution is to use mod_proxy to route to the right machine.

So, you set up a rudimentary virtual hosted webserver on the host machine. The first virtual host should be something generic which perhaps just reports the status of the host. After that, we add the magic:

<VirtualHost *>
  ServerName dotproject.youroffice.com
  ProxyPreserveHost On
  ProxyPass /
  ProxyPassReverse /
</VirtualHost>
<VirtualHost *>
  ServerName test.youroffice.com
  ServerAlias *.test.youroffice.com
  ProxyPreserveHost On
  ProxyPass /
  ProxyPassReverse /
</VirtualHost>

From an external browser’s perspective, both web servers are running on the same machine, but internally, we can see that there are three involved – a proxy router, and the two separate virtual machines’ web servers.

There may be more-correct ways of doing the above, but this works for me.

04 Sep

Daniel in Computer Arts

This is an “artist exposure” page from this month’s Computer Arts (issue 140). Daniel Shiels is one of our designers, who we are training up in various web techs before we decide to fire him 😉

Daniel was thrilled to find he had been profiled in the magazine. We’re thrilled for him.

Speaking of web designers, we’re hiring! If you are Monaghanese and looking for a job in web design, and have experience in CSS and HTML, and perhaps even creating templates for CMSes, then come talk to us! You’ll get to see my face all day!

03 Mar

using KFM's functions from within your CMS

I am currently working on a few property websites. One thing common about most property websites is that they include multiple images showing various aspects of a property. When it came to writing that part of the application, I chose to use KFM’s file management skills, combined with a little AJAX magic to make the work easy for the client.

To see a demo of what I’m on about, click here, log in as “propertydemo” with password “propertydemo”, and click to create a new property, or edit an existing one.

The important thing to note there is the attaching of images – to attach a new image, you click Browse, choose an image from your machine, then click Upload. The page will not be reloaded – your new image will just magically appear. To delete an image, hover your mouse over the icon, then click ‘x’. The idea for this is in part based on how WordPress manages images, but is of course better, as I wrote it 😉

How it works is that there is a hidden iframe acting as the target for the image upload form. When you submit your image, the image is submitted into the iframe, which is attached directly to the upload.php of the CMS’s KFM installation. We supply an “onload” function so that the upload.php then refreshes the parent page’s list of images.

Simple really!

The hard part was in adapting KFM so that I could use its functions from within WebME. I won’t explain all the work that went into it, but just that it’s all done, and a recent copy of KFM has all the necessary code.

To attach KFM to your CMS, you just need to include() KFM’s configuration.php and api/api.php, making sure that the configuration.php has a correct $kfm_base_path:


Once that’s done, you have access to all of KFM’s functions, as well as the extra API functions which are not used by KFM itself, but are useful for CMSes.

If there are any questions about how to use any parts of this, please ask them below – I still haven’t gotten around to writing documentation for KFM, but hopefully I’ll be able to get it done based on questions from the great unwashed (ie: you!).

23 Jan

this year so far

I’ve started teaching guitar. I think I have a pretty good method. My student is off work for about six weeks due to some back surgery, so I’ve made the assumption that he will have time to learn some theory, so I started off by explaining “power chords” (or “fifths” – made from the root note and the fifth note), and explaining how major and minor chords are made from the respective 1st, 3rd and 5th notes of the major scale (ionian mode) and minor scale (aeolian mode) from whichever root you want.

Instead of showing twenty open-string versions of chords and expecting to have to remind him of them all again the next week, I demonstrated only four chord shapes, A Am E and Em, and showed how barre chords can be used with those shapes to make any major or minor chord at all.

Hopefully, he has not run screaming for the hills already… I at least only show him songs he actually names himself. Last week, I showed him The Jam’s That’s Entertainment, and this week, I’ll be showing him Pink Floyd’s Wish You Were Here (which will be a handy introduction to open string chords G C and D, as well as hammer-ons and pull-offs).

On the work front, I’m currently working on some pretty significant improvements to our WebME engine’s AJAX shopping cart. In the admin area, I’m working on allowing products to be created and dragged around categories as if they were files in a file manager. To manage this, I’ve ported my KFM project into WebME and converted the appropriate areas.

Robots… my mini-itx bot has served its purpose – I built it to see if I could, and I could, so it’s time to strip it down and reuse its parts for something else (file server, maybe). I’ll be building a few robots this year, using the Gumstix platform for the brain. This should allow for a much better battery life, as well as a much smaller body size.

We (Bronwyn and I) will be moving house this year. When we moved into our present house, we told the landlords that it would only be for a year or two while we looked for a permanent house to buy and live in. Bronwyn’s parents have given us an enormous boost in this – they found a two storey building with a converted attic, which is about the same size as our current cottage stacked on top of itself three times, and bought it for us. Of course, we’ll still need to pay off the mortgage, etc, but the gesture is definitely not unnoticed! Thanks, guys! More on that later, when details are more concrete.

My son is still a genius. Yesterday, at age 3.3, he wrote his name without prompting. He forgot the “T” in “JARETH”, but still – can your 3.3 year old son write his name? Of course, still not a word out of him yet, but he’s making progress in that as well – every night, he mumbles himself to sleep, trying out various sounds.

update: got a phone call earlier on to say that Bronwyn passed him while he was playing in the house, and she noticed he had written his name in total, not forgetting the “T”.

Boann is calming down a little – instead of screaming all day and night, she now only screams for a while, and I only spend an hour or so every night (usually some time between 1am and 3am) walking up and down wondering why she’s so wide awake at that unholy hour.

Life is getting slowly better.

I have a doctor’s appointment on Friday – I have a lump in a certain area since December, which may or may not be cancerous. We’ll see.

I emailed an old friend, who we had had a falling out with about seven years ago, over some stupid event that may or may not have happened when I was blind drunk after two bottles of vodka. He’s the piercer for a shop in Canada, and is apparently doing quite well for himself! Hi Andy! Bygones have become bygones, and hopefully, we’ll be able to sit down over a few jars at some point and laugh at ourselves.

Anyway… gotta get to work. It’s just past 9am, and I am officially into work-time now, so I’d better stop nattering.

22 Dec

KFM 0.7

demo, download (828k .tbz2, 1.1M .zip)

New Features

  • New Languages
    • Bulgarian, thanks to Tondy (tondy.com)
  • Unzip zipped files (84). This allows users to zip up multiple files offline, upload them as one file, and unzip once they are uploaded.
  • Multiple Databases (127, 122). We now support PostgreSQL, MySQL and SQLite.


  • Files may be located anywhere on the system at all. They do not need to be within a web-readable area (33)
  • bugfixes (117, 100)
  • Long directory names are now truncated, using the same method as long filenames (80)
  • Directories with many files are now displayed quicker (106)
  • Download From Url has been combined with File Upload (108)
  • KFM has been tested and is known to work on PHP4.3+ and PHP5.1+

As usual, this release has been helped along by the many testers in the forum, testers who have contacted me by email, and all of the translators.

Development for version 0.7 was sponsored by the infinitely glorious web development company, Webworks.ie. We’re really quite good.

14 Dec

kfm 0.7 in beta

No versioned release zip yet, but I just finished the last of the features scheduled for this release. You can download via SVN using the details mentioned on the KFM site.

I’ll be announcing the string-freeze to the translators later today. We have one more language this time, Bulgarian. The official release will be in one week’s time. I need to give the translators time to do their work, and also, will spend that time looking through the code for bits that I can make more efficient.

New features for 0.7:

  • you can now upload a zipped archive of a few files, and extract the archive. this allows you to upload a load of files at the same time.
  • there were a lot of problems with SQLite in version 0.6. to help alleviate this problem, KFM now supports MySQL, Postgres and SQLite, using the MDB2 PEAR library.
  • instead of returning links which point directly to the requested images/files, we now return a link which retrieves the requested file via KFM. this allows your file repository to be held outside the web root, and will allow file authentication and other tricks (logging, uri-based thumbs) in the future.
  • long directory names are now truncated similar to long file names.
  • “file upload”, and “copy from Internet” now use the same form.
  • lots of speed issues have been fixed.

enjoy. The main release will be next Tuesday. I’ll write up a quick article then detailing what features I think will be in 0.8, 0.9, and on up to 1.0.

If there are any problems using this beta, please mention them using the KFM forum.

In related news, Webworks, my great and glorious company, will be using KFM in a very large project next year, which will mean a lot of work will be put into it. I am still committed to providing the improvements to the great unwashed, so you’ll all benefit from our hard work.

15 Nov

mysql modes

I just found out about MySQL “SQL modes”. These allow you to turn off some of the features that make MySQL so easy to use.

Why would you do that? For the same reason that in Perl, you should use strict, and in PHP, turn off all global variables and GPC variables and turn on all warnings.

In other words, you should lock down your MySQL to not allow any sloppy work. This will train you to write correct code in the future.
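For the curious, switching modes is a single SQL statement per connection. Here is a minimal sketch that only builds that statement; running it against a server is left out. The function name is made up, and the two mode names used are ones I'd expect on a current MySQL, but check the manual for your version.

```php
<?php
// Builds the statement that switches the current connection to the
// given SQL modes.
// Assumption: the mode names exist on the MySQL version in use.
function strict_mode_statement(array $modes): string {
    foreach ($modes as $mode) {
        // mode names are plain identifiers; reject anything else
        if (!preg_match('/^[A-Z0-9_]+$/', $mode)) {
            throw new InvalidArgumentException("bad mode name: $mode");
        }
    }
    return "SET SESSION sql_mode = '" . implode(',', $modes) . "'";
}

echo strict_mode_statement(array('STRICT_ALL_TABLES', 'ONLY_FULL_GROUP_BY')), "\n";
// SET SESSION sql_mode = 'STRICT_ALL_TABLES,ONLY_FULL_GROUP_BY'
```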

What I will be doing is to lock down my home machine to the strictest possible setting, then work on getting my company’s CMS working with it. Once that’s done, I’ll gradually lock down our development server (can’t just do it immediately, as there are always at least 15 sites in constant development, not all of which I’ve fixed up yet).

What I will /not/ be doing is locking down our production server, for at least a few years! With hundreds of sites of different CMS design, it is not predictable what will work smoothly and what will explode.