Archive for April, 2008

Almost every time someone says to me that something I was working on is broken, the answer is “clear your cache”, and that automagically fixes everything. However, that’s not an ideal solution - ideally, the problem would never happen in the first place.

So, why does the problem happen?

Let’s say that there is an HTML file which calls a JS function like this: showImage(); - the JS function is included from the external file /j/images.js.

Browsers are usually set to cache .js files, and that’s the correct behaviour for the most part. Unfortunately, when a file needs to be fixed, it can cause problems.

For example, let’s say that I’ve corrected the function name to match my usual naming scheme - images_show();. I change the reference in both places. The browser reads the new HTML file from the net, but loads the JavaScript from the cache - suddenly there’s a mismatch which causes a problem.

So, how to get around this?

The solution I’m using at the moment involves a little bit of mod_rewrite and PHP.

Sticking with the contrived example, let’s rewrite /j/images.js so it is accessible from /j/images (using /.htaccess):

ExpiresActive On
ExpiresDefault A259200
RewriteEngine on
RewriteRule ^j/images$ /j/images.js [L]

Now, we add a little magic. We want to change the URL if the file has changed. The simplest way to know this is to look at the modified date of the file.

In your PHP, you could do it like this:

<script type="text/javascript" src="/j/images/<?php echo md5(`ls -l j/images.js`); ?>"></script>

and then change the .htaccess file to allow that:

ExpiresActive On
ExpiresDefault A259200
RewriteEngine on
RewriteRule ^j/images/(.*)$ /j/images.js [L]

Now, if the file does not change, the MD5 hash (and therefore the URL) stays the same, so the browser can keep using its cached copy. If the file changes, the URL automatically changes as well, and the browser fetches the new version.
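The same idea can be sketched without shelling out to ls, using PHP’s filemtime() and filesize(). This is only a minimal sketch: the helper name versioned_url() and the temporary file standing in for j/images.js are assumptions for illustration, not part of the setup above.

```php
<?php
// Hypothetical helper: build a URL that changes whenever the file changes.
// $urlBase is the public URL prefix (e.g. "/j/images"), $path the file on disk.
function versioned_url($urlBase, $path)
{
    clearstatcache(true, $path);          // make sure we see the current mtime
    if (!is_file($path)) {
        return $urlBase;                  // file missing: fall back to the plain URL
    }
    // Hash path, mtime and size so the token is opaque but deterministic.
    $token = md5($path . '|' . filemtime($path) . '|' . filesize($path));
    return $urlBase . '/' . $token;
}

// Demonstration with a temporary file standing in for j/images.js.
$file = tempnam(sys_get_temp_dir(), 'js');
file_put_contents($file, 'function showImage(){}');
touch($file, 1200000000);
$before = versioned_url('/j/images', $file);

file_put_contents($file, 'function images_show(){}');
touch($file, 1200000500);                 // pretend the file was edited later
$after = versioned_url('/j/images', $file);

unlink($file);
echo $before === $after ? "unchanged\n" : "URL changed\n";
```

The rewrite rule shown above still applies, since it ignores whatever follows /j/images/.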

…and that’s not all!

I like to aggregate my JavaScript files to reduce the network pain felt by the browser. In my CMS, it’s done with a /j/js.php file. Here’s a short excerpt:

<?php
$js=file_get_contents('jquery-1.2.3.min.js');
$js.=file_get_contents('js.js');
$js.=file_get_contents('tabs.js');
$js.=file_get_contents('addrow.js');
$js.=file_get_contents('formhide.js');
/* more files */

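// let the browser cache the aggregated file for 30 days (2592000 seconds)
header('Cache-Control: max-age=2592000');
header('Expires: Fri, 01 Jan 2500 01:01:01 GMT');
header('Pragma:'); // clear any "Pragma: no-cache" header (e.g. one added by session_start())
header('Content-type: text/javascript; charset=utf-8');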

echo $js;

That’s then pointed to with this line in my .htaccess:

RewriteRule ^js/(.*)$ /j/js.php [L]

And it’s referenced in the page like this:

echo '<script type="text/javascript" src="/js/'.md5(`ls -l j`).'"></script>';

Simple, innit! That simple trick now keeps track of a number of files, and the browser knows immediately if there are any changes.
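If calling ls from PHP is not an option, a similar token for the whole directory can be put together with scandir(). A rough sketch, assuming a hypothetical helper called dir_fingerprint() and a throwaway directory standing in for /j:

```php
<?php
// Hypothetical helper: one token for every file in a directory.
function dir_fingerprint($dir)
{
    clearstatcache();                       // avoid stale mtimes and sizes
    $parts = array();
    foreach (scandir($dir) as $name) {      // scandir() returns sorted names
        $path = $dir . '/' . $name;
        if (is_file($path)) {
            $parts[] = $name . ':' . filemtime($path) . ':' . filesize($path);
        }
    }
    return md5(implode("\n", $parts));
}

// Demonstration in a throwaway directory standing in for /j.
$dir = sys_get_temp_dir() . '/fp_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/js.js', 'var a=1;');
file_put_contents($dir . '/tabs.js', 'var b=2;');
$first = dir_fingerprint($dir);
$again = dir_fingerprint($dir);

file_put_contents($dir . '/tabs.js', 'var b=2; var c=3;');   // edit one file
$second = dir_fingerprint($dir);

unlink($dir . '/js.js');
unlink($dir . '/tabs.js');
rmdir($dir);
echo $first === $second ? "unchanged\n" : "token changed\n";
```

Like `ls -l j`, this only looks at names, sizes and modification times, so it stays cheap even with many files.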

BTW: The same trick can be used with images, css, and any number of other “static” objects.

3am, daughter wide awake, not allowed to sleep - what’s a guy to do? Let’s do an experiment.

Let’s say we want to efficiently select all points in an area from a database. This has real-world applications - I’ll be using it in a geographical project very soon.

First, create a simple table in MySQL. I’ve created mine in a database called ‘geodb’.

CREATE TABLE points(x INT, y INT, INDEX(x), INDEX(y));

Note that x and y are both indexed.

Then, seed that table using some PHP (I had to up the max_execution_time on my laptop to 300 for this).

<?php
// mysqli is used here because the old mysql_* functions were removed in PHP 7
$db=mysqli_connect('localhost','username','password','geodb');
for($i=0;$i<10000000;++$i){
  $x=rand(-1000000,1000000);
  $y=rand(-1000000,1000000);
  mysqli_query($db,"INSERT INTO points (x,y) VALUES ($x,$y)");
}

Ok. For the rest of the experiment, we’ll be trying various ways to extract the number of points within a radius of 100000 from (0,0). The goal is to have the lowest working time. Each method should return the exact same result.

First, from the console, do a straight select statement.

time echo "SELECT COUNT(x) FROM points WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

Returned result 7993 in .414 seconds. Wow - pretty quick already… but, my daughter is still awake, so let’s continue.

We can improve this by avoiding the math on points that we are certain cannot be in the area. For example, with a radius of 100000 from (0,0), any points with x<-100000, x>100000, y<-100000 or y>100000 can definitely not be in the circle.

time echo "SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE x>-100000 AND x<100000 AND y>-100000 AND y<100000) AS sub1 WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

.326 seconds. Better. However, there are two comparisons being performed on each value. Let’s reduce that.

time echo "SELECT COUNT(x) FROM (SELECT x,y FROM points WHERE ABS(x)<=100000 AND ABS(y)<=100000) AS sub1 WHERE SQRT(x*x+y*y)<100000" | mysql -uusername -ppassword geodb

.297 - more than 25% faster than the original.

It should be possible to reduce that further. For example, we can be certain that points whose x and y values are both less than (100000*Cos(Pi/4)) in absolute value are contained inside the circle, so that’s another < comparison, reducing the number of maths operations. I’d test that one as well, but my daughter is finally asleep in my left arm as I type.
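For what it’s worth, the three-step idea (reject outside the bounding square, accept inside the inscribed square, do the maths only for what is left) can be sketched in PHP like this. The function name in_circle() is made up for illustration, and the sketch says nothing about how MySQL would perform with the equivalent query:

```php
<?php
// Classify a point against a circle of radius $r centred on (0,0),
// using cheap comparisons first and the exact test only when needed.
function in_circle($x, $y, $r)
{
    $ax = abs($x);
    $ay = abs($y);
    if ($ax > $r || $ay > $r) {
        return false;                     // outside the bounding square
    }
    $inner = $r * cos(M_PI / 4);          // half-side of the inscribed square
    if ($ax < $inner && $ay < $inner) {
        return true;                      // inside the inscribed square
    }
    return $x * $x + $y * $y < $r * $r;   // exact test, no square root needed
}

var_dump(in_circle(-60000, 60000, 100000));   // accepted by the inscribed square
var_dump(in_circle(150000, 0, 100000));       // rejected by the bounding square
var_dump(in_circle(99999, 0, 100000));        // decided by the exact test
```

Comparing the squared distance against the squared radius gives the same answer as the SQRT() version while skipping the square root.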

As some of you know, my KFM project is available in 17 different languages. I did that using a home-brewed translation method.

Recently, I’ve been working on translating our CMS, WebME. I took a more “official” approach this time, and looked through the PHP documentation.

This article is not about localisation, in that I don’t care at the moment about the difference between en_GB and en_US (other than to point out that Americans spell everything wrong). It’s about translation itself.

The “official” way to do translations is to use the compiled-in gettext support. An unofficial, but very popular alternative is to use the PHP-gettext project, which is used by WordPress.

I chose to use the compiled-in gettext support. As I control the server that our CMS is run on, there is no support problem, so I can guarantee that required software will be there.

Another reason for using the compiled-in version vs the PHP library is that, in the words of PHP-gettext’s developer, “I’m not very fond of PHP the language, there’ll be a lot to fix”. In other words, there is a chance that the PHP-gettext library may blow up, and the developer may just shrug his shoulders.

So, how does translation work? To start off, you need at least one language other than the main language of your site (presumably it’s English). For testing purposes, let’s assume it’s Irish.

Translation is managed in “domains” (think “namespaces”). That’s not extremely important if you are only translating a few hundred strings, as you can just use one called “default”.

Language strings are recorded in .mo files. You don’t need a file for your main language.

The files are saved in a directory on your server, using this structure:

/path/to/locales/
        ga/
            LC_MESSAGES/
                default.mo
        de/
            LC_MESSAGES/
                default.mo

The “locales” directory can be named anything you want, and the “default” bits are named after your domain (namespace).

The ga/de directories here should properly be ga_IE and de_DE, but we’re not interested in the locale part - we’re only interested in the language part.

To create your .mo files, first create a file in your server called test.php:

<?php
header('Content-type: text/html; Charset=utf-8');
setlocale(LC_ALL,'ga_IE.utf8');
bindtextdomain('default','/path/to/locales'); // change this to your locales directory
textdomain('default');

echo _('Pages');

Note that I’ve used ‘ga_IE.utf8’ here instead of ‘ga’. I’ll explain that later - it’s important for now.

When you run that, it should output “Pages”. The next step is to write the translation for it.

To create a .mo file, I recommend poedit - it’s cross-platform and works well. I won’t get into too much detail - here is an excellent tutorial - just replace ‘__’ with ‘_’. btw, the Irish for “Pages” is “Leathanaí”.

When the file is created, save it as “/path/to/locales/ga/LC_MESSAGES/default.mo”, and restart your webserver (Apache caches gettext strings, so when you change them, you may need to restart the webserver to clear the cache - YMMV).

Now, when you try your script, it should output “Leathanaí”. Simple, innit?

Actually, no. You see, that’s a very contrived version - we deliberately chose a working locale. However, you will want to grab the language from the browser’s Accept-Language header, and that is not guaranteed to be a locale the server knows.

Try it yourself - replace ‘ga_IE.utf8’ with ‘ga’ - the output is now “Pages”. btw, that’s why you don’t need a translation for your main language - gettext will output its input if there is no existing translation.

So, how do we get a working locale from the browser?

First, you need a list of your existing languages:

  if ($handle = opendir('/path/to/locales')) {
      $available_languages = array();
      while (false !== ($file = readdir($handle))) {
          // skip "." and "..", keep only the language directories
          if ($file[0] != '.' && is_dir('/path/to/locales/'.$file)) $available_languages[] = $file;
      }
      closedir($handle);
      sort($available_languages);
  } else {
      echo 'error: missing language files';
      exit;
  }

Next, parse the browser’s Accept-Language header for a language which matches what we have.

  if (!isset($_SERVER['HTTP_ACCEPT_LANGUAGE']))$_SERVER['HTTP_ACCEPT_LANGUAGE'] = '';
  $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
  foreach($langs as $lang){
    $lang=preg_replace('/;.*/','',trim($lang)); // strip any ";q=..." weighting
    if (in_array($lang, $available_languages)) {
      $selected_language=$lang;
      break;
    }
  }
  if(!isset($selected_language))$selected_language='en';

Note that we default to ‘en’.
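The loop above takes the header entries in the order they arrive and only matches exact names, so ‘ga-IE’ would not find ‘ga’. Here is a slightly more forgiving sketch which honours the q weightings and falls back to the primary language code. The function name pick_language() and the sample headers are assumptions for illustration:

```php
<?php
// Hypothetical helper: pick the best language from an Accept-Language header.
// $header is e.g. "ga-IE,ga;q=0.8,en;q=0.5"; $available is a list like array('de','ga').
function pick_language($header, $available, $default = 'en')
{
    $prefs = array();
    foreach (explode(',', $header) as $i => $item) {
        $bits = explode(';', trim($item));
        $tag  = strtolower(trim($bits[0]));
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0;                                    // weighting defaults to 1
        foreach (array_slice($bits, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($q > 0) {                                // q=0 means "not acceptable"
            $prefs[] = array($q, $i, $tag);
        }
    }
    // Highest weighting first; ties keep the order used in the header.
    usort($prefs, function ($a, $b) {
        if ($a[0] != $b[0]) {
            return $a[0] < $b[0] ? 1 : -1;
        }
        return $a[1] - $b[1];
    });
    foreach ($prefs as $pref) {
        $primary = preg_replace('/-.*/', '', $pref[2]);   // "ga-ie" -> "ga"
        foreach (array($pref[2], $primary) as $candidate) {
            if (in_array($candidate, $available, true)) {
                return $candidate;
            }
        }
    }
    return $default;
}

echo pick_language('ga-IE,ga;q=0.8,en;q=0.5', array('de', 'ga')), "\n";
```

The result would then be used as $selected_language in the steps that follow.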

And now, we properly set up the locale.

  if(!setlocale(LC_ALL,$selected_language)){
    // look for e.g. "ga_IE" or "ga_IE.utf8" among the locales installed on the server
    preg_match_all('/^'.preg_quote($selected_language,'/').'[_.@].*$/m',`locale -a`,$matches);
    if(!count($matches[0]))die('no locale info for "'.$selected_language.'"');
    $selected_language=trim($matches[0][0]);
    foreach($matches[0] as $m)if(preg_match('/utf8/',$m)){
      $selected_language=trim($m);
      break;
    }
    setlocale(LC_ALL,$selected_language);
  }

What’s happening here is that we are scanning the webserver’s list of compiled locales (not your list), as gettext will not work unless it has a properly defined locale which is already compiled on the server. The above code will, when given ‘ga’, find ‘ga_IE.utf8’ in the server’s locales list and use that.
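To make the matching easier to test away from a real server, the lookup can be pulled out into a small function that works on a plain list of locale names. This is just a sketch: pick_locale() is a made-up name, and the list below is a hand-written sample rather than real `locale -a` output.

```php
<?php
// Hypothetical helper: choose an installed locale for a bare language code.
// $installed is a list of names such as the lines printed by `locale -a`.
function pick_locale($lang, $installed)
{
    $matches = array();
    foreach ($installed as $name) {
        $name = trim($name);
        // Accept "ga" itself, or "ga" followed by a territory/codeset/modifier.
        if (preg_match('/^' . preg_quote($lang, '/') . '([_.@].*)?$/', $name)) {
            $matches[] = $name;
        }
    }
    foreach ($matches as $name) {
        if (preg_match('/utf-?8/i', $name)) {
            return $name;                 // prefer a UTF-8 variant
        }
    }
    return $matches ? $matches[0] : null; // first match, or null if none
}

$installed = array('C', 'POSIX', 'en_IE', 'en_IE.utf8', 'ga_IE', 'ga_IE.utf8', 'gl_ES.utf8');
echo pick_locale('ga', $installed), "\n";   // prints "ga_IE.utf8"
```

On a live server the list could come from something like explode("\n", `locale -a`).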

After that, it’s just the normal domain bind, as described in the simple example.

header('Content-type: text/html; Charset=utf-8');
bindtextdomain('default','/path/to/locales'); // change this to your locales directory
textdomain('default');

echo _('Pages');

So now you can translate in just about any language, whether the browser supplies a proper locale string or not.

But wait - there’s more! What if you want to translate the string “Welcome to the ‘%1’ page”, where %1 is a variable?

Unfortunately, PHP’s built-in gettext function can’t do that. But we can hack it together easily. Add this function definition to your script:

function __($string)
{
    $str = gettext($string);
    for($i = func_num_args()-1 ; $i ; --$i){
        $s=func_get_arg($i);
        $str=str_replace('%'.$i,$s,$str);
    }
    return $str;
}

Then replace your _('Pages') with __("Welcome to the '%1' page","Kaetastic"). You will need to change the “keywords” section of poedit to use ‘__’ as well as ‘_’, then rescan the PHP files, update your .mo and restart the webserver.

Brilliant! Now you have a proper multi-lingual site. I expect throngs of readers to browse your site in Klingon and Leet.

Note that for proper optimisation you should use __() instead of _() only when the string has parameters to substitute - otherwise it’s quicker just to use _().
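The substitution step of __() can be tried on its own, without gettext installed. A small sketch (fill_placeholders() is a hypothetical name) which also shows why the loop counts downwards:

```php
<?php
// Hypothetical stand-alone version of the substitution step in __():
// replace %1, %2, ... in an already-translated string.
function fill_placeholders($str)
{
    $args = func_get_args();
    // Work from the highest index down so that "%1" never clobbers "%10".
    for ($i = count($args) - 1; $i > 0; --$i) {
        $str = str_replace('%' . $i, $args[$i], $str);
    }
    return $str;
}

echo fill_placeholders("Welcome to the '%1' page", 'Kaetastic'), "\n";
// prints: Welcome to the 'Kaetastic' page
```

PHP’s sprintf() also accepts numbered specifiers such as %1$s, so that would be another way to get the same effect.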