03 Apr

Translation in PHP

As some of you know, my KFM project is available in 17 different languages. I did that using a home-brewed translation method.

Recently, I’ve been working on translating our CMS, WebME. I took a more “official” approach this time, and looked through the PHP documentation.

This article is not about localisation, in that I don’t care at the moment about the difference between en_GB and en_US (other than to point out that Americans spell everything wrong). It’s about translation itself.

The “official” way to do translations is to use the compiled-in gettext support. An unofficial but very popular alternative is to use the PHP-gettext project, which is used by WordPress.

I chose to use the compiled-in gettext support. As I control the server that our CMS is run on, there is no support problem, so I can guarantee that required software will be there.

Another reason for using the compiled-in version vs the PHP library is that, in the words of PHP-gettext’s developer, “I’m not very fond of PHP the language, there’ll be a lot to fix”. In other words, there is a chance that the PHP-gettext library may blow up, and the developer may just shrug his shoulders.

So, how does translation work? To start off, you need at least one language other than the main language of your site (presumably it’s English). For testing purposes, let’s assume it’s Irish.

Translation is managed in “domains” (think “namespaces”). That’s not extremely important if you are only translating a few hundred strings, as you can just use one called “default”.

Language strings are recorded in .mo files. You don’t need a file for your main language.

The files are saved in a directory on your server, using this structure:

/path/to/locales/
        ga/
            LC_MESSAGES/
                default.mo
        de/
            LC_MESSAGES/
                default.mo

The “locales” directory can be named anything you want, and the “default” bits are named after your domain (namespace).

The ga/de directories here should properly be ga_IE and de_DE, but we’re not interested in the locale part – we’re only interested in the language part.
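
To make the layout concrete, here is a small sketch that builds the file path gettext would look for, given a locales directory, a language code and a domain. The function name mo_path is my own invention for illustration; it is not part of PHP's gettext API and only mirrors the directory structure shown above.

```php
<?php
// Hypothetical helper (not a gettext function): builds the .mo path
// for a language and domain, following the directory layout above.
function mo_path(string $base, string $lang, string $domain = 'default'): string
{
    // rtrim() lets callers pass the base directory with or without a trailing slash.
    return rtrim($base, '/') . '/' . $lang . '/LC_MESSAGES/' . $domain . '.mo';
}

echo mo_path('/path/to/locales', 'ga'), "\n";
// prints "/path/to/locales/ga/LC_MESSAGES/default.mo"
```

Something like this is handy with file_exists() when a translation refuses to show up and you want to rule out a misplaced file.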

To create your .mo files, first create a file in your server called test.php:

<?php
header('Content-type: text/html; charset=utf-8');
setlocale(LC_ALL,'ga_IE.utf8');
bindtextdomain('default','/path/to/locales'); // change this to your locales directory
textdomain('default');

echo _('Pages');

Note that I’ve used ‘ga_IE.utf8’ here instead of ‘ga’. I’ll explain why later; for now, just note that it matters.

When you run that, it should output “Pages”. The next step is to write the translation for it.

To create a .mo file, I recommend poedit – it’s cross-platform and works well. I won’t get into too much detail – here is an excellent tutorial – just use ‘_’ wherever it uses ‘__’. By the way, the Irish for “Pages” is “Leathanaí”.

When the file is created, save it as “/path/to/locales/ga/LC_MESSAGES/default.mo”, and restart your webserver (Apache caches gettext strings, so when you change them, you may need to restart the webserver to clear the cache – YMMV).

Now, when you try your script, it should output “Leathanaí”. Simple, innit?

Actually, no. You see, that’s a very contrived version – we deliberately chose a working locale. However, you will want to grab the locale from the browser’s Accept-Language header, and that is not guaranteed to work.

Try it yourself – replace ‘ga_IE.utf8’ with ‘ga’ – the output is now “Pages”. By the way, that’s why you don’t need a translation for your main language – gettext will output its input if there is no existing translation.

So, how do we get a working locale from the browser?

First, you need a list of your existing languages:

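  // Collect the language directories found under the locales directory.
  $locales_dir = '/path/to/locales'; // change this to your locales directory
  $available_languages = array();
  if ($handle = opendir($locales_dir)) {
      while (false !== ($file = readdir($handle))) {
          // Skip '.', '..' and hidden entries, which is_dir() would otherwise accept.
          if ($file[0] != '.' && is_dir($locales_dir.'/'.$file)) $available_languages[] = $file;
      }
      closedir($handle);
      sort($available_languages);
  } else {
      echo 'error: missing language files';
      exit;
  }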

Next, parse the browser’s Accept-Language header for a language which matches what we have.

  // Walk the browser's languages in order and take the first one we have.
  $accept = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
  foreach (explode(',', $accept) as $lang) {
    $lang = preg_replace('/;.*/', '', trim($lang)); // strip any ";q=..." weight
    if (in_array($lang, $available_languages)) {
      $selected_language = $lang;
      break;
    }
  }
  if (!isset($selected_language)) $selected_language = 'en';

Note that we default to ‘en’.
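
The loop above takes the first match in the order the browser lists its languages, and it only matches exact names, so a browser sending ‘en-GB’ will not match an ‘en’ directory. Here is a rough sketch of a slightly more careful version which honours the q-values and falls back from ‘en-gb’ to ‘en’. The function name pick_language and its parameters are made up for this example, and it assumes your language directories are named with lowercase language codes.

```php
<?php
// Hypothetical helper: choose the best available language for an
// Accept-Language header. Assumes $available holds lowercase codes
// such as 'ga' or 'de' (the directory names under your locales dir).
function pick_language(string $header, array $available, string $default = 'en'): string
{
    $candidates = [];
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '' || $tag === '*') {
            continue; // nothing usable in this entry
        }
        $q = 1.0; // a missing q-value means full preference
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $candidates[] = [$tag, $q, $position];
    }

    // Highest q first; ties keep the order the browser sent.
    usort($candidates, function ($a, $b) {
        return ($b[1] <=> $a[1]) ?: ($a[2] <=> $b[2]);
    });

    foreach ($candidates as [$tag, $q]) {
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        // Try the full tag first, then only the language part ("en-gb" -> "en").
        foreach ([$tag, explode('-', $tag)[0]] as $try) {
            if (in_array($try, $available, true)) {
                return $try;
            }
        }
    }
    return $default;
}

echo pick_language('en-GB,ga;q=0.8,de;q=0.9', ['de', 'ga']), "\n";
// prints "de"
```

Whether the extra care is worth it depends on your audience; for a handful of languages the simple loop is probably fine.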

And now, we properly set up the locale.

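  if (!setlocale(LC_ALL, $selected_language)) {
    // Ask the system for its installed locales (one per line) and keep
    // the lines that start with our language code, e.g. "ga_IE.utf8".
    preg_match_all('/^'.preg_quote($selected_language, '/').'(?![a-z]).*$/mi', `locale -a`, $matches);
    if (!count($matches[0])) die('no locale info for "'.$selected_language.'"');
    $selected_language = trim($matches[0][0]);
    // Prefer a UTF-8 variant if one exists.
    foreach ($matches[0] as $m) if (preg_match('/utf-?8/i', $m)) {
      $selected_language = trim($m);
      break;
    }
    setlocale(LC_ALL, $selected_language);
  }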

What’s happening here is that we are scanning the webserver’s list of compiled locales (not your list), as gettext will not work unless it has a properly defined locale which is already compiled on the server. The above code will, when given ‘ga’, find ‘ga_IE.utf8’ in the server’s locales list and use that.
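
If you want to try the matching logic without shelling out, the same idea can be pulled into a function that takes the text of `locale -a` as a plain string. This is only a sketch: the name find_system_locale and the sample locale list are invented for the example, and your server's list will differ.

```php
<?php
// Hypothetical helper: choose a system locale name for a bare language code.
// $localeList is the raw text printed by `locale -a` (one locale per line).
function find_system_locale(string $lang, string $localeList): ?string
{
    // Match whole lines that are the language code itself, or the code
    // followed by a territory, charset or modifier ("_IE", ".utf8", "@euro").
    $pattern = '/^' . preg_quote($lang, '/') . '(?:[_.@].*)?$/mi';
    if (!preg_match_all($pattern, $localeList, $matches)) {
        return null; // no installed locale for this language
    }
    $names = array_map('trim', $matches[0]);
    foreach ($names as $name) {
        if (preg_match('/utf-?8/i', $name)) {
            return $name; // prefer a UTF-8 variant
        }
    }
    return $names[0];
}

// Made-up sample of what `locale -a` might print.
$sample = "C\nde_DE.utf8\nga_IE\nga_IE.utf8\ngaa_GH\nPOSIX\n";
echo find_system_locale('ga', $sample), "\n";
// prints "ga_IE.utf8"
```

Note that ‘gaa_GH’ is deliberately in the sample: a naive “starts with ga” test would pick up the wrong language.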

After that, it’s just the normal domain bind, as described in the simple example.

header('Content-type: text/html; charset=utf-8');
bindtextdomain('default','/path/to/locales'); // change this to your locales directory
textdomain('default');

echo _('Pages');

So now you can translate in just about any language, whether the browser supplies a proper locale string or not.

But wait – there’s more! What if you want to translate the string “Welcome to the ‘%1’ page”, where %1 stands for a variable?

Unfortunately, PHP’s built-in gettext function can’t do that. But we can hack it together easily. Add this function definition to your script:

function __($string)
{
    $str = gettext($string);
    for($i = func_num_args()-1 ; $i ; --$i){
        $s=func_get_arg($i);
        $str=str_replace('%'.$i,$s,$str);
    }
    return $str;
}

Then replace your _('Pages') with __("Welcome to the '%1' page", "Kaetastic"). You will need to change the “keywords” section of poedit to use ‘__’ as well as ‘_’, then rescan the PHP files, update your .mo and restart the webserver.
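
If you want to play with the placeholder replacement on its own, here is a stand-in that uses the same count-down loop as __() but takes an already-translated string, so it runs without the gettext extension. The name fill_placeholders is made up for this sketch.

```php
<?php
// Hypothetical stand-in for __(): same replacement loop, minus the
// gettext() lookup, so it can be tried on any machine.
function fill_placeholders(string $translated, string ...$args): string
{
    // Count down so that %10 is replaced before %1 can clobber its prefix.
    for ($i = count($args); $i > 0; --$i) {
        $translated = str_replace('%' . $i, $args[$i - 1], $translated);
    }
    return $translated;
}

echo fill_placeholders("Welcome to the '%1' page", 'Kaetastic'), "\n";
// prints "Welcome to the 'Kaetastic' page"
```

Numbered placeholders also let a translator reorder the arguments, which matters for languages where the word order differs from English.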

Brilliant! Now you have a proper multi-lingual site. I expect throngs of readers to browse your site in Klingon and Leet.

Note that for proper optimisation you should use __() instead of _() only when the string has parameters to substitute – otherwise it’s quicker just to use _().