UTF-8 in JavaScript, PHP, and MySQL

I am working on a large site which uses a lot of Irish-language addresses. This involves quite a lot of fádas (accents) in words. Unfortunately, these characters cause problems when you try to pass them between JavaScript, PHP and MySQL.
The problem is character sets. JavaScript (actually, ECMAScript) specifies that external JavaScript files must be written in ASCII. ASCII is a 7-bit, 128-character set which does not include such letters as "áíéóú". This poses a problem when you are using, for example, AJAX, to import your data.
A similar problem exists with MySQL, although it’s a bit more complex there.
Anyway, the solution is the utf8_encode() function in PHP, which converts ISO-8859-1 strings to UTF-8. For AJAX and other JavaScript purposes, and also for your MySQL queries, run all strings through utf8_encode() before displaying/committing them, and all should be fine, provided the strings are not already UTF-8.
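To see what that conversion does at the byte level, here is a minimal sketch in JavaScript (Node.js). It only mimics the behaviour of utf8_encode() using Node's Buffer class. The helper name latin1ToUtf8 is my own invention for illustration, and the sample word is assumed to arrive as ISO-8859-1 bytes.

```javascript
// Sketch only: mimics PHP's utf8_encode() with Node.js Buffers.
// latin1ToUtf8 is a hypothetical helper name, not part of any library.
function latin1ToUtf8(latin1Bytes) {
  // Each ISO-8859-1 byte maps to the Unicode code point of the same value.
  const text = Buffer.from(latin1Bytes).toString("latin1");
  // Re-encode the text as UTF-8; characters above 0x7F become two bytes.
  return Buffer.from(text, "utf8");
}

// "fáda" as ISO-8859-1 bytes: the á is the single byte 0xE1.
const latin1 = Buffer.from([0x66, 0xe1, 0x64, 0x61]);
const utf8 = latin1ToUtf8(latin1);

console.log(utf8.toString("hex"));  // 66c3a16461 (á is now 0xC3 0xA1)
console.log(utf8.toString("utf8")); // fáda

// Converting a second time treats the UTF-8 bytes as ISO-8859-1 again
// and mangles the text.
console.log(latin1ToUtf8(utf8).toString("utf8")); // fÃ¡da
```

The last line is the thing to watch for: the conversion is only safe to apply once, to data that really is ISO-8859-1.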
Now, that can be a mountain of work to apply to your existing work, unless you’ve been using nicely abstracted packages, such as the Pear DB class and the Sajax package.
For DB, it’s a simple matter to add "$query=utf8_encode($query);" at the beginning of the simpleQuery() function in /usr/local/pear/DB/mysql.php.
For Sajax, change the sajax_esc() function to this:

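function sajax_esc($val){
  // escape backslashes first, then carriage returns, newlines and double quotes
  $val=str_replace(
    array("\\","\r","\n",'"'),
    array("\\\\","\\r","\\n",'\\"'),
    $val
  );
  // convert the escaped ISO-8859-1 string to UTF-8
  return utf8_encode($val);
}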
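
To make the escaping step easier to picture, here is a rough JavaScript mirror of the str_replace() call above. It is an illustration only: escapeForJsString is a hypothetical name and the sample address is made up. It leaves out the utf8_encode() step, since JavaScript strings are already Unicode.

```javascript
// Hypothetical JavaScript mirror of the escaping in sajax_esc(), for illustration only.
function escapeForJsString(val) {
  return val
    .replace(/\\/g, "\\\\") // backslashes first, so later escapes are not doubled
    .replace(/\r/g, "\\r")  // carriage return becomes the two characters \r
    .replace(/\n/g, "\\n")  // newline becomes the two characters \n
    .replace(/"/g, '\\"');  // double quote becomes \"
}

const escaped = escapeForJsString('Sráid "Mhór"\nBaile Átha Cliath');
console.log(escaped); // Sráid \"Mhór\"\nBaile Átha Cliath
```

The accented letters pass through the escaping untouched. Only the characters that would break a double-quoted string literal are changed.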

And that’s it! You should have no trouble with character encodings again…
…YMMV

9 Comments.

  1. I’ll have less of this encoded mystery meat navigation. Where’s the contact form?

    Looking to hook up with yiz. May even drop by today Saturday the 10th. Should it not suit let me know. We’re in Monaghan this pm.

  2. hey, vinnie – what’s your email address? mail me – kae@verens.com

  3. Hi Kae,

    Are you sure that external JavaScript files have to be sent as ASCII (I realise the ECMA pdf document you linked to says so)?

    I’m currently trying to “internationalise” a calendar script and have had no problems loading external JavaScript files containing ‘raw’ accented characters e.g.

    The external script can contain the following (notice the raw “û” character):

    var somerawstring = "Août";

    and the variable somerawstring can be used as-is (i.e. without resorting to the UTF8 encoding) e.g.

    someDivReference.appendChild(document.createTextNode(somerawstring));

    Perhaps I’m wrong (I hope not as I’ll have to change a ton of code)…

    Regards,
    Brian

  4. Brian, I think the answer is that browsers do support UTF8 scripts, as long as the header for the .js file specifies that the content is UTF8, and not, for example, 8859-1.

    If you already have a load of code written, then a quick solution may be to add a .htaccess file to the directory containing your JavaScript, with the following in it (not tested):

    AddCharset utf-8 .js

    Then, you can write the utf-8 text directly into the file (I think 😉 )

  5. wow, that’s a fast response my man…

    I just came back to say the same thing. It appears that the HTTP headers have to state a utf-8 charset (which Apache installations do by default, I’m led to believe).

    Thanks for the response,

    Take care,
    Brian

  6. Hi,

    I also have a calendar script which is included in a utf-8 page. I have the same trouble: special characters are displayed wrongly. I tried putting a .htaccess file into the folder, but without success. What else could I do to solve this problem?
    And when I fetch data from my database and place it on the page, it works fine, but when I open a popup and pass all the parameters via the URL, I also get wrong characters… Any idea what I could do?

    thanks
    Florian

  7. Florian, maybe your site has .htaccess files disabled? You can find out by loading the JavaScript file in Firefox: right-click and choose View Page Info. It will tell you what character set the script is served in.

  8. cool man.. your utf-8 tips work very well

  9. my ajax code sends my utf-8 characters to my page as something like: ???????

    it can’t show those characters, I think the problem is somewhere here: document.getElementById

    I am just confused by it. Does anybody have a solution for using ajax and php to read utf-8 characters from mysql?

    please email me the result: hesame[@]gmail.com
