JavaScript internationalisation, or why Rudolph is more than just a shiny nose
Dunder sat, glumly staring at the computer screen.
“What’s up, Dunder?” asked Rudolph, entering the stable and shaking off the snow from his antlers.
“Well,” Dunder replied, “I’ve just finished coding the new reindeer intranet Santa Claus asked me to do. You know how he likes to appear to be at the cutting edge, talking incessantly about Web 2.0, AJAX, rounded corners; he even spooked Comet recently by talking about him as if he were some pushy web server.
“I’ve managed to keep him happy, whilst also keeping it usable, accessible, and gleaming — and I’m still on the back row of the sleigh! But anyway, given the elves will be the ones using the site, and they come from all over the world, the site is in multiple languages. Which is great, except when it comes to the preview JavaScript I’ve written for the reindeer order form. Here, have a look…”
As he said that, he brought up the order form in French on the screen [same in English].
“Looks good,” said Rudolph.
“But if I add some items,” said Dunder, “the preview appears in English, as it’s hard-coded in the JavaScript. I don’t want separate code for each language, as that’s just silly — I thought about just having if statements, but that doesn’t scale at all…”
“And there’s more, you aren’t displaying large numbers in French properly, either,” added Rudolph, who had been playing and looking at part of the source code:
function update_text() { var hay = getValue('hay'); var carrots = getValue('carrots'); var bells = getValue('bells'); var total = 50 * bells + 30 * hay + 10 * carrots; var out = 'You are ordering ' + pretty_num(hay) + ' bushel' + pluralise(hay) + ' of hay, ' + pretty_num(carrots) + ' carrot' + pluralise(carrots) + ', and ' + pretty_num(bells) + ' shiny bell' + pluralise(bells) + ', at a total cost of <strong>' + pretty_num(total) + '</strong> gold pieces. Thank you.'; document.getElementById('preview').innerHTML = out; } function pretty_num(n) { n += ''; var o = ''; for (i=n.length; i>3; i-=3) { o = ',' + n.slice(i-3, i) + o; } o = n.slice(0, i) + o; return o; } function pluralise(n) { if (n!=1) return 's'; return ''; }
“Oh, botheration!” cried Dunder. “This is just so complicated.”
“It doesn’t have to be,” said Rudolph, “you just have to think about things in a slightly different way from what you’re used to. As we’re only a simple example, we won’t be able to cover all possibilities, but for starters, we need some way of providing different information to the script dependent on the language. We’ll create a global i18n object, say, and fill it with the correct language information. The first variable we’ll need will be a thousands separator, and then we can change the pretty_num function to use that instead:
function pretty_num(n) { n += ''; var o = ''; for (i=n.length; i>3; i-=3) { o = i18n.thousands_sep + n.slice(i-3, i) + o; } o = n.slice(0, i) + o; return o; }
“The i18n object will also contain our translations, which we will access through a function called _() — that’s just an underscore. Other languages have a function of the same name doing the same thing. It’s very simple:
function _(s) { if (typeof(i18n)!='undefined' && i18n[s]) { return i18n[s]; } return s; }
“So if a translation is available and provided, we’ll use that; otherwise we’ll default to the string provided — which is helpful if the translation begins to lag behind the site’s text at all, as at least something will be output.”
“Got it,” said Dunder. “_(‘Hello Dunder’) will print the translation of that string, if one exists, ‘Hello Dunder’ if not.”
“Exactly. Moving on, your plural function breaks even in English if we have a word where the plural doesn’t add an s — like ‘children’.”
“You’re right,” said Dunder. “How did I miss that?”
“No harm done. Better to provide both singular and plural words to the function and let it decide which to use, performing any translation as well:
function pluralise(s, p, n) { if (n != 1) return _(p); return _(s); }
“We’d have to provide different functions for different languages as we employed more elves and got more complicated — for example, in Polish, the word ‘file’ pluralises like this: 1 plik, 2–4 pliki, 5–21 plików, 22–24 pliki, 25–31 plików, and so on.” [more information on plural forms]
“Gosh!”
“Next, as different languages have different word orders, we must stop using concatenation to construct sentences, as it would be impossible for other languages to fit in; we have to keep coherent strings together. Let’s rewrite your update function, and then go through it:
function update_text() { var hay = getValue('hay'); var carrots = getValue('carrots'); var bells = getValue('bells'); var total = 50 * bells + 30 * hay + 10 * carrots; hay = sprintf(pluralise('%s bushel of hay', '%s bushels of hay', hay), pretty_num(hay)); carrots = sprintf(pluralise('%s carrot', '%s carrots', carrots), pretty_num(carrots)); bells = sprintf(pluralise('%s shiny bell', '%s shiny bells', bells), pretty_num(bells)); var list = sprintf(_('%s, %s, and %s'), hay, carrots, bells); var out = sprintf(_('You are ordering %s, at a total cost of <strong>%s</strong> gold pieces.'), list, pretty_num(total)); out += ' '; out += _('Thank you.'); document.getElementById('preview').innerHTML = out; }
“sprintf is a function in many other languages that, given a format string and some variables, slots the variables into place within the string. JavaScript doesn’t have such a function, so we’ll write our own. Again, keep it simple for now, only integers and strings; I’m sure more complete ones can be found on the internet.
function sprintf(s) { var bits = s.split('%'); var out = bits[0]; var re = /^([ds])(.*)$/; for (var i=1; i<bits.length; i++) { p = re.exec(bits[i]); if (!p || arguments[i]==null) continue; if (p[1] == 'd') { out += parseInt(arguments[i], 10); } else if (p[1] == 's') { out += arguments[i]; } out += p[2]; } return out; }
“Lastly, we need to create one file for each language, containing our i18n object, and then include that from the relevant HTML. Here’s what a blank translation file would look like for your order form:
var i18n = { thousands_sep: ',', '%s bushel of hay': '', '%s bushels of hay': '', '%s carrot': '', '%s carrots': '', '%s shiny bell': '', '%s shiny bells': '', '%s, %s, and %s': '', 'You are ordering %s, at a total cost of <strong>%s</strong> gold pieces.': '', 'Thank you.': '' };
“If you implement this across the intranet, you’ll want to investigate the xgettext program, which can automatically extract all strings that need translating from all sorts of code files into a standard ‘.po’ file (I think Python mode works best for JavaScript). You can then use a different program to take the translated .po file and automatically create the language-specific JavaScript files for us.” [e.g. German .po file for PledgeBank, mySociety’s .po–.js script, example output]
With a flourish, Rudolph finished editing. “And there we go, localised JavaScript in English, French, or German, all using the same main code.”
“Thanks so much, Rudolph!” said Dunder.
“I’m not just a pretty nose!” Rudolph quipped. “Oh, and one last thing — please comment liberally explaining the context of strings you use. Your translator will thank you, probably at the same time as they point out the four hundred places you’ve done something in code that only works in your language and no-one else’s…”
[ Thanks to Tim Morley and Edmund Grimley Evans for the French and German translations respectively. ]