Thomas M. Tuerke on Design • Globalization


design: /di·'zin/

   n a deliberate plan for the creation or development of an object. vt: to create something according to plan.
   good design: /'gud —/ the product of deliberate forethought and careful understanding of the purpose of a subject, resulting in a subject which significantly improves its utility, allowing it to integrate seamlessly and naturally into the role for which it is intended.
false synonyms: fashion, decor.

"In French, oeuf means egg, cheese is fromage... it's like those French have a different word for everything!"
- Steve Martin
"In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their language."
- Mark Twain

One of the things software developers (and designers, too) need to be particularly aware of is the fact that there are many languages out there. "It ain't all English." Moreover, it ain't all like English, either. All too often, the only thing folks in the industry are familiar with is "White Man's ASCII."

Software that is meant to be truly international needs to be designed with that intent up front. As with most things, it's much harder to retrofit those considerations after the fact.

International Word Order
- Thomas M. Tuerke

One of the first things to be aware of is that different languages order words differently. English, for example, is called an "SVO" language: Subject Verb Object. But other languages use different word orders.

Here's a great little example of how word order differs between languages. Shown below are (1) the German text, (2) a word-for-word rendering of the German text in English words, and (3) the English sentences in proper English word order.

(1) Original German: Die angegebene CGI-Anwendung hat die zulässige Verarbeitungszeit überschritten. Der Server hat den Prozess gelöscht.
(2) Word-for-word English: The indicated CGI Application has the permitted working time overstepped. The Server has the process deleted.
(3) Translated English: The indicated CGI Application has exceeded the permitted time limit. The server has deleted the process.

The middle version (word-for-word English) should help illustrate why it's not a good idea to build human-readable messages with ordinary string manipulation (slicing and concatenating)... certainly not for units smaller than a single sentence. German is fairly similar to English, but other languages have more extreme differences.

To illustrate, let's say a German programmer wanted to build a message consisting of a subject ("The Server" == "Der Server"), a verb ("deleted" == "gelöscht"), and an object ("the process" == "den Prozess"), which he slams together using concatenation. He could do so for German quite easily. However, if the same operation were performed using English strings, you'd get that middle example. "The Server has the process deleted" sounds as awkward to English readers as German assembled in English word order would sound to German readers.

In other words, don't do it. The smallest atomic unit of a message string should, in general, be a sentence... a complete thought.
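A minimal Python sketch of the difference, assuming the fragments from the example above are stored as separate strings. The names `FRAGMENTS`, `SENTENCES`, `concat_message`, and `sentence_message` are hypothetical, chosen only for illustration. The join order that works for German produces the awkward word-for-word sentence when fed English fragments, while whole sentences translated as units come out right.

```python
# Hypothetical sentence fragments, keyed by language code.
FRAGMENTS = {
    "de": {"subject": "Der Server", "aux": "hat",
           "object": "den Prozess", "participle": "gelöscht"},
    "en": {"subject": "The server", "aux": "has",
           "object": "the process", "participle": "deleted"},
}

# Whole sentences, each translated as a complete unit.
SENTENCES = {
    "de": "Der Server hat den Prozess gelöscht.",
    "en": "The server has deleted the process.",
}


def concat_message(lang):
    """Join fragments in a hard-coded (German) word order."""
    f = FRAGMENTS[lang]
    words = [f["subject"], f["aux"], f["object"], f["participle"]]
    return " ".join(words) + "."


def sentence_message(lang):
    """Look up the complete, already-translated sentence."""
    return SENTENCES[lang]
```

Here `concat_message("en")` gives "The server has the process deleted.", whereas `sentence_message("en")` gives the natural English sentence.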

(And yes, this is why "Chinglish" instructions are so "funny": oftentimes the writers were using Chinese word order when writing the English sentences.)

More on International Word Order
- Thomas M. Tuerke

The upshot of this—that the smallest unit of text should be the sentence—is that code should not concatenate parts of a sentence with a hard-coded assumption about word order. For example, it would be wrong to code the following:

 sAppointment = sDateTime + " you have an " + sApptType
              + " with " + sClient + "."

The primary reason is that the word order might vary in other languages. For example, some languages might require "With client on datetime you have an appttype."

For this reason, it's better to use a string formatting function (such as Windows' FormatMessage, or the .NET runtime's System.String.Format methods) with the complete sentence as the format string and the variables passed as arguments. Because the placeholders are numbered, a translator can reorder them within the sentence without anyone touching the call. For example:

 sAppointment = String.Format("{0} you have an {1} with {2}.", 
                              sDateTime, sApptType, sClient);
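The same idea can be sketched in Python, whose `str.format` happens to use the same `{0}`-style numbered placeholders. The `FORMATS` table, the `"xx"` language code, and the function name are assumptions for illustration; the second format string simply follows the reordered pattern mentioned above and is not a real translation.

```python
# Hypothetical per-language format strings. The "xx" entry is an invented
# word order for illustration, not a real translation.
FORMATS = {
    "en": "{0} you have an {1} with {2}.",
    "xx": "With {2} on {0} you have an {1}.",
}


def appointment_message(lang, date_time, appt_type, client):
    # The call site always passes the arguments in the same order;
    # only the format string decides where each one lands.
    return FORMATS[lang].format(date_time, appt_type, client)
```

The code never changes between languages; only the format string does.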

Note that printf, from the C runtime, is almost (but not quite) adequate for the job, since the standard format specifiers (%s, etc.) are consumed sequentially: you can't make the second specifier come before the first without also rewriting the actual printf statement, something you don't want to have to do for each language. (POSIX systems accept numbered specifiers such as %1$s, and Microsoft's C runtime offers _printf_p, but neither is part of standard C.)
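Python's `%` operator follows the same sequential rule as printf-style `%s` specifiers, so it can illustrate the limitation. This is a sketch only; the function names and the sample sentences are hypothetical.

```python
def sequential(template, date_time, client):
    # printf-style: each %s consumes the next argument in call order,
    # regardless of where it appears in the sentence.
    return template % (date_time, client)


def positional(template, date_time, client):
    # Numbered placeholders: the template picks arguments by index.
    return template.format(date_time, client)
```

If a translator moves the client to the front of a sequential template, `sequential("You meet %s on %s.", "Monday", "Acme")` silently produces "You meet Monday on Acme." With numbered placeholders, `positional("You meet {1} on {0}.", "Monday", "Acme")` gives the intended sentence without any change to the call.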

Once you've gotten over that hurdle, you'll also need to store that format string in a resource somewhere, so the localization engineers can translate it without having to get into the code (and, in the process, invalidating all that wonderful testing the code has undergone).
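One way to picture the resource step, as a hedged Python sketch: the format strings live in a JSON document that translators edit, and the code only refers to a message key. The JSON layout, the `"appointment"` key, the `"xx"` language code, and the helper names are all assumptions; the text is inlined here only to keep the example self-contained, where a real project would load it from a file outside the source tree.

```python
import json

# Stand-in for an external resource file that translators edit.
# The "xx" entry is an invented word order, not a real translation.
RESOURCE_TEXT = """
{
  "en": {"appointment": "{0} you have an {1} with {2}."},
  "xx": {"appointment": "With {2} on {0} you have an {1}."}
}
"""


def load_catalog(text):
    """Parse the resource text into a {language: {key: format}} mapping."""
    return json.loads(text)


def get_message(catalog, lang, key, *args, fallback="en"):
    # Fall back to the default language when a translation is missing.
    template = catalog.get(lang, {}).get(key) or catalog[fallback][key]
    return template.format(*args)
```

With this arrangement, adding a language means adding an entry to the resource, and the tested code stays untouched.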

So, the important things to remember here: the smallest unit should be the complete sentence, because its constituent elements may vary in order from language to language; and the source code should be devoid of any explicit human-language strings.

STL pitfalls
- Thomas M. Tuerke

This is ostensibly an accepted pattern for STL: applying a transform over members of a collection:

 transform(sParamValue.begin(), sParamValue.end(),
           sParamValue.begin(), toupper);

The problem is, case conversion (shifting to upper, shifting to lower) is not something that can be done "out of context" of the characters around it. It's not acceptable, for example, to loop over the characters one at a time and convert each to upper case, which is what the code above does.

In general, any practice of acting on a string as a collection of characters is likely to cause globalization concerns.

Here's an instance (from the Unicode specification):

"Case mappings may produce strings of different length than the original. For example, the German character U+00DF ß Latin Small Letter Sharp S expands when uppercased to the sequence of two characters "SS". This also occurs where there is no precomposed character corresponding to a case mapping, such as with U+0149 ŉ Latin Small Letter N Preceded by Apostrophe."
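The length change is easy to see in Python, whose `str.upper` applies the full Unicode case mappings to a whole string. The sketch below is not the STL code itself: `upper_per_character` is a hypothetical stand-in that imitates the one-character-in, one-character-out constraint of the transform pattern, leaving a character unchanged when its uppercase form does not fit in a single character.

```python
def upper_per_character(text):
    # Imitates a per-character transform: each input character must map
    # to exactly one output character, so the length can never change.
    out = []
    for ch in text:
        mapped = ch.upper()
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)


def upper_whole_string(text):
    # Full case mapping over the string; the result may be longer.
    return text.upper()
```

For "straße", the per-character version yields "STRAßE" (six characters), while the whole-string version yields "STRASSE" (seven).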

In short: STL may be good for a lot of things, but it won't (necessarily) help you globalize your code.