Locales
About
There are many reasons to build a UTF-8-enabled system. Perhaps reason enough is that UTF-8 is the encoding of the future -- everybody is switching to UTF-8, and for good reason: with UTF-8, it's possible to represent truly every language in the world in a single charset. Operating systems are beginning to standardize on UTF-8 -- all versions of Mac OS X use UTF-8 internally, for example. To compatibly share text files with Mac OS X, you need to be using UTF-8, and we expect other operating systems to follow suit.
English-speaking users may not immediately see the benefit of UTF-8, as they may think they have no need for the vast array of special characters UTF-8 offers. But in almost every other language, special characters are used frequently. And in today's highly international world, you may very well want to be able to represent, say, a Japanese word -- a song title, perhaps, or the name of a television show or movie. Or maybe you'd like a person whose name contains non-ASCII characters -- perhaps a Polish or Russian name -- to be able to write their name properly. Just a thought. Even those using only English will benefit from switching to UTF-8, since it can represent many additional useful typographical symbols that are not available in the ASCII or ISO-8859-1 charsets.
The locales a user can choose from are built by glibc. Usually all available locales, from aa_DJ (Afar locale for Djibouti) through en_US (English locale for the USA) to zu_ZA.utf8 (Zulu locale for South Africa), will be installed. Unless you work at the UN and administer a central server for all member states, it is difficult to conceive why you would need a system where all of these locales are installed.
Preparing the locales
As a first step, you must know exactly which locales you need. All available locales are listed in /usr/share/i18n/locales and /usr/share/i18n/SUPPORTED.
Usually, the locale ID is the language code (e.g. en), followed by an underscore (_) and the upper-case country code (e.g. US): for example, en_US.
For European countries that use the euro, the @euro string must be appended to the locale ID. Unless stated otherwise, "locale ID" in this article means the one without the euro specification.
For these countries you should add two locales: first the one without the euro specification, and then the one with it. With the UTF-8 charmap the euro symbol is supported out of the box, so there is no need to define it separately. This is not true for non-UTF-8 charmaps, however, as shown in the example below for the de_DE@euro definition.
| File: /etc/locale.gen |
# /etc/locale.gen: list all of the locales you want to have on your system
#
# The format of each line:
# <locale> <charmap>
#
# Where <locale> is a locale located in /usr/share/i18n/locales/ and
# where <charmap> is a charmap located in /usr/share/i18n/charmaps/.
#
# All blank lines and lines starting with # are ignored.
#
# For the default list of supported combinations, see the file:
# /usr/share/i18n/SUPPORTED
#
# Whenever glibc is emerged, the locales listed here will be automatically
# rebuilt for you. After updating this file, you can simply run `locale-gen`
# yourself instead of re-emerging glibc.

en_US ISO-8859-1
en_US.UTF-8 UTF-8
en_GB ISO-8859-1
en_GB.UTF-8 UTF-8
de_DE ISO-8859-1
de_DE@euro ISO-8859-15
de_DE.UTF-8 UTF-8
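To check which entries in such a file are actually active, the comment and blank lines can be filtered out. The following is only a rough sketch: the function name list_locale_gen is made up for this example, and it runs on a throwaway sample file rather than on the real /etc/locale.gen.

```shell
# Print "<locale> <charmap>" for every active entry in a locale.gen-style file.
# Blank lines and lines starting with '#' are skipped, as the file header says.
# (list_locale_gen is a hypothetical helper name, not part of glibc or Gentoo.)
list_locale_gen() {
    awk 'NF >= 2 && $1 !~ /^#/ { print $1, $2 }' "$1"
}

# Demonstration on a temporary sample, so the real /etc/locale.gen is untouched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample locale.gen

en_US.UTF-8 UTF-8
de_DE@euro ISO-8859-15
EOF
list_locale_gen "$sample"
rm -f "$sample"
```

To inspect the real file, pass /etc/locale.gen as the argument instead of the sample.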
Create the locales
The system locales come with the glibc package. By default almost all possible locales are installed, though you can choose to install only the locales you need. You can delete /etc/locales.build after upgrading to the latest version of glibc.
Whenever glibc is emerged, the locales listed in /etc/locale.gen will be automatically rebuilt for you. After updating this file, you can simply run locale-gen yourself instead of re-emerging glibc.
An alternative to locale-gen is to manually create the *.UTF-8 locales. This can be done by using localedef.
| Code: |
English (US):
localedef -c -f UTF-8 -i en_US en_US.UTF-8
English (GB):
localedef -c -f UTF-8 -i en_GB en_GB.UTF-8
German (DE):
localedef -c -f UTF-8 -i de_DE de_DE.UTF-8
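locale -a will give you a list of all installed locales. Note that in the output the charset appears as utf8, which is treated the same as UTF-8.
At this point you can add the userlocales USE flag to your global USE flags (or to the sys-libs/glibc-specific flags) and re-emerge glibc. This only applies to glibc versions older than 2.3.6-r4; see the Notes section.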
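If you want to compare a name from your configuration against the output of locale -a in a script, one option is to bring the charset part into the spelling that locale -a prints. This is a sketch only: normalize_locale is a hypothetical helper, and the rule it applies (lower-case the charset and drop punctuation) is an assumption based on how names such as utf8 and iso88591 typically appear in that output.

```shell
# Rewrite the charset part of a locale name in the style printed by `locale -a`,
# e.g. de_DE.UTF-8 -> de_DE.utf8. Names without a charset are returned unchanged.
# (normalize_locale is a hypothetical helper name used only for this example.)
normalize_locale() {
    name=$1
    case $name in
        *.*) ;;
        *) printf '%s\n' "$name"; return ;;
    esac
    base=${name%%.*}      # part before the first dot, e.g. de_DE
    rest=${name#*.}       # charset plus optional modifier, e.g. UTF-8@euro
    codeset=${rest%%@*}
    modifier=
    case $rest in
        *@*) modifier=@${rest#*@} ;;
    esac
    # Lower-case the charset and remove everything that is not a letter or digit.
    codeset=$(printf '%s' "$codeset" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]')
    printf '%s.%s%s\n' "$base" "$codeset" "$modifier"
}

normalize_locale de_DE.UTF-8
normalize_locale en_GB.ISO-8859-1
```

The result can then be matched exactly, for example with locale -a | grep -x "$(normalize_locale de_DE.UTF-8)".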
Setting up locales
To get a list of all UTF-8 supported locales, check the output of:
grep UTF-8 /usr/share/i18n/SUPPORTED
Lines in /usr/share/i18n/SUPPORTED have the format:
<locale> <charmap>
You only need the <locale> part for your /etc/env.d/02locale.
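Pulling out just the <locale> column for the UTF-8 entries could look like the sketch below. It assumes the two-column format described above; utf8_locales is a made-up function name, and the demonstration uses a small sample file rather than the real /usr/share/i18n/SUPPORTED.

```shell
# Print only the <locale> column of lines whose <charmap> column is UTF-8.
# Assumes the "<locale> <charmap>" layout described in the text.
# (utf8_locales is a hypothetical helper name.)
utf8_locales() {
    awk '$2 == "UTF-8" { print $1 }' "$1"
}

# Demonstration on a temporary sample in the same format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
de_DE ISO-8859-1
de_DE.UTF-8 UTF-8
de_DE@euro ISO-8859-15
en_GB.UTF-8 UTF-8
EOF
utf8_locales "$sample"
rm -f "$sample"
```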
Now set the LANG and LC_ALL variables in /etc/env.d/02locale to the locales you want to use. Note that the entries in /etc/env.d/02locale are case-sensitive.
| File: /etc/env.d/02locale |
LANG=de_DE.UTF-8
LC_ALL=de_DE.UTF-8
In this case both the language and the locale conventions (numeric formats, currency, and so on) are set to German.
Please note that you must use the same charset encoding for LANG and LC_* values.
If you set an individual variable such as LC_MONETARY, you can't also set LC_ALL, because LC_ALL overrides LC_MONETARY no matter whether LC_MONETARY is specified before or after it. On the other hand, some programs complain (see [1]) if LC_ALL isn't set.
For European countries that use the euro, you must change this. Theoretically you should use something like it_IT.UTF-8@euro, but doing this makes X complain. To work around this, you can use:
LANG=de_DE.UTF-8
LC_MONETARY=de_DE.UTF-8@euro
This way the added euro support is used only for monetary data, which is the only thing that changed, and X will not complain that it is unable to find the locale.
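The precedence between LC_ALL, the individual LC_* variables, and LANG can be illustrated with a small function. This is a sketch of the lookup order, not of how glibc implements it; effective_locale is a hypothetical name, the values are passed as arguments so that your real environment is not touched, and the final fallback to "C" is an assumption about the default.

```shell
# Work out which locale applies to one category.
# Arguments: value of LC_ALL, value of the category variable, value of LANG.
# Lookup order: LC_ALL first, then the category variable, then LANG.
# (effective_locale is a hypothetical helper; "C" as last resort is an assumption.)
effective_locale() {
    lc_all=$1
    lc_category=$2
    lang_value=$3
    if [ -n "$lc_all" ]; then
        printf '%s\n' "$lc_all"
    elif [ -n "$lc_category" ]; then
        printf '%s\n' "$lc_category"
    elif [ -n "$lang_value" ]; then
        printf '%s\n' "$lang_value"
    else
        printf 'C\n'
    fi
}

# LC_ALL unset: LC_MONETARY takes effect.
effective_locale "" "de_DE.UTF-8@euro" "de_DE.UTF-8"
# LC_ALL set: it hides LC_MONETARY, whatever the order in the file.
effective_locale "de_DE.UTF-8" "de_DE.UTF-8@euro" "de_DE.UTF-8"
```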
If you only want to use the German charset (for umlauts and special characters) but want all messages displayed in English, you can change /etc/env.d/02locale as follows:
| File: /etc/env.d/02locale |
LANG="de_DE.UTF-8"
LC_MESSAGES="en_US.utf8"
Now run env-update && source /etc/profile, or restart the system (or log out, log back in, and restart every service on the system), and you'll be using UTF-8 for everything. If you use X, log out of your X session, restart X with Ctrl-Alt-Backspace, and log back in.
Setting the beginning of the week
When I look at a calendar I like the weeks to start on a Monday. GNOME has a neat feature: if you click on the Clock Applet in the Panel, it shows the current month's calendar with all your events from Evolution. However, it shows the week starting on Saturday, and there is no option in the preferences to make it start on a different day.
Why does the Clock Applet have no setting for "First day of the Week"?
There used to be a setting in the Clock Applet, but some releases ago it was removed, and the applet was modified to pick up the user's preference from the system-wide locale setting. Let us pass over the wisdom of this decision; it happened. Too bad. The unfortunate fact is that the default locale setting has the first day of the week as a Saturday, apparently due to a bug in the glibc-supplied locale files (see more below).
The way to change the first day of the week in the Clock's calendar view is therefore to set the correct system locale. Too bad if you don't like the "correct" choice!
Why is the Clock Applet still not right?
After you do that you can check what locale you are running with the "locale" command. Assuming it says what you set it to above, then everything has worked fine. However, you will find that the Clock's calendar view now shows weeks starting on a Sunday. This is a marginal improvement on starting on a Saturday, and if you like them like that well good for you, and you can stop now. If, like me, you emphatically hate them like that, read on.
After some research it became clear that the poor old GNOME Clock Applet was not at fault any more, but that the default en_GB locale file supplied with glibc really does set the first day of the week to be a Sunday. Let's explore how it does that, then see how to change it.
LC_TIME settings
One of the sub-sections of your locale is for settings that affect "strftime" and other utilities that format dates and times. This subsection is called LC_TIME. If you do locale -k LC_TIME on your system you will see what LC_TIME controls. For our purposes the following settings are the important ones:
| Code: locale -k LC_TIME |
day "Sunday; Monday; Tuesday; Wednesday; Thursday; Friday; Saturday"
% below: ndays;1stday;1stweek
week 7;19971130;4
first_weekday 1
This gives the names of the seven days of the week, says (redundantly) that there are seven of them, and then that the first one listed corresponds to the name of the day of the week of 1997-11-30 (which was a Sunday). Finally and crucially, the first day of the week is defined to be the first entry in the list of day names given.
For the UK, this is wrong! In the UK weeks start on Monday, following the ISO calendar standards.
There are two ways to express what it should be: either rearrange the day names or set first_weekday to 2. Either of these will do the trick.

Rearranging the day names:

day "Monday; Tuesday; Wednesday; Thursday; Friday; Saturday; Sunday"
% below: ndays;1stday;1stweek
week 7;19971201;4
first_weekday 1

Setting first_weekday to 2:

day "Sunday; Monday; Tuesday; Wednesday; Thursday; Friday; Saturday"
% below: ndays;1stday;1stweek
week 7;19971130;4
first_weekday 2

The second one involves less editing, so we'll use that. Now, how do you make this change? Read on.
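To see that both variants really describe a week starting on Monday, the arithmetic can be checked by hand or with a short script. The sketch below follows the interpretation given in this article (the reference date names the first entry of the day list, and first_weekday counts from that entry); the function names are invented for the example, and the weekday calculation uses Sakamoto's method.

```shell
# Day of the week for a YYYYMMDD date: 0 = Sunday ... 6 = Saturday.
# Uses Sakamoto's method with plain integer arithmetic.
day_of_week() {
    y=$(( $1 / 10000 ))
    m=$(( $1 / 100 % 100 ))
    d=$(( $1 % 100 ))
    if [ "$m" -lt 3 ]; then
        y=$(( y - 1 ))
    fi
    # Month offsets for January..December; shift so that $1 is this month's.
    set -- 0 3 2 5 0 3 5 1 4 6 2 4
    shift $(( m - 1 ))
    echo $(( (y + y / 4 - y / 100 + y / 400 + $1 + d) % 7 ))
}

# Name of the day a week starts on.
# $1: reference date from the "week" line, $2: the first_weekday value.
first_weekday_name() {
    index=$(( ($(day_of_week "$1") + $2 - 1) % 7 ))
    set -- Sunday Monday Tuesday Wednesday Thursday Friday Saturday
    shift "$index"
    echo "$1"
}

first_weekday_name 19971130 1   # the settings shipped with en_GB
first_weekday_name 19971201 1   # first variant: rearranged day names
first_weekday_name 19971130 2   # second variant: first_weekday 2
```

The first call should print Sunday and the other two Monday, matching the description above.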
Create the locale
The tool you need is the rather nasty-to-use "localedef". Proceed as follows.
cd ~ # go to your home dir or somewhere else you can write a temp file
cp /usr/share/i18n/locales/en_GB en_GB_modified
Then edit "en_GB_modified". The locale(5) man page explains the format of the file.
1. Update the identification section if you like.
2. Update all the sections except LC_TIME to be like this:
LC_MONETARY
copy "en_GB"
END LC_MONETARY
and so on for each remaining section. This ensures that you don't lose anything in the other sections.
3. Add the following lines to the LC_TIME section
week 7;19971130;5
first_weekday 2
first_workday 2
This assumes that you are going to leave the list of day names alone. The "5" on the end of the week line says that the first week of the year is the one containing the first occurrence of the fifth day in the list (Thursday); this is the ISO rule. The other two lines set the first day of the week to the second day in the list (Monday) and, incidentally, the first working day to Monday too.
4. Create a new directory in your home directory for the output of localedef.
cd ~ && mkdir locale
5. Run
localedef -c -i en_GB_modified -f UTF-8 locale/en_GB.utf8@iso
6. (as root) move the resulting "en_GB.utf8@iso" directory from your own "locale" directory to "/usr/lib/locale"
cp -r locale/en_GB.utf8\@iso /usr/lib/locale/
7. (still as root) Update /etc/env.d/02locale to point at the new "en_GB.utf8@iso" locale that you have created.
LANG=en_GB.utf8@iso
LC_ALL=en_GB.utf8@iso
and then run
env-update && source /etc/profile
Then log out of your X session, restart X with Ctrl-Alt-Backspace, and log back into X again. If you are lucky then everything is now working, and your GNOME Clock Applet shows weeks starting on Monday.
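Editing every section by hand (steps 2 and 3 above) is tedious, so here is one way it might be scripted. Treat it as a sketch under assumptions: rewrite_locale_source is a made-up name, the script assumes each section starts with a line holding only its LC_* name and ends with "END LC_*", it leaves LC_IDENTIFICATION alone so you can edit that by hand, and it simply adds the three LC_TIME lines without checking whether the section already defines them. Review the result before feeding it to localedef.

```shell
# Rewrite a locale source file as described in steps 2 and 3:
#   - every section except LC_TIME (and LC_IDENTIFICATION) becomes `copy "<base>"`
#   - the week/first_weekday/first_workday lines are added to LC_TIME
# $1: locale source file to read, $2: locale to copy from (e.g. en_GB).
# The result is written to standard output.
# (rewrite_locale_source is a hypothetical helper name.)
rewrite_locale_source() {
    awk -v base="$2" '
        /^LC_[A-Z_]+[ \t]*$/ {
            section = $1
            print
            if (section != "LC_TIME" && section != "LC_IDENTIFICATION") {
                print "copy \"" base "\""
                skipping = 1      # drop the original body of this section
            }
            next
        }
        /^END[ \t]+LC_/ {
            if ($2 == "LC_TIME") {
                print "week 7;19971130;5"
                print "first_weekday 2"
                print "first_workday 2"
            }
            skipping = 0
            print
            next
        }
        !skipping { print }
    ' "$1"
}

# Demonstration on a tiny made-up source file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
LC_MONETARY
int_curr_symbol "GBP "
END LC_MONETARY

LC_TIME
d_fmt "%d/%m/%y"
END LC_TIME
EOF
rewrite_locale_source "$sample" en_GB
rm -f "$sample"
```

With the real file this would be run as rewrite_locale_source /usr/share/i18n/locales/en_GB en_GB > en_GB_modified.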
Issues
There are always a few issues. Currently there are a few unknowns and problems with all this:
- The "week" and "first_weekday" keywords in LC_TIME are not documented in any standard Linux documentation; the only available documentation for them is the ISO 14652 standard.
- It would be better if glibc corrected its en_GB definitions to put Monday as the first day of the week.
- It would be even better if applications like the Clock Applet allowed users to choose their own preferred first day of the week. The assumption that the default for the locale will be what the user wants is a bad assumption.
- X11 has its own locale system that is almost-but-not-entirely compatible with glibc locales. If you follow the instructions above (and like me are using Gnome) then you will get warnings like these when you start certain X11 applications.
Gdk-WARNING **: locale not supported by Xlib
Gdk-WARNING **: cannot set locale modifiers
These are irritating and potentially harmful. To avoid this effect you can create your modified locale with a standard name. You can do this with:
localedef -c -i en_GB_modified -f UTF-8 locale/en_GB.utf8
or (if you don't want Unicode)
localedef -c -i en_GB_modified -f ISO-8859-1 locale/en_GB
- Your modified /usr/lib/locale/ directory will get replaced when you upgrade anything to do with libc or GNOME, so you will need to do all this again.
- localedef is an obscure and poorly documented tool. For example "man localedef" states rather hopelessly that "The format of the created output is unspecified." It also says "If the name operand does not contain a slash, then the existence of an output file for the locale is unspecified". So if you do localedef -c -i en_GB_modified -f UTF-8 en_GB.utf8@iso (notice that we did not specify a local directory to write to on the command line) instead of creating the output file locally it will mysteriously fail. If you do this as root, it will write your output in /usr/lib/locale/locale-archive. You can still use the locale you created, but you may puzzle over where the system put the output.
Additional tools
localepurge
An interesting tool is app-admin/localepurge, which cleans out installed man pages and info files in languages you do not need on your system. Read the localepurge man page before running it, and configure the languages you intend to keep in /etc/locale.nopurge.
Notes
glibc 2.3.6 r4 and above
Since glibc-2.3.6-r4 the userlocales flag has been removed and /etc/locales.build has been replaced with /etc/locale.gen. All you need to do is specify which locales you want in /etc/locale.gen and then run locale-gen.
The format is basically the same as locales.build, except the / is replaced with a space. The following example shows a British system with the ISO-8859-1 and UTF-8 character sets enabled.
| File: /etc/locale.gen |
en_GB ISO-8859-1
en_GB.UTF-8 UTF-8
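Since the only difference between the two formats is the separator, an existing locales.build can be converted mechanically. The following is a sketch; convert_locales_build is a made-up name, and the output goes to standard output so you can look at it before saving it as /etc/locale.gen.

```shell
# Convert locales.build lines ("<locale>/<charmap>") to locale.gen lines
# ("<locale> <charmap>"). Comment lines are passed through unchanged.
# (convert_locales_build is a hypothetical helper name.)
convert_locales_build() {
    # On every line that is not a comment, replace the first "/" with a space.
    sed -e '/^[[:space:]]*#/!s|/| |' "$1"
}

# Demonstration on a temporary sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# see /usr/share/i18n/SUPPORTED
en_GB/ISO-8859-1
en_GB.UTF-8/UTF-8
EOF
convert_locales_build "$sample"
rm -f "$sample"
```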
If you are using glibc <2.3.6 r4, add the userlocales USE flag to the glibc package. You can do this by adding it to /etc/portage/package.use with: echo "sys-libs/glibc userlocales" >> /etc/portage/package.use
Now specify the locales you want to be able to use in /etc/locales.build as in the example below.
| File: /etc/locales.build |
en_US/ISO-8859-1
en_US.UTF-8/UTF-8
de_DE/ISO-8859-1
de_DE@euro/ISO-8859-15
de_DE.UTF-8/UTF-8
The entries in the file are, as the header of the file suggests, in this format:
<locale>/<charmap>
<locale> is a locale from the /usr/share/i18n/locales directory and <charmap> is the name of one of the files in /usr/share/i18n/charmaps/.
Troubleshooting
Undefined locales
If you need to install a locale not yet defined in /usr/share/i18n/locales, please refer to the Gentoo official UTF-8 documentation.
See also
For further information about locale-handling read the Gentoo Linux Localization Guide.
