This explains how to type characters from the encodings of several languages’ alphabets in Linux.
A much more thorough listing is to be found in Sander van Geloven’s Compose Key Sequences Reference Guide 2012 (Hellebaard, Utrecht).
The data in a document can be thought of as a string of 0’s and 1’s: binary digits, or bits. These bits are grouped into bytes, each of which is a set of 8 bits. A byte can be interpreted as a number from 0 to 255.
A character encoding is a re-interpretation of these numbers as characters from human language alphabets.
The most famous encoding is the old ASCII encoding, which interprets numbers in the range 32–126 as the characters found on the keys of an American typewriter (32 is the space). (The remaining numbers, 0–31 and 127, are used as device control characters: carriage return, tab, newline, delete, etc.) ASCII uses only the first half of the possible numbers in a byte, so it is called a “7-bit” encoding. This encoding only barely suffices for American English, however. Many other encodings have been developed to encode other languages. The first 128 characters of most (but not all) encodings are ASCII. This is a good thing because it means that the US English part of any text can be understood (almost) independently of the encoding.
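As a rough illustration of the byte and ASCII ranges described above, the following Python sketch classifies byte values. The function name `describe` and the sample bytes are made up for this example.

```python
# Interpret raw bytes as numbers, then as ASCII characters.
# The sample data and the helper name are illustrative assumptions.
data = bytes([72, 105, 33, 10])  # 'H', 'i', '!', newline


def describe(value):
    """Classify one byte value (0-255) according to the ASCII layout."""
    if value > 127:
        return "not ASCII (high bit set)"
    if value < 32 or value == 127:
        return "control character"
    if value == 32:
        return "space"
    return "printable: " + chr(value)


for value in data:
    # Show the decimal value, its 8 bits, and its ASCII interpretation.
    print(f"{value:3d}  {value:08b}  {describe(value)}")

# Every byte here is below 128, so the data decodes as ASCII.
text = data.decode("ascii")
```

Any byte of 128 or above falls outside ASCII, which is where the other encodings discussed here start to differ.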
The character encoding to end all character encodings is Unicode, whose goal is to encode the writing systems of all the world’s languages (as well as many other symbolic systems). To do this, it is insufficient to associate the 256 possible byte values with characters, so Unicode uses multiple bytes to encode characters. It is therefore called a multi-byte encoding. UTF-8 and UTF-16 are two ways of writing Unicode characters as bytes, and ISO-10646 is the corresponding international standard.
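To make the multi-byte idea concrete, here is a small Python sketch that prints how many bytes a few characters need in UTF-8 and UTF-16. The sample characters and the helper name `byte_lengths` are chosen only for illustration.

```python
# Compare the byte lengths of a few characters in UTF-8 and UTF-16.
# "utf-16-be" is used so that no byte-order mark is added to the output.
SAMPLES = ["A", "é", "€", "中"]


def byte_lengths(char):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-be"))


for char in SAMPLES:
    utf8_len, utf16_len = byte_lengths(char)
    print(f"U+{ord(char):04X} {char}  UTF-8: {utf8_len} byte(s)  UTF-16: {utf16_len} byte(s)")
```

Note that the ASCII letter still takes a single byte in UTF-8, which is why plain English text looks the same in ASCII and UTF-8.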
Computer systems have mostly completed the move to Unicode. There are still weak spots, however, and older systems often don't support it at all.
Before Unicode, dozens of other encoding systems were developed. Most of them were 8-bit encodings, specialized for a particular language or for some small group of languages. These include the international standard ISO-8859 series, which enables one to type in both English and some other language or languages. Then there were series of encodings meant for a particular computer architecture, such as the PC CodePage (CP) series for IBM PC compatibles and Microsoft Windows, and the Macintosh encodings.
An 8-bit encoding is insufficient for ideogrammatic writing systems such as Chinese, Japanese, and Korean. So multi-byte encoding systems were developed for these: Big-5, JIS, and KSC.
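A short Python sketch can show why these writing systems need more than one byte per character. It assumes Python's codec names: `big5` for Big-5, with `shift_jis` and `euc_kr` standing in as common encodings of the JIS and KSC character sets. The sample characters and helper names are illustrative.

```python
# Encode one character each in a Chinese, Japanese, and Korean encoding.
# Assumption: shift_jis and euc_kr are used as representatives of the
# JIS and KSC families; other encodings of those character sets exist.
SAMPLES = [("中", "big5"), ("日", "shift_jis"), ("한", "euc_kr")]


def encoded_length(char, encoding):
    """Return the number of bytes the character occupies in the encoding."""
    return len(char.encode(encoding))


def fits_in_latin1(char):
    """Return True if the character exists in the 8-bit ISO-8859-1 set."""
    try:
        char.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for char, encoding in SAMPLES:
    print(char, encoding, encoded_length(char, encoding), "bytes,",
          "fits in ISO-8859-1" if fits_in_latin1(char) else "not in ISO-8859-1")
```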
A key code identifies a particular key on the computer keyboard. Unfortunately, different manufacturers numbered their keyboards in incompatible ways. However, with knowledge of your keyboard brand and model, X-windows can sort this out.
For English, we have to use the shift key to obtain upper-case letters. In other languages, it is necessary to use other combinations of keys to obtain accent marks. In many languages, complex combinations of keys may be required to obtain a given character. For a given writing system, an input method is the computer algorithm which obtains a character from a combination of key strokes. Notice that some languages may have several input methods.
There is more to this. Probably something in X configuration turns on the sticky keys.
Control of the keyboard layout is provided by the Keyboard Layout module of the KDE Control Center (in KDE 3) or of the System Settings utility (in later KDE versions).
In addition to English, I often type in German and occasionally French. For this, I find the layout “U.S. English” convenient.
Under Options set your preferred "Compose Key", which will be used to type non-English letters and characters. I use the "right-Alt" key for this purpose.
Setting the keyboard layout only makes it possible for X-windows to enter special symbols. It does not mean that a given application can accept them.
Some older applications such as Kterm have built-in input methods.
Applications linked with the KDE 3 or Gnome 2 libraries should be able to accept general Unicode typing. That is, you should be able to type any language, given you have the fonts and know how to work the input method for that language.
Recent versions of the text editors KWrite and Gedit are both Unicode enabled (but in different ways: Gedit assumes everything is Unicode; KWrite will open and save files in a variety of encodings). The mail client Balsa is also Unicode enabled.
There is a project under way to make Unicode xterms, but most xterm clones are still very much 8-bit devices. So long as you are willing to type only in certain combinations of languages with alphabetic writing systems, this is not a big problem. You have to set the encoding to the one you want, then find a font that supports the encoding.
On my keyboard, the “right-Alt” key is the compose key. To produce a character, hold down the compose key while pressing the key in the meta column below, then type the key in the combo column below to get the result.
The combo key must be pressed shortly after the meta key; otherwise the sequence will time out.
When shift is required with meta, hold it down before holding down meta.
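The meta-then-combo lookup can be pictured as a simple table lookup. The Python sketch below models it with a dictionary holding a few of the pairs from the table in this document. The names `COMPOSE` and `compose` are made up for this example, and it is only a model of the idea, not how X or the console actually implements composing.

```python
# A toy model of compose sequences: (meta key, combo key) -> character.
# Only a handful of pairs from the table are included.
COMPOSE = {
    ("<", "<"): "«",
    (">", ">"): "»",
    ("?", "?"): "¿",
    ("!", "!"): "¡",
    ('"', "a"): "ä",
    ("/", "o"): "ø",
    ("A", "E"): "Æ",
    ("o", "c"): "©",
}


def compose(meta, combo):
    """Return the composed character, or None if the pair is unknown."""
    return COMPOSE.get((meta, combo))


print(compose("<", "<"))
print(compose('"', "a"))
print(compose("x", "y"))  # an unknown pair yields None
```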
Some other useful commands in this connexion: dumpkeys, showkey.
This list is not exhaustive of key combinations that produce 8859-1 characters. I’ve only listed the ones that seemed most natural to me. On the other hand, dumpkeys lists many characters that can’t be made by any combination listed. In fact, most of the combinations dumpkeys lists don’t work.
|meta||combo||result||description|
|<||<||«||chevron or guillemet|
|>||>||»||chevron or guillemet|
|?||?||¿||inverted question mark|
|!||!||¡||inverted exclamation point|
|-||^||¯||overbar or macron|
|-||+||±||plus or minus|
|P||!||¶||paragraph or pilcrow|
|"||aeiouyAEIOU||äëïöüÿÄËÏÖÜ||umlaut or dieresis|
|/||oO||øØ||O with stroke|
|/||u||µ||micro or mu|
|a||a||å||a with ring|
|A||A||Å||capital A with ring|
|A||E||Æ||capital ligature AE|
|'||'||Ž||acute accent (8859-1), capital Z with caron (8859-15)|
|1||4||¼||fraction 1/4 (8859-1)|
|O||E||Œ||ligature OE (8859-15)|
|3||4||¾||fraction 3/4 (8859-1)|
|"||Y||Ÿ||capital Y with dieresis (8859-15)|
|=||C||€||currency (8859-1), Euro (8859-15)|
|o||c||©||copyright (or 0 c)|
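Several rows above give one meaning under 8859-1 and another under 8859-15 because the two encodings assign different characters to the same byte. The Python sketch below decodes four such bytes both ways. The helper name `decode_both` is made up for this example.

```python
# Decode the same byte under ISO-8859-1 and ISO-8859-15.
# These four byte values are among the positions where the two differ.
DIFFERING_BYTES = [0xA4, 0xB4, 0xBC, 0xBE]


def decode_both(value):
    """Return the (ISO-8859-1, ISO-8859-15) characters for one byte value."""
    raw = bytes([value])
    return raw.decode("iso-8859-1"), raw.decode("iso-8859-15")


for value in DIFFERING_BYTES:
    latin1, latin9 = decode_both(value)
    print(f"0x{value:02X}  8859-1: {latin1}  8859-15: {latin9}")
```

Which character appears on screen therefore depends on the encoding the terminal or application is set to, not only on the keys pressed.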