This page explains how to type characters from the encodings of various languages’ alphabets in Linux.
The data in a document can be thought of as a string of 0’s and 1’s: binary digits, or bits. These bits are always grouped into bytes, each of which is a set of 8 bits. A byte can be interpreted as a number from 0 to 255.
A character encoding is a re-interpretation of these numbers as characters from human language alphabets.
For instance, the old ASCII encoding interprets numbers in the range 33-126 as characters that would be found on an American typewriter, and 32 as the space. (The remaining numbers, 0-31 and 127, are used as device control characters: carriage return, tab, newline, delete, etc.)
ASCII uses only the first half of the possible numbers in a byte, so it is called a “7-bit” encoding.
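As a rough illustration of the byte-to-number-to-character chain described above, here is a minimal Python sketch. The sample string and the helper name `is_visible_ascii` are made up for this example.

```python
# A short text stored as ASCII bytes, viewed as numbers and as bits.
data = "Hi!".encode("ascii")                 # text -> bytes
values = list(data)                          # bytes -> numbers 0..255
bits = [format(v, "08b") for v in values]    # numbers -> 8 binary digits


def is_visible_ascii(value: int) -> bool:
    """True for the typewriter characters, 33 through 126 (hypothetical helper)."""
    return 33 <= value <= 126


print(values)  # [72, 105, 33]
print(bits)    # ['01001000', '01101001', '00100001']

# Every ASCII value fits in 7 bits, so the leading bit is always 0.
seven_bit = all(v < 128 for v in values)
```

Each bit string starts with 0, which is why ASCII is called a 7-bit encoding.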
ASCII only barely suffices for American English, however. Many other encodings have been developed to encode other languages.
The first 128 characters of most (but not all) encodings are ASCII. This is a good thing because it means that the US English part of any text can be understood (almost) independently of the encoding.
The character encoding to end all character encodings is Unicode, whose goal is to encode the writing systems of all the world’s languages (as well as many other symbolic systems). Associating the 256 possible byte values with characters is insufficient for this, so Unicode uses multiple bytes to encode characters. It is therefore called a multi-byte encoding. UTF-8 and UTF-16 are encoding forms of Unicode, and ISO-10646 is the international standard that defines the same character set.
Computer systems have mostly completed the move to Unicode, but there are still many weak spots, and older systems often don't support it at all.
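To see what "multi-byte" means in practice, the following Python sketch counts how many bytes a few sample characters need in UTF-8 and UTF-16. The choice of sample characters is arbitrary.

```python
# Bytes needed per character in two Unicode encoding forms.
samples = ["A", "é", "€", "中"]

sizes = {}
for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # big-endian, no byte-order mark
    sizes[ch] = (len(utf8), len(utf16))
    print(f"U+{ord(ch):04X} {ch}: {len(utf8)} byte(s) in UTF-8, "
          f"{len(utf16)} byte(s) in UTF-16")

# Plain ASCII text produces the same bytes in UTF-8 as in ASCII.
same_as_ascii = "Hi!".encode("utf-8") == "Hi!".encode("ascii")
```

Only the ASCII letter fits in a single byte in UTF-8; the others take two or three.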
Before Unicode, dozens of other encoding systems were developed. Most of them were 8-bit encodings, specialized for a particular language, or for some small group of languages. These include the international standard ISO-8859 series, which enables one to type in both English and some other language or languages. Then there were series of encodings meant for a particular computer architecture, such as the PC CodePage (CP) series for IBM PC compatibles and Microsoft Windows, and the Macintosh encodings.
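The practical consequence of having many 8-bit encodings is that the same byte means different things depending on which one is assumed. The Python sketch below decodes one byte under three codecs. The codec names are Python's own, and picking these three as stand-ins for the ISO, PC code page, and Macintosh families is an assumption made for illustration.

```python
# One byte value, read under three different 8-bit encodings.
raw = bytes([0xE9])

codecs = {
    "iso8859_1": "ISO-8859-1 (Latin-1)",
    "cp437": "PC code page 437",
    "mac_roman": "Macintosh Roman",
}

decoded = {name: raw.decode(name) for name in codecs}
for name, label in codecs.items():
    print(f"{label}: byte 0xE9 -> {decoded[name]}")

# A byte in the ASCII range is read identically by all three.
ascii_views = {b"A".decode(name) for name in codecs}
```

The upper half of the byte range is where the encodings disagree, while the ASCII half stays the same.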
An 8-bit encoding is insufficient for writing systems with thousands of characters, such as those of Chinese, Japanese, and Korean. So multi-byte encoding systems were developed for these: Big-5, JIS, and KSC.
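A small Python sketch of this point follows. It assumes Python's `big5`, `shift_jis`, and `euc_kr` codecs as representatives of the Big-5, JIS, and KSC families (the latter two are specific encodings built on those standards), and the sample characters and helper name are chosen only for illustration.

```python
# One sample character per legacy multi-byte encoding.
samples = {
    "big5": "中",        # Traditional Chinese
    "shift_jis": "日",   # Japanese (a JIS-based encoding)
    "euc_kr": "한",      # Korean (a KS C 5601-based encoding)
}

encoded = {codec: ch.encode(codec) for codec, ch in samples.items()}
for codec, raw in encoded.items():
    print(f"{codec}: {samples[codec]} -> {raw.hex()} ({len(raw)} bytes)")


def fits_in_latin1(ch: str) -> bool:
    """True if the character exists in the 8-bit ISO-8859-1 encoding."""
    try:
        ch.encode("iso8859_1")
        return True
    except UnicodeEncodeError:
        return False
```

Each of these characters needs two bytes in its own encoding, and none of them can be represented in an 8-bit encoding such as ISO-8859-1.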
There is more to this. Probably something in X configuration turns on the sticky keys.
In KDE 3, one must first configure the keyboard with an appropriate layout. This is done with the Keyboard Layout module of the KDE Control Center. Under the Layout tab, click “Enable keyboard layouts”. This places a “KDE Keyboard Tool” in the KDE System Tray. To type primarily in U.S. English, you will need the layout called “U.S. English w/ ISO9995-3”. The layout “U.S. English with deadkeys” also works, but I find it annoying. To type in other languages, click on the appropriate items. This will put each selected language on the list for the Keyboard Tool. The “Switching Policy” determines where the KDE Keyboard Tool has its effect: globally, only for the current application, or only for the current window.
Setting the keyboard layout only makes it possible for X-windows to enter special symbols. It does not mean that a given application can accept the symbols.
Some older applications such as Kterm have built-in input methods.
Applications linked with the KDE 3 or Gnome 2 libraries should be able to accept general Unicode typing. That is, you should be able to type any language, given you have the fonts and know how to work the input method for that language.
Recent versions of the text editors KWrite and Gedit are both Unicode enabled (but in different ways: Gedit assumes everything is Unicode; KWrite will open and save files in a variety of encodings). The mail client Balsa is also Unicode enabled.
There is a project under way to make Unicode xterms, but most xterm clones are still very much 8-bit devices. So long as you are willing to type only in certain combinations of languages with alphabetic writing systems, this is not a big problem. You have to set the encoding to the one you want, then find a font that supports the encoding.
See man iso-8859-1 and man iso-8859-15 for more info about these encodings.
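Especially note: Latin-9 (ISO-8859-15) is a modification of Latin-1 (ISO-8859-1). It replaces a few symbols with the Euro sign and some additional letters.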
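The exact differences between the two encodings can be listed by decoding every byte value both ways. This is a minimal Python sketch using Python's `iso8859_1` and `iso8859_15` codecs. It reflects Python's tables and is not taken from the man pages.

```python
# Compare how each of the 256 byte values is read in Latin-1 and Latin-9.
diffs = {}
for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("iso8859_1")
    latin9 = raw.decode("iso8859_15")
    if latin1 != latin9:
        diffs[value] = (latin1, latin9)

for value, (latin1, latin9) in diffs.items():
    print(f"0x{value:02X}: {latin1} (Latin-1) -> {latin9} (Latin-9)")
```

The positions that differ are the ones where a key combination gives one result in 8859-1 and another in 8859-15.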
On my keyboard, the right “Windows” key is the meta key. To produce a character, hold down the meta key while pressing the key in the meta column below, then type the key in the combo column to get the result.
Time is of the essence between pressing the meta key and the first key of the combination.
When shift is required with meta, hold it down before holding down meta.
Some other useful commands in this connexion: dumpkeys, showkey.
This list is not exhaustive of key combinations that produce 8859-1 characters. I’ve only listed the ones that seemed most natural to me. On the other hand, dumpkeys lists many characters that can’t be made by any combination listed. In fact, most of the combinations dumpkeys lists don’t work.
| meta | combo | results | description |
|---|---|---|---|
| Punctuation | |||
| < | < | « | chevron or guillemet |
| > | > | » | chevron or guillemet |
| ? | ? | ¿ | inverted question mark |
| ! | ! | ¡ | inverted exclamation point |
| - | ^ | ¯ | overbar or macron |
| 0 | ^ | ° | degree |
| - | - | | soft hyphen |
| | | | non-breaking space |
| Superscripts | |||
| ^ | 1 | ¹ | |
| ^ | 2 | ² | |
| ^ | 3 | ³ | |
| Math | |||
| x | x | × | multiplication |
| - | : | ÷ | division |
| . | . | · | middle dot |
| - | + | ± | plus or minus |
| - | , | ¬ | negation |
| Editing | |||
| 0 | s | § | section |
| P | ! | ¶ | paragraph or pilcrow |
| Foreign | |||
| a | _ | ª | feminine ordinal |
| o | _ | º | masculine ordinal |
| ` | aAeEiIoOuU | àÀèÈìÌòÒùÙ | grave accents |
| ' | aeiouyAEIOU | áéíóúýÁÉÍÓÚ | acute accents |
| ^ | aAeEiIoOuU | âÂêÊîÎôÔûÛ | circumflex |
| " | aeiouyAEIOU | äëïöüÿÄËÏÖÜ | umlaut or dieresis |
| , | ,cC | žçÇ | z-caron; cedilla |
| ~ | nNaAoO | ñÑãÃõÕ | tilde |
| s | s | ß | sharp s |
| t | h | þ | small thorn |
| T | H | Þ | capital thorn |
| - | dD | ðÐ | eth |
| / | oO | øØ | O with stroke |
| / | u | µ | micro or mu |
| a | a | å | a with ring |
| A | A | Å | capital A with ring |
| a | e | æ | ligature ae |
| A | E | Æ | capital ligature AE |
| ^ | ! | Š | broken bar (8859-1), capital S with caron (8859-15) |
| ' | ' | Ž | acute accent (8859-1), capital Z with caron (8859-15) |
| 1 | 2 | ½ | fraction 1/2 (8859-1) |
| o | e | œ | ligature oe (8859-15) |
| 1 | 4 | ¼ | fraction 1/4 (8859-1) |
| O | E | Œ | ligature OE (8859-15) |
| 3 | 4 | ¾ | fraction 3/4 (8859-1) |
| " | Y | Ÿ | capital Y with dieresis (8859-15) |
| Business | |||
| L | = | £ | pounds |
| Y | = | ¥ | yen |
| c | / | ¢ | cents |
| = | C | € | currency sign (8859-1), Euro (8859-15) |
| o | c | © | or 0 c, copyright |
| o | r | ® | registered |