Encodings, web pages, and Linux

character encoding

A character encoding relates the bytes in a document to characters in a human language writing system. For historical reasons, there are many character encodings, mostly specialized for certain groups of languages.

Examples include the venerable ASCII, which is specialized for American English; the ISO-8859 series, which aims to cover large groups of alphabetic languages (always including English); VISCII, which is specialized for Vietnamese; and Unicode, the big whopper of encodings, which aims to cover all the world’s writing systems, and other character systems too.
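To make the byte-to-character relationship concrete, here is a minimal Python sketch that encodes one character under three of the encodings named above. The choice of "é" as the sample character and the variable names are only illustrative assumptions. VISCII is left out because the standard Python codecs may not include it.

```python
# The sample character is written as an escape so that this file's own
# encoding cannot affect the result. "\u00e9" is "é" (e with acute accent).
text = "\u00e9"

# The same character maps to different bytes under different encodings.
latin1_bytes = text.encode("iso-8859-1")  # a single byte
utf8_bytes = text.encode("utf-8")         # two bytes

print("ISO-8859-1:", latin1_bytes.hex())
print("UTF-8:     ", utf8_bytes.hex())

# ASCII has no code for this character, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("Fits in ASCII:", ascii_ok)

# Reading bytes with the wrong encoding yields the wrong characters:
# the two UTF-8 bytes are taken for two separate ISO-8859-1 characters.
misread = utf8_bytes.decode("iso-8859-1")
print("UTF-8 bytes read as ISO-8859-1:", misread)
```

The last step is the usual cause of garbled accented letters: the bytes are intact, but the reader assumed a different encoding than the writer used.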

encodings for Web pages

encodings in Linux

encodings and fonts

In order to display a character from a writing system, a font must contain the corresponding glyph, which is a graphical representation of the character.

Many fonts support part or most of the ISO-8859 series. Very few support most of Unicode. If you run
xfontsel
you can get an idea of what is installed on your system. Set “rgstry” to “ISO10646” for Unicode, then look at the available “fndry”s and “fmly”s.
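The menus in xfontsel correspond to the fields of an X font name (an XLFD string), where “fndry” is the foundry, “fmly” is the family, and the last two fields are the registry and encoding. As a rough sketch, assuming a fully specified name with no wildcards, the fields can be pulled apart in Python like this. The helper names parse_xlfd and is_unicode_font are hypothetical, and the sample font names are just typical examples, not a claim about what is installed on your machine.

```python
# The 14 fields of an X Logical Font Description name, in order.
XLFD_FIELDS = [
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "res_x", "res_y", "spacing",
    "avg_width", "registry", "encoding",
]


def parse_xlfd(name):
    """Split a fully specified XLFD name into a dict of its fields.

    Hypothetical helper: it handles only plain names with 14 fields,
    not wildcard patterns or other extended forms.
    """
    parts = name.split("-")
    # A full name starts with "-", so the first piece is empty.
    if len(parts) != 15 or parts[0] != "":
        raise ValueError("not a fully specified XLFD name: %r" % name)
    return dict(zip(XLFD_FIELDS, parts[1:]))


def is_unicode_font(name):
    """Return True if the registry field says ISO10646 (Unicode)."""
    return parse_xlfd(name)["registry"].lower() == "iso10646"


example = "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso10646-1"
fields = parse_xlfd(example)
print(fields["foundry"], fields["family"], fields["registry"])
print(is_unicode_font(example))
```

A registry of ISO10646 only says that the font is indexed by Unicode code points; it does not say how many glyphs the font actually contains.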

There are nice open-source efforts to produce Unicode fonts, especially DejaVu, based on Bitstream’s Vera family.

Microsoft used to distribute “Arial Unicode MS”, which was one of the best. Bitstream used to distribute “Cyberbit”.

Firefox’s Preferences lets you associate a font with an encoding. Several other programs give this kind of control.