ISO Character Set

HTML Character Sets - Part 3

Foreword: In this part of my series, I give you an overview of the ISO character set.

By: Chrysanthus Date Published: 31 Jul 2012

Introduction

This is part 3 of my series, HTML Character Sets. In this part of my series, I give you an overview of the ISO character set.

Note: If you cannot see the code or if you think anything is missing (broken link, image absent, etc.), just contact me at forchatrans@yahoo.com. That is, contact me for the slightest problem you have about what you are reading.

Description
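ISO is the International Organization for Standardization. The ASCII character set, with only 128 characters, is too small for international use, so the ISO character set was developed. A single 8-bit character set cannot hold all the characters that the different languages need, so the ISO character set exists in parts. You have the parts, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10 and ISO-8859-15. You also have ISO-2022-JP, ISO-2022-JP-2 and ISO-2022-KR; strictly speaking, these are not parts of ISO 8859, but encodings based on a separate standard, ISO/IEC 2022.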
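
To illustrate why ASCII is too small, here is a minimal Python sketch. The sample words and the helper name, can_encode, are my own choices for illustration; they do not come from any standard. The sketch checks whether each word can be stored in ASCII and in ISO-8859-1:

```python
# Hypothetical sample text: one plain English spelling, one French spelling.
samples = ["cafe", "café"]

def can_encode(text, charset):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for word in samples:
    print(word,
          "| ascii:", can_encode(word, "ascii"),
          "| iso-8859-1:", can_encode(word, "iso-8859-1"))
```

The accented letter is outside ASCII, but it fits into a single byte in ISO-8859-1.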

The default ISO character set part has traditionally been ISO-8859-1. So, an HTML document is usually assumed to be in ISO-8859-1 if you do not declare any character set in a meta tag, though it is safer to declare the character set explicitly. Read the following table, which gives the description of the parts:

ISO Character Set Parts
Character set	Description	Covers
ISO-8859-1	Latin alphabet part 1	North America, Western Europe, Latin America, the Caribbean, Canada, Africa
ISO-8859-2	Latin alphabet part 2	Central and Eastern Europe
ISO-8859-3	Latin alphabet part 3	Southern Europe (Maltese, Turkish), Esperanto, miscellaneous others
ISO-8859-4	Latin alphabet part 4	Scandinavia/Baltics (and others not in ISO-8859-1)
ISO-8859-5	Latin/Cyrillic part 5	The languages that are using a Cyrillic alphabet such as Bulgarian, Belarusian, Russian and Macedonian
ISO-8859-6	Latin/Arabic part 6	The languages that are using the Arabic alphabet
ISO-8859-7	Latin/Greek part 7	The modern Greek language as well as mathematical symbols derived from the Greek
ISO-8859-8	Latin/Hebrew part 8	The languages that are using the Hebrew alphabet
ISO-8859-9	Latin 5 part 9	The Turkish language. Same as ISO-8859-1 except Turkish characters replace Icelandic ones
ISO-8859-10	Latin 6 Lappish, Nordic, Eskimo	The Nordic languages
ISO-8859-15	Latin 9 (aka Latin 0)	Similar to ISO 8859-1 but replaces some less common symbols with the euro sign and some other missing characters
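ISO-2022-JP	Japanese (7-bit, based on ISO/IEC 2022)	The Japanese language
ISO-2022-JP-2	Japanese, multilingual extension of ISO-2022-JP	The Japanese language, plus additional character sets such as Chinese, Korean and Greek
ISO-2022-KR	Korean (7-bit, based on ISO/IEC 2022)	The Korean language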
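
The practical consequence of having parts is that the same byte value means a different character in each part. The following is a small Python sketch of that idea; the list PARTS and the function name, decode_byte, are my own, and I have picked the byte values 0xA4 and 0xD0 only because they differ between several parts:

```python
# A few of the parts from the table above, as Python codec names.
PARTS = ["iso-8859-1", "iso-8859-5", "iso-8859-7", "iso-8859-9", "iso-8859-15"]

def decode_byte(value, charset):
    """Interpret one byte value as a character in the given part."""
    return bytes([value]).decode(charset)

for value in (0xA4, 0xD0):
    for part in PARTS:
        char = decode_byte(value, part)
        print(f"byte 0x{value:02X} in {part}: {char} (U+{ord(char):04X})")
```

For example, the byte 0xA4 is the generic currency sign in ISO-8859-1 but the euro sign in ISO-8859-15, and the byte 0xD0 is an Icelandic letter in ISO-8859-1 but a Turkish letter in ISO-8859-9. This is why the browser has to know which part a document uses.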

The Unicode Consortium
There is a consortium called the Unicode Consortium. The Unicode Consortium develops the Unicode Standard. Its goal is to replace the existing character sets (parts) with a single universal character set, Unicode. Unicode can be implemented by several character encodings, called Unicode Transformation Formats (UTF). The most commonly used encodings are UTF-8 and UTF-16. These are not different parts; they are alternative ways of encoding the same characters.

Unicode is better than the ISO character set in the sense that it encompasses the characters of all the parts of the ISO character set, so one document can mix characters that would otherwise need different parts.
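
Here is a minimal Python sketch of that advantage. The sample text and the names, ISO_PARTS and parts_that_fit, are my own choices for illustration. The text mixes a French letter, a Cyrillic letter and the euro sign; the sketch looks for an ISO-8859 part that can hold all of them, and then encodes the same text with UTF-8 and UTF-16:

```python
# Hypothetical sample text: Latin, Cyrillic and the euro sign together.
text = "é Ж €"

# The ISO-8859 parts listed in this article, as Python codec names.
ISO_PARTS = [f"iso-8859-{n}" for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15)]

def parts_that_fit(s):
    """Return the ISO-8859 parts able to encode every character of s."""
    fits = []
    for part in ISO_PARTS:
        try:
            s.encode(part)
            fits.append(part)
        except UnicodeEncodeError:
            pass  # at least one character is missing from this part
    return fits

print("Parts that can hold the whole text:", parts_that_fit(text))

# UTF-8 and UTF-16 are alternatives: different bytes, same characters.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")
print("UTF-8 :", utf8_bytes.hex(" "))
print("UTF-16:", utf16_bytes.hex(" "))
```

No single part can hold the whole text, while both Unicode encodings can, and decoding either result gives back the original text.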

That is it for this part of the series. We stop here and continue in the next part.

Chrys
