Specifications - XML and XSLT
Specifications - HTML and Internet
- HTML 4.01 W3C Recommendation; or ISO/IEC 15445:2000(E), the ISO version
- XHTML 1.0 W3C Recommendation
- URIs: Generic Syntax IETF RFC 2396, hyperlinked version
- HTTP/1.1 IETF RFC 2616, hyperlinked version
- Internet Message Bodies [MIME] IETF RFC 1521, hyperlinked version
- Registered Media Types aka MIME types or Content-Type values; current IANA list
- Language identifiers IETF RFC 1766, hyperlinked version
This RFC explains the syntax for making 'language tags' like en-US and fr-CA.
It was superseded by RFC 3066,
which was superseded by RFC 4646.
- Language codes from ISO 639(-1):1988 and ISO 639-2:1998
This list also includes changes made since the 1988 and 1998 publications.
- Country codes from ISO 3166-1, current list
The official list of country codes for making RFC 3066-compliant language tags.
- Additional IANA language tags referenced by RFC 3066
- Language codes from ISO 639:1988 (obsolete)
- Country codes from ISO 3166:1988 (obsolete)
A strict interpretation of RFC 1766 would restrict one to just the language and
country codes that were available in 1988. However, RFC 1766's author
clarified on the IETF-Languages list on 02 Aug 2000 that the intent was to
refer to ISO 639:1988 and its successors, not just these codes.
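As a rough illustration of the 'language tag' syntax that RFC 1766 describes (a primary tag of 1 to 8 letters, followed by optional hyphen-separated subtags of 1 to 8 letters each), here is a minimal Python sketch. The function name split_language_tag is hypothetical, and the check is purely syntactic: it does not consult the ISO 639, ISO 3166, or IANA lists above, and it does not cover the extended syntax of the later RFCs.

```python
import re

# RFC 1766 grammar: a primary tag of 1-8 ASCII letters, then zero or more
# hyphen-separated subtags of 1-8 ASCII letters each.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def split_language_tag(tag):
    """Split a well-formed tag into (primary, subtags); raise ValueError otherwise.

    Hypothetical helper for illustration only. It checks syntax, not whether
    the codes are actually registered.
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a well-formed RFC 1766 language tag: {tag!r}")
    primary, *subtags = tag.split("-")
    # Tags are case-insensitive; lowercasing the primary tag is only a convention.
    return primary.lower(), subtags


print(split_language_tag("en-US"))
print(split_language_tag("fr-CA"))
```

A tag such as en_US (underscore instead of hyphen) is rejected by this sketch, since the hyphen is the only separator the grammar allows.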
Specifications - Character Encoding
- Ironically, ISO/IEC 10646-1 (The Universal Character Set: the basis for many
standards including HTML and XML) is not freely available online.
However, you can look at the last working draft of ISO/IEC 10646-1 2nd Edition
(no code charts though!), and the final draft of ISO/IEC 10646-2.
Alternatively, do as most people do and consult The Unicode Standard instead.
- The Unicode Standard 3.0 and 4.0 are available online (PDF)
- Unicode character charts just like in the Unicode book (PDF format)
- Unicode Character Encoding Model Unicode Technical Report #17
This document clarifies all the character encoding related terminology.
- ISO/IEC TR 15285:1998 Operational model for characters and glyphs (PDF format)
This is essentially the ISO equivalent to Unicode TR#17.
- Registered Character Sets for information interchange current ISO 2375 list by ECMA
This is a list of character sets ISO allows for general use.
- Registered Character Sets for the Internet current IANA list
This is the official list of character sets the IANA allows for Internet use.
These are the possible values of the charset parameter on media types, which makes this list very important.
- ISO 8859, ISO 6937-2 and ISO 646-based charset summaries IETF RFC 1345,
hyperlinked version, based on IANA registered charset list
- Windows codepages Microsoft developers' reference
- UTF-16 ISO/IEC 10646-1 Amendment 1
- UTF-16 IETF RFC 2781, hyperlinked version
- UTF-8 ISO/IEC 10646-1 Amendment 2
- UTF-8 IETF RFC 2279, hyperlinked version
- UTF-7 IETF RFC 2152, hyperlinked version
- XML/SGML entity declarations for ISO/IEC 10646 based documents
These declarations are commonly used in XML DTDs to declare HTML entities like nbsp,
plus some that HTML doesn't use.
- Unicode in XML and other Markup Languages W3C Note and UTR #20
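To show how the charset parameter on a media type (see the IANA list of registered character sets for the Internet, above) is used in practice, here is a minimal Python sketch using only the standard library. The helper names charset_of and decode_body are hypothetical, and the ISO-8859-1 fallback is an assumption modelled on the default that HTTP/1.1 (RFC 2616) gives for text media types; other protocols define other defaults.

```python
from email.message import Message


def charset_of(content_type, default=None):
    """Return the charset parameter of a Content-Type value, lowercased.

    Hypothetical helper: it lets the standard library's header parser
    handle quoting and parameter splitting.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(body, content_type, fallback="iso-8859-1"):
    """Decode bytes using the declared charset, or the assumed fallback."""
    charset = charset_of(content_type) or fallback
    # Raises LookupError if Python has no codec registered under that name.
    return body.decode(charset)


print(charset_of("text/html; charset=ISO-8859-1"))
print(charset_of('text/xml; charset="UTF-8"'))
print(decode_body(b"caf\xe9", "text/plain; charset=ISO-8859-1"))
```

Note that a name being registered with the IANA does not guarantee that a given runtime ships a codec for it, so the lookup can still fail for valid charset values.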
The following character encoding standards contain useful code charts.
- ISO/IEC 8859-1:1998 PDF of the text of the standard
- ISO/IEC 8859-4:1998 PDF of the text of the standard
- ISO/IEC 8859-7:1999 PDF of the text of the standard
- ISO/IEC 8859-10:1998 PDF of the text of the standard
- ISO/IEC 8859-11:1999 PDF of the text of the standard
- ISO/IEC 8859-13:1998 PDF of the text of the standard
- ISO/IEC 8859-15:1998 PDF of the text of the standard
- ISO/IEC 8859-16:2000 PDF of the text of the standard
- ISO/IEC 8859-2, 3, 5, 6, 8, 9 [:1999] are not available online
- ISO/IEC 8859-12, 14 [:1998] are also not available online
- ISO IR-001: ISO 646 C0 controls PDF of codes that ISO 646 added to ANSI X3.4 (ASCII)
This is mainly for historical interest; it documents the original semantics of codes 0-31.
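The code charts in the ISO/IEC 8859 parts listed above matter because the same byte value maps to different characters depending on the part. A minimal Python sketch, assuming the codecs bundled with the standard library match the published charts (the helper name decode_in_parts is hypothetical):

```python
def decode_in_parts(data, parts):
    """Decode the same bytes under several ISO 8859 parts.

    Hypothetical helper: returns a dict mapping each codec name to the
    text that the bytes represent under that part.
    """
    return {name: data.decode(name) for name in parts}


# Byte 0xA4 is the currency sign in part 1 and the euro sign in part 15.
print(decode_in_parts(b"\xa4", ["iso-8859-1", "iso-8859-15"]))

# Byte 0xE1 is a Latin letter in part 1 and a Greek letter in part 7.
print(decode_in_parts(b"\xe1", ["iso-8859-1", "iso-8859-7"]))

# Bytes below 0x80 decode identically in every part.
print(decode_in_parts(b"A", ["iso-8859-1", "iso-8859-7", "iso-8859-15"]))
```

This is why a charset label has to accompany 8-bit text: without it, a receiver cannot tell which chart to apply.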
Other Resources - Character Encoding
Tutorials - XML and XSLT
Other Resources - XML and XSLT