Connected: An Internet Encyclopedia
3.2.1. Data Characters

Up: RFC 1866

3.2.1. Data Characters

Any sequence of characters that does not constitute markup (see 9.6 "Delimiter Recognition" of [SGML]) is mapped directly to a string of data characters. Some markup also maps to data character strings: numeric character references map to single-character strings via the document character set, and each reference to one of the general entities defined in the HTML DTD maps to a single-character string.

For example,

    abc&lt;def    => "abc","<","def"
    abc&#60;def   => "abc","<","def"

The terminating semicolon on an entity or numeric character reference is necessary only when the character following the reference would otherwise be recognized as part of the name or number (see 9.4.5 "Reference End" in [SGML]). A following space, for example, ends the reference on its own:

    abc &lt def     => "abc ","<"," def"
    abc &#60 def    => "abc ","<"," def"

An ampersand is recognized as markup only when it is followed by a letter, or by a '#' and a digit:

    abc & lt def     => "abc & lt def"
    abc &# 60 def    => "abc &# 60 def"
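
The mapping rules above can be sketched in a few lines of Python. This is only an illustration, not an SGML parser: the function name data_strings, the four-entry ENTITIES table (a small subset of the general entities in the HTML DTD), and the use of chr() to stand in for the document character set are all assumptions made for this example. It also leaves unknown entity names as literal data and ignores the SGML rule that a record end can terminate a reference.

```python
import re

# Hypothetical subset of the general entities; the HTML DTD defines more.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# '&' counts as markup only when followed by a letter, or by '#' and a digit.
# The terminating ';' is optional.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9.-]*));?")

def data_strings(text):
    """Split text into data character strings, resolving references."""
    out = []
    pos = 0
    for m in REFERENCE.finditer(text):
        number, name = m.group(1), m.group(2)
        if number is not None:
            # Assumption: code positions are looked up with chr().
            char = chr(int(number))
        elif name in ENTITIES:
            char = ENTITIES[name]
        else:
            continue  # unknown name in this sketch: keep it as literal data
        if m.start() > pos:
            out.append(text[pos:m.start()])
        out.append(char)
        pos = m.end()
    if pos < len(text):
        out.append(text[pos:])
    return out

print(data_strings("abc&lt;def"))     # ['abc', '<', 'def']
print(data_strings("abc &#60 def"))   # ['abc ', '<', ' def']
print(data_strings("abc & lt def"))   # ['abc & lt def']
print(data_strings("abc&ltdef"))      # ['abc&ltdef']
```

The last call shows why the semicolon matters: without it, the letters "def" are read as part of the entity name, so no reference to "lt" is recognized.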

A useful technique for translating plain text to HTML is to replace each '<', '&', and '>' by an entity reference or numeric character reference as follows:

                     ENTITY      NUMERIC
           CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
           --------- ----------  -----------  ---------------------
             &       &amp;       &#38;        Ampersand
             <       &lt;        &#60;        Less than
             >       &gt;        &#62;        Greater than

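The translation technique in the table above might look like this in Python. The function name escape_text and its numeric flag are hypothetical, chosen only for this sketch; the replacement strings come straight from the table.

```python
import html

# Replacement tables taken from the table above.
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
NUMERIC = {"&": "&#38;", "<": "&#60;", ">": "&#62;"}

def escape_text(text, numeric=False):
    """Replace '&', '<', and '>' in plain text with references."""
    table = NUMERIC if numeric else NAMED
    # Translating one character at a time means an '&' introduced by a
    # replacement is never escaped a second time.
    return "".join(table.get(ch, ch) for ch in text)

print(escape_text("if a < b && b > c"))
# if a &lt; b &amp;&amp; b &gt; c
print(escape_text("a < b", numeric=True))
# a &#60; b

# The standard library produces the same named form:
print(html.escape("if a < b && b > c", quote=False))
# if a &lt; b &amp;&amp; b &gt; c
```

A chain of str.replace calls would also work, provided '&' is replaced first; doing it later would re-escape the ampersands introduced by the other replacements.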
