Any sequence of characters that does not constitute markup (see 9.6 "Delimiter Recognition" of [SGML]) is mapped directly to a string of data characters. Some markup also maps to data character strings. Numeric character references map to single-character strings, via the document character set. Each reference to one of the general entities defined in the HTML DTD maps to a single-character string.
For example,
    abc&lt;def    => "abc","<","def"
    abc&#60;def   => "abc","<","def"
The terminating semicolon on entity or numeric character references is only necessary when the character following the reference would otherwise be recognized as part of the name (see 9.4.5 "Reference End" in [SGML]).
    abc &lt def     => "abc ","<"," def"
    abc &#60 def    => "abc ","<"," def"
An ampersand is only recognized as markup when it is followed by a letter or a `#' and a digit:
    abc & lt def    => "abc & lt def"
    abc &# 60 def   => "abc &# 60 def"
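The recognition rules above can be sketched in Python. This is an illustrative sketch, not a conforming SGML parser: the function name `split_data`, the `ENTITIES` table (a small subset of the general entities in the HTML DTD), and the choice to leave unknown entity names untouched are all assumptions made for the example. It also assumes that character numbers in the document character set coincide with Unicode code points.

```python
import re

# Hypothetical subset of the general entities; the HTML DTD defines more.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# '&' counts as markup only when followed by a letter, or by '#' and a digit.
# The terminating ';' is optional.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9.-]*));?")


def split_data(text):
    """Split text into data character strings, one per literal run or reference."""
    parts = []
    pos = 0
    for m in REFERENCE.finditer(text):
        number, name = m.group(1), m.group(2)
        if number is not None:
            code = int(number)
            if code > 0x10FFFF:
                continue  # out of range for chr(); left as literal text in this sketch
            value = chr(code)
        elif name in ENTITIES:
            value = ENTITIES[name]
        else:
            continue  # unknown name: left as literal text (a sketch-only choice)
        if m.start() > pos:
            parts.append(text[pos:m.start()])
        parts.append(value)
        pos = m.end()
    if pos < len(text):
        parts.append(text[pos:])
    return parts
```

Called on the six example inputs above, the function is intended to return the lists shown to the right of each `=>`.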
A useful technique for translating plain text to HTML is to replace each '<', '&', and '>' by an entity reference or numeric character reference as follows:
                  ENTITY      NUMERIC
        CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
        --------- ----------  -----------  ---------------------
          &       &amp;       &#38;        Ampersand
          <       &lt;        &#60;        Less than
          >       &gt;        &#62;        Greater than
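A minimal Python sketch of this translation technique follows. The function name `escape_text` and its `numeric` flag are hypothetical, introduced only for illustration. The text is processed one character at a time, so an ampersand introduced by a replacement is never escaped a second time.

```python
def escape_text(text, numeric=False):
    """Replace '&', '<', and '>' so that plain text can be embedded in HTML."""
    if numeric:
        table = {"&": "&#38;", "<": "&#60;", ">": "&#62;"}
    else:
        table = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
    # Each input character is looked up exactly once, so the '&' that begins
    # a replacement string is never itself replaced.
    return "".join(table.get(ch, ch) for ch in text)
```

For example, `escape_text("a < b & c")` yields `a &lt; b &amp; c`. Python's standard library offers `html.escape`, which performs the same three replacements and, by default, also escapes quotation marks.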