unroff-html - HTML 2.0 back-end for the programmable troff translator
SYNOPSIS
unroff [ -fhtml ] [ -mpackage ] [ file | option... ]
OVERVIEW
When called with the
-fhtml
option,
unroff
loads the back-end for the Hypertext Markup Language (HTML) version 2.0.
Please read
unroff(1)
first for an overview of the Scheme-based, programmable troff translator
and for a description of the generic options that exist in
addition to
-f
and
-m.
For information about extending and programming
unroff
also refer to the
Unroff Programmer's Manual.
unroff
is usually invoked with an additional
-mpackage
option (such as
-ms
or
-man)
to load the translation rules for the troff macros and other elements
defined by the macro package that is used to typeset the document.
If no
-m
option is supplied, only the standard troff requests, special characters,
escape sequences, etc. are recognized and translated to HTML by
unroff
as described in this manual.
OPTIONS
The following HTML-specific options can be specified in the command
line after the generic options.
See
unroff(1)
for a general description of keyword/value options and their types
and for a list of options that are not specific to the target language.
- title (string)
-
The value to be used for the <title> element in HTML output files.
This option may be ignored by the code implementing a specific
macro set, e.g. when special rules are employed to derive the title
from the contents of the troff input files.
Whether or not this option is required also depends on the specific
-m
option used, but it may be omitted if no
-m
option is given.
- document (string)
-
The prefix used for the names of all output files.
May be ignored depending on the macro package that has been selected.
- mail-address (string)
-
The caller's mail address; may be used for ``mailto:'' URLs,
in particular for the ``href'' attribute of the <link>
element that is usually generated.
- tt-preformat (boolean)
-
If 1, font changes to a font that is mapped to the <tt> element
are honored inside non-filled text (as described below).
The default is 0, i.e. the font changes will be recorded, but no
corresponding HTML tags will be emitted for them.
- handle-eqn (string)
-
- handle-tbl (string)
-
- handle-pic (string)
-
These options specify how equations, tables, and pictures encountered
in the troff input are processed.
Possible values are ``copy'' to include the raw eqn, tbl, or
pic commands as pre-formatted text, ``text'' to run the
respective troff preprocessor (eqn, tbl, or pic) and include its output
as pre-formatted text, or ``gif'' to convert the preprocessor
output to a GIF image and include it in the HTML document as
an inline image.
The default is ``text'' for
handle-tbl,
``gif'' for the other options.
See DESCRIPTION below for more information.
- eqn (string)
-
- tbl (string)
-
- pic (string)
-
These options specify the programs to invoke as the eqn, tbl,
and pic preprocessors.
The defaults are site-dependent.
- troff-to-text (string)
-
- troff-to-gif (string)
-
The programs to invoke for converting the output of a troff preprocessor
to plain text or to a GIF image.
The default values are site-dependent.
See DESCRIPTION below for more information on these options.
FILES
If no
-m
option is supplied,
unroff
reads the specified input files and sends the HTML document to
standard output, unless the
document
option is given, in which case its value together
with the suffix ``.html'' is used as the name of an
output file.
If no input files are specified, input is taken from standard input.
The output is enclosed by the usual HTML boiler-plate (<html>, <head>,
and <body> elements), a <title> element with the specified title
(or the value of
document
if no title has been given, or a default title if both are omitted),
and a <link> element with rev= and href= attributes if
mail-address
has been set.
Any pending end tags are generated at the end of input.
Note that this is the default action that is performed in the
rare case when no macro package name has been specified, i.e. when
processing ``bare'' troff input.
Somewhat different rules may apply when processing, for
example, a group of UNIX manual pages
(-man).
See
unroff(1)
for a list of Scheme files that are loaded on startup.
DESCRIPTION
OUTPUT TRANSLATIONS
The characters `<', `>', and `&' are replaced by the entities
`&lt;', `&gt;', and `&amp;' on output.
In addition, the quote character is mapped to `&quot;' where
appropriate.
New mappings can be added by means of the
defchar
Scheme primitive as explained in the Programmer's Manual.
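The default mappings can be modeled outside of unroff with a short Python sketch.
This is an illustration of the behavior described above, not the back-end's actual code; the function name and the map_quotes switch (standing in for ``where appropriate'') are assumptions made for the example.

```python
# Illustrative model of the default output translations; not unroff's code.
ENTITIES = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}


def translate_output(text, map_quotes=False):
    """Replace HTML-significant characters by their entities.

    map_quotes is a hypothetical switch: when true, the quote character
    is mapped to &quot; as well.
    """
    table = dict(ENTITIES)
    if map_quotes:
        table['"'] = "&quot;"
    # Translating one character at a time avoids escaping the '&' of an
    # entity that has just been produced.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(translate_output("if a < b & c > d"))
```

Working character by character mirrors the idea of a per-character mapping table, which is also what makes it straightforward to add further mappings.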
COMMENTS
Each troff comment is translated to a corresponding HTML comment tag
followed by a newline; empty comments are ignored.
Comments are also ignored when appearing inside a macro body.
ESCAPE SEQUENCES
The following is a list of troff escape sequences that are recognized
and the HTML output generated for them.
Any escape sequence that does not appear in the list
expands to the character after the escape character, and
a warning is printed in this case.
New definitions can be added and the predefined mappings can
be replaced by calling the
defescape
Scheme primitive in the user's initialization file, in a user-supplied
Scheme file, in a document, or on a site-wide basis by modifying
the file
scm/html/common.scm
in the installation directory.
\& nothing
\- -
\| nothing
\^ nothing
\\ \
\' '
\` `
\" rest of line as HTML comment tag
\% nothing
\{ conditional input begin
\} conditional input end
\* contents of string
\space space
\0 space
\c nothing; eats following newline
\e \
\s nothing
\u nothing, prints warning
\d nothing, prints warning
\v nothing, prints warning
\o its argument, prints warning
\z its argument, prints warning
\k sets specified register to zero
\h appropriate number of spaces for positive argument
\w length of argument in units
\l repeats specified character, or <hr>
\n contents of number register
\f see description of fonts below
SPECIAL CHARACTERS
The following special characters are mapped to their equivalent
ISO-Latin 1 entities:
\(12 \(14 \(34 \(*b \(*m \(+- \(:A
\(:O \(:U \(:a \(:o \(:u \(A: \(Cs
\(O: \(Po \(S1 \(S2 \(S3 \(U: \(Ye
\(a: \(bb \(cd \(co \(ct \(de \(di
\(es \(hy \(mu \(no \(o: \(r! \(r?
\(rg \(sc \(ss \(tm \(u:
Heuristics have to be used for the following special characters:
\(** *
\(-> ->
\(<- <-
\(<= <=
\(== ==
\(>= >=
\(Fi ffi
\(Fl ffl
\(aa '
\(ap ~
\(br |
\(bu + (prints a warning)
\(bv |
\(ci O
\(dd *** (prints a warning)
\(dg ** (prints a warning)
\(em --
\(en -
\(eq =
\(ff ff
\(fi fi
\(fl fl
\(fm '
\(ga `
\(lh <=
\(lq ``
\(mi -
\(or |
\(pl +
\(rh =>
\(rq ''
\(ru _
\(sl /
\(sq o (prints a warning)
\(ul _
\(~= ~
A warning is printed to standard error output for any special
character not mentioned in this section.
To add new definitions, and to customize existing ones, the
defspecial
Scheme primitive can be used.
NON-FILLED TEXT
The
.nf
and
.fi
troff requests generate pairs of <pre> and </pre> tags.
Nested requests are treated correctly, and currently
active character formatting elements such as <i> (resulting
from troff font changes) are temporarily disabled while
the <pre> or </pre> is emitted.
A warning is printed if a ``tab'' character is encountered
within filled text.
FONTS
The `\f' escape sequence and the requests
.ft
(change current font) and
.fp
(mount font at font position) are supported in the usual way,
both with numeric font positions as well as font names and
the special name `P' to denote the previous font.
The font position of the currently active font is available
through the read-only number register `.f'.
Initially, the font `R' is mounted on font positions 1 and 4,
font `I' on font position 2, and font `B' on position 3.
To map troff font names to HTML character formatting elements,
the define-font Scheme procedure is called with the name
of a troff font to be used in documents, and
HTML start and end tags to be emitted when changing to this font,
or when changing
from
this font to another font, respectively.
Whether <tt> and </tt> are generated inside non-filled (pre-formatted)
text for fixed-width fonts is controlled by the option
tt-preformat.
The following calls to
define-font
are evaluated on startup:
(define-font "R" "" "")
(define-font "I" '<i> '</i>)
(define-font "B" '<b> '</b>)
(define-font "C" '<tt> '</tt>)
(define-font "CW" '<tt> '</tt>)
(define-font "CO" '<i> '</i>) ; kludge for Courier-Oblique
Site administrators may add definitions here for fonts used
at their site.
Users can define mappings for new fonts by placing corresponding
definitions in their documents or document-specific Scheme files.
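The interplay of font positions, the previous font `P', and the tags registered with define-font can be pictured with the following Python model.
It is a sketch under assumptions, not the back-end's implementation: the class and method names are invented for the example, and font names that are not mounted on any position are simply rejected, which may differ from what unroff does.

```python
# Illustrative model of font tracking; not unroff's actual implementation.
# Font name -> (start tag, end tag), following the define-font calls above.
FONT_TAGS = {
    "R": ("", ""),
    "I": ("<i>", "</i>"),
    "B": ("<b>", "</b>"),
    "C": ("<tt>", "</tt>"),
    "CW": ("<tt>", "</tt>"),
    "CO": ("<i>", "</i>"),
}


class FontState:
    def __init__(self):
        # Initial mounts: R on positions 1 and 4, I on 2, B on 3.
        self.mounted = {1: "R", 2: "I", 3: "B", 4: "R"}
        self.current = 1
        self.previous = 1

    def mount(self, position, name):
        # Model of ".fp position name".
        self.mounted[position] = name

    def change(self, font):
        # Model of ".ft font"; returns the tags to emit for the change.
        if font == "P":
            target = self.previous
        elif isinstance(font, int):
            target = font
        else:
            # Use the lowest position the named font is mounted on.
            positions = [p for p, n in sorted(self.mounted.items()) if n == font]
            target = positions[0] if positions else None
        if target not in self.mounted:
            raise ValueError("font not mounted: %r" % (font,))
        end_tag = FONT_TAGS[self.mounted[self.current]][1]
        start_tag = FONT_TAGS[self.mounted[target]][0]
        self.previous, self.current = self.current, target
        return end_tag + start_tag


if __name__ == "__main__":
    fonts = FontState()
    for request in ("I", "B", "P", 1):
        print(repr(fonts.change(request)))
```

Each change emits the end tag of the font being left followed by the start tag of the font being entered, which matches the roles of the two tag arguments of define-font.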
OTHER TROFF REQUESTS
The
.br
request generates a <br> tag.
.sp
requires a positive argument and is mapped to the appropriate number
of <p> tags (or newline characters inside non-filled/pre-formatted
text).
Likewise, the request
.ti,
when called with a positive indent, produces a <br> followed by the
appropriate number of non-breakable spaces.
The
.tl
request just emits the title parts delimited by spaces.
It is impossible to preserve the meaning of this request
in HTML 2.0.
The horizontal line drawing escape sequence `\l' just repeats
the specified character (or underline as default) to draw
a line.
If the given length looks like it could be the line length
(that is, if it exceeds a certain value), a <hr> tag
is produced instead.
Example:
\l'5c\&-'
\l'60'
The first of these two requests
would produce a line of 20 dashes, while the second
request would generate a <hr> tag (the '\&' is required
because the dash could be interpreted as a continuation of
the numeric expression).
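The arithmetic behind this example can be reproduced with a small Python sketch.
All constants are assumptions: 240 units per inch and 24 units per character are the values commonly associated with nroff, and the cut-off of 50 characters is purely hypothetical, since the actual threshold used by unroff is not stated here.

```python
# Illustrative model of the `\l' heuristic; the constants are assumptions.
UNITS_PER_INCH = 240      # assumed nroff resolution
UNITS_PER_CHAR = 24       # assumed width of one character (one em)
HR_THRESHOLD_CHARS = 50   # hypothetical cut-off for "looks like the line length"

# Basic units per scale indicator (only a few indicators are modeled).
SCALE = {
    "u": 1,
    "m": UNITS_PER_CHAR,
    "i": UNITS_PER_INCH,
    "c": UNITS_PER_INCH / 2.54,
}


def draw_line(length, scale="m", char="_"):
    """Return a run of `char`, or "<hr>" if the line is long enough."""
    units = length * SCALE[scale]
    count = round(units / UNITS_PER_CHAR)
    if count > HR_THRESHOLD_CHARS:
        return "<hr>"
    return char * count


if __name__ == "__main__":
    print(draw_line(5, "c", "-"))   # five centimeters of dashes
    print(draw_line(60))            # 60 ems, treated as a full-width rule
```

Under these assumptions five centimeters come to about 19.7 character widths, which rounds to the 20 dashes mentioned above.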
Centering
(.ce)
is simulated by producing a <br> at the end of each line, as
this functionality is not supported by HTML 2.0.
The following requests are silently ignored, as the corresponding
functions cannot be expressed in HTML 2.0 or are controlled by
the client.
Ignoring these requests most likely does no harm.
.ad .bp .ch .fl .hw .hy .lg
.na .ne .nh .ns .pl .ps .rs
.vs .wh
All troff requests not mentioned in this section by default
cause a warning message to be printed to standard error output,
except for these basic requests which have their usual
semantics:
.am .as .de .ds .ec .el .ie
.if .ig .nr .rm .rr .so .tm
The
defrequest
Scheme primitive is used to associate an event handling procedure
with a request as documented in the Programmer's Manual.
END OF SENTENCE
The sequence ``<tt>space</tt>'' is produced at the end of
each sentence to provide additional space, except inside non-filled text.
A sentence is defined as a sequence of characters followed by
a period, a question mark, or an exclamation mark, followed
by a newline.
The usual convention to suppress end-of-sentence recognition
by adding the escape sequence `\&' is correctly implemented by
unroff.
To change the end-of-sentence function, the
sentence-event
can be redefined from within Scheme code as described in
the Programmer's Manual.
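The end-of-sentence rule and the effect of `\&' can be illustrated with a few lines of Python.
This is a simplified model of the rule as stated in this section, not the code behind sentence-event; the function name is made up for the example.

```python
# Illustrative model of end-of-sentence recognition; not unroff's code.
def ends_sentence(line):
    """Return True if an input line (without its newline) ends a sentence.

    A sentence ends when a period, question mark, or exclamation mark is
    the last character before the newline.
    """
    return line.endswith((".", "?", "!"))


if __name__ == "__main__":
    for line in ("This ends a sentence.", "See the appendix, etc.\\&", "no end here"):
        print(repr(line), ends_sentence(line))
```

Appending `\&' works because the punctuation character is then no longer the last character on the line, so the rule does not match.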
SCALE INDICATORS
As the notions of vertical spacing, character width, device
resolution, etc. do not exist in HTML, the scaling for the
usual troff scale indicators is defined once on startup and
then remains constant.
For simplicity, the scaling usually employed by
nroff(1)
is taken.
EQUATIONS, TABLES, PICTURES
Interpretation of embedded eqn, tbl, and pic preprocessor input
is controlled by the options
handle-eqn,
handle-tbl,
and
handle-pic
(see OPTIONS above).
These options affect the input lines from a starting
.EQ,
.TS,
or
.PS
request up to and including the matching
.EN,
.TE,
or
.PE
request, as well as text surrounded by the current eqn
inline equation delimiters.
Each of the options can have one of the following values:
- copy
-
The preprocessor input (including the enclosing requests) is
placed inside <pre> and </pre>.
If assigned to the option
handle-eqn,
inline equations are rendered in the font currently mounted
on font position 2.
- text
-
The input is sent to the respective preprocessor (as specified
by the options
eqn,
tbl,
or
pic),
and its result is piped to the shell command referred to by the
option
troff-to-text,
which typically involves a call to
nroff(1)
or an equivalent command.
As with ``copy'', the result is then placed inside
<pre> and </pre>, unless the source is an inline equation.
The value of
troff-to-text
is filtered through a call to the
substitute
Scheme primitive with the name of an output file as its argument;
this file name can be referenced from within the option's value
by the substitute specifier ``%1%'' (see the Programmer's
Manual for a description of
substitute
and a list of substitute specifiers).
Here is a typical value for the
troff-to-text
option:
-
"groff -Tascii | col -b | sed '/^[ \t]*$/d' > %1%"
- gif
-
Input lines are preprocessed as described under ``text'', and
the result is piped to the shell command named by the option
troff-to-gif.
The latter is subject to a call to
substitute
with the name of a temporary file (which may be used to store intermediate
PostScript output) and the name of the output file where the resulting
GIF image must be stored.
The entire preprocessor input is replaced by an <img> element with
a reference to the GIF file and a suitable ``alt='' attribute.
Unless processing an inline equation, the <img> element is
surrounded by <p> tags.
The names of the files containing the GIF images are generated
from the value of the
document
option, a sequence number, and the suffix ``.gif''.
Therefore, the
document
option must have been set when using the ``gif'' method,
otherwise a warning is printed and the preprocessor input
is skipped.
In any case, the output of a call to eqn is ignored if the
input consists of calls to ``delim'' or ``define''
and empty lines exclusively.
When processing eqn input, calls to ``delim'' are intercepted by
unroff
to record changes of the inline equation delimiters.
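How a file name is spliced into the value of troff-to-text can be sketched in Python as follows.
The sketch only mimics the effect of the ``%1%'' specifier and is not the substitute primitive itself; the function name, the treatment of higher-numbered specifiers, and the file name ``table1.txt'' are assumptions made for the example.

```python
# Illustrative model of numbered substitute specifiers; not unroff's code.
import re


def expand_specifiers(template, *args):
    """Replace %1%, %2%, ... in template by the corresponding argument.

    Specifiers without a matching argument are left untouched. Any other
    kinds of specifiers are not modeled here.
    """
    def replace(match):
        index = int(match.group(1))
        if 1 <= index <= len(args):
            return args[index - 1]
        return match.group(0)

    return re.sub(r"%(\d+)%", replace, template)


if __name__ == "__main__":
    command = r"groff -Tascii | col -b | sed '/^[ \t]*$/d' > %1%"
    print(expand_specifiers(command, "table1.txt"))
```

The expanded string is what would then be handed to the shell, with the preprocessor output arriving on its standard input.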
HYPERTEXT LINKS
The facilities for embedding arbitrary hypertext links in troff
documents are still experimental in this version of
unroff
and thus are likely to change in future releases.
To use them, mention the file name ``hyper.scm'' in the
command line before any troff source files.
At the beginning of the first troff file, source the file
``tmac.hyper'' from the directory ``doc'' like this:
.if !\n(.U .so tmac.hyper
The request
.Hr
can then be used to create a hypertext link.
Its usage is:
.Hr -url URL anchor-text [suffix]
.Hr -symbolic label anchor-text [suffix]
.Hr troff-text
The first two forms are recognized by
unroff
and the third form is recognized by troff.
The first form is used for links pointing to external resources,
and the second one is used for forward or backward links referencing
anchors defined in a file belonging to the same document.
An anchor is placed in the document by calling the request
.Ha:
.Ha label anchor-text
The label specified in a call to
.Ha
can then be used in calls to
.Hr -symbolic.
All symbolic references must have been resolved at the end of the document.
The ``anchor-text'' is placed between the tags <a> and </a>;
``suffix'' is appended to the closing </a> if present.
``troff-text'' is just formatted in the normal way.
Quotes must be used if any of the arguments contains spaces.
Use of the hypertext facilities is demonstrated by the troff source
of the Programmer's Manual that is included in the
unroff
distribution.
SCHEME PROCEDURES
The following Scheme procedures, macros, and variables are defined
by the HTML 2.0 back-end and can be used from within user-supplied
Scheme code:
- (define-font name start-tag end-tag)
-
Associates an HTML start tag and end tag (symbols) with a troff
font name (string) as explained under FONTS above.
The font name can then be used in
.fp,
.ft,
and `\f' requests.
- (reset-font)
-
Resets both the current and previous font to the font mounted
on position 1.
- current-font
-
- previous-font
-
These variables hold the current and previous font as
(integer) font positions.
- (with-font-preserved . body)
-
This macro can be used to temporarily change to font ``R'',
evaluate body, and revert to the font that has been
active when the form was entered.
The macro returns a string that can be output using the
primitive emit or returned from an event procedure.
- (preform enable?)
-
If the argument is #t, pre-formatted text is enabled, otherwise disabled.
- preform?
-
This boolean variable holds #t if pre-formatted text is enabled,
#f otherwise.
- (with-preform-preserved . body)
-
A macro that can be used to temporarily disable pre-formatted
text, evaluate body, and then re-enable it if appropriate.
The macro expands to a string that must be output or returned from
an event procedure.
- (parse-unquote string)
-
Temporarily establishes an output translation to map the quote
character to ``&quot;'', applies parse (explained
in the Programmer's Manual) to its argument, and returns the result.
- (center n)
-
Centers the next n input lines (see description of
.ce
under OTHER TROFF REQUESTS above).
If n is zero, centering is stopped.
- nbsp
-
A Scheme variable that holds a string interpreted as a non-breaking
space by HTML clients.
SEE ALSO
unroff(1),
unroff-html-man(1),
unroff-html-ms(1);
troff(1),
nroff(1),
groff(1),
eqn(1),
tbl(1),
pic(1).
Unroff Programmer's Manual.
http://www.informatik.uni-bremen.de/~net/unroff
Berners-Lee, Connolly, et al.,
HyperText Markup Language Specification--2.0,
Internet Draft, Internet Engineering Task Force.
BUGS
The `\space' escape sequence should be mapped to the &nbsp; entity
(non-breaking space), but this entity is not supported by a number
of HTML clients.
Only the font positions 1 to 9 can currently be used.
There should be no limit.
The extra space generated for end of sentence should be configurable.
Underlining should be supported.
Markup created by unroff 1.0, March 21, 1996.