I've always found man pages fascinating. Formatted as strangely as they are and
accessible primarily through the terminal, they have always felt to me like
relics of an ancient past. Some man pages probably are ancient: I'd love to
know how many times the man page for cat
or say tee
has been revised since
the early days of Unix, but I'm willing to bet it's not many. Man pages are
mysterious—it's not obvious where they come from, where they live on your
computer, or what kind of file they might be stored in—and yet it's hard to
believe that something so fundamental and so obviously governed by rigid
conventions could remain so inscrutable. Where did the man page conventions
come from? Where are they codified? If I wanted to write my own man page, where
would I even begin?
The story of man
is inextricably tied to the story of Unix. The very first
version of Unix, completed in 1971 but only available internally at Bell Labs,
did not provide a man
command. But Douglas McIlroy, who at the time was head
of the Computing Techniques Research Department and managed the Unix project,
insisted that some kind of documentation be made available.1 He pushed Ken
Thompson and Dennis Ritchie, the two programmers commonly credited with
creating Unix, to write some. The result was the first
edition of the Unix
Programmer's Manual.
The first edition of the Unix Programmer's Manual consisted of (physical!)
pages collected together in a single binder. It documented only 61 different
commands, along with a couple dozen system calls and a few library routines.
Though the man
command itself was not to come until later, the first edition
of the Unix Programmer's Manual established many of the conventions that man
pages adhere to today, even in the absence of an official specification. The
documentation for each command included the well-known NAME, SYNOPSIS,
DESCRIPTION, and SEE ALSO headings. Optional flags were enclosed in square
brackets and meta-arguments (for example, "file" where a file path is expected)
were underlined. The manual also established the canonical manual sections such
as Section 1 for General Commands, Section 2 for System Calls, and so on; these
sections were, at the time, simply sections of a very long printed document.
Thompson and Ritchie could not have known that they were establishing a
tradition that would survive for decades and decades, but that is what they
did.
McIlroy later speculated about why the man page format has survived as long as it has. In a technical report about the conceptual development of Unix, he noted that the original man pages were written in a "terse, yet informal, prose style" that together with the alphabetical ordering of information "encouraged accurate on-line documentation."2 In a nod to an experience with man pages that all programmers have had at one time or another, he added that the man page format "was popular with initiates who needed to look up facts, albeit sometimes frustrating for beginners who didn't know what facts to look for." McIlroy was highlighting the sometimes-overlooked distinction between tutorial and reference; man pages may not be much use for the former, but they are perfect for the latter.
The man
command was a part of Unix by the time the second
edition
of the Unix Programmer's Manual was printed. It made the entire manual
available "on-line", meaning interactively, which was seen as enormously
useful. The man
command has its own manual page in the second edition (this
page is the original man man
), which explains that man
can be used to "run
off a particular section of this manual." Among the original Unix wizards, the
term "run off" referred to the physical act of printing a document but also to
the program they used to typeset documents, roff
. The roff
program had been
used to typeset both the first and second editions of the Unix Programmer's
Manual before they were printed, but it was now also used by man
to process
man pages before they were displayed. The man pages themselves were stored on
every Unix system in a file format meant to be read by roff
.
roff
was the first in a long lineage of typesetting programs that have always
been used to format man pages. Its own development can be traced back to a
program called RUNOFF
that was written in the mid-60s. At Bell Labs, roff
spawned several successors including nroff
(en-roff) and troff
(tee-roff). nroff
was designed to improve on roff
and better output text
to terminals, while troff
tackled the problem of printing using a CAT
phototypesetter. (If you don't know what phototypesetting is, as I did not, I
refer you to this eminently watchable film.) All
of these programs were based on a kind of markup language consisting of
two-letter commands inserted at the beginning of every line in a document.
These commands could control such things as font size, text positioning, line
spacing, and so on. Today, the most common implementation of the roff
system
is groff
, a part of the GNU project.
It's easy to get a sense of what roff
input files look like by just taking a
gander at some of the man pages stored on your own computer. At least on a
BSD-derived system like MacOS, you can use the --path
argument to man
to
find out where the man page for a particular command is stored. Typically this
will be under /usr/share/man
or /usr/local/share/man
. Using man
this way,
you can find the path for the man
man page itself and then open it in a text
editor. It will not look anything like what you're used to looking at with
man
. On my system, the first couple dozen lines are:
.TH man 1 "September 19, 2005"
.LO 1
.SH NAME
man \- format and display the on-line manual pages
.SH SYNOPSIS
.B man
.RB [ \-acdfFhkKtwW ]
.RB [ --path ]
.RB [ \-m
.IR system ]
.RB [ \-p
.IR string ]
.RB [ \-C
.IR config_file ]
.RB [ \-M
.IR pathlist ]
.RB [ \-P
.IR pager ]
.RB [ \-B
.IR browser ]
.RB [ \-H
.IR htmlpager ]
.RB [ \-S
.IR section_list ]
.RI [ section ]
.I "name ..."
.SH DESCRIPTION
.B man
formats and displays the on-line manual pages. If you specify
.IR section ,
.B man
only looks in that section of the manual.
.I name
is normally the name of the manual page, which is typically the name
of a command, function, or file.
However, if
.I name
contains a slash
.RB ( / )
then
.B man
interprets it as a file specification, so that you can do
.B "man ./foo.5"
or even
.B "man /cd/foo/bar.1.gz\fR.\fP"
.PP
See below for a description of where
.B man
looks for the manual page files.
You can make out, for example, that all of the section headings are preceded by
.SH
, and everything that would appear in bold is preceded by .B
. These
commands are roff
macros specifically designed for writing man pages. The
macros used here are part of a package called man
, but there are other
packages such as mdoc
that you can use for the same purpose. The macros make
writing man pages much simpler than it would otherwise be. They also enforce
consistency by always compiling down to the same set of lower-level roff
commands. The man
and mdoc
packages are now documented under
GROFF_MAN(7) and
GROFF_MDOC(7)
respectively.
The entire roff
system is reminiscent of LaTeX, another typesetting tool that
today enjoys much more popularity. LaTeX is essentially a big bucket of macros
built on top of the core TeX system designed by Donald Knuth. Like with roff
,
there are many other macro packages that you can incorporate into your LaTeX
documents. These macro packages mean that you almost never have to write raw
TeX yourself. LaTeX has superseded roff
in many domains, but it is poorly
suited to formatting text for a terminal, so nobody uses it to write man pages.
If you were to write a man page today, in 2017, how would you go about it? You
certainly could write a man page using a roff
macro package like man
or
mdoc
. The syntax is unfamiliar and unwieldy. But the macros abstract away so
much of the complexity that you can write a reasonably complete man page
without learning very many commands. That said, there are now other options
worth considering.
Pandoc is a widely used software tool for converting
documents from one format to another. You can use Pandoc to convert Markdown
files into man
-macro-based man pages, meaning that you can now write your man
pages in something as straightforward as Markdown. Pandoc supports many more
Markdown constructs than most Markdown converters, giving you lots of ways to
format your man page. While this convenience comes at the cost of some control,
it's unlikely that you will ever need something that would warrant dropping
down to the roff
macro level. If you're curious about what these Markdown
files might look like, I've written a few of my
own
to document a tool I created for keeping notes on how to use different
command-line utilities. NPM's
documentation
is also written in Markdown and converted to a roff
man format later, though
they use a JavaScript package called
marked-man instead of Pandoc to do
the conversion.
So there are now plenty of ways to write man pages, giving you lots of freedom
to choose the tool you think best. That said, you'd better ensure that your man
page reads like every other man page that has ever been written. Even though
there is now so much flexibility in tooling, the man page conventions are as
strong as ever. And you might be tempted to skip writing a man page
altogether—after all, you probably have documentation on the web, or maybe you
just want to rely on the --help
flag—but you're
forgoing the patina of respectability a man page can provide. The man page is
an institution that doesn't seem likely to disappear or evolve soon, which is
fascinating, because there are so many ways in which we could do man pages
better. XML didn't come off well in my last post,
but it would be the perfect format here, and it would allow us to do something
like query man
about an option:
$ man grep -v
Selected lines are those not matching any of the specified patterns.
Imagine that! But it seems that we're all too used to man pages the way they are. In a field where rapid change is the norm, maybe some stability—particularly in a documentation system we all turn to in moments of ignorance and confusion—is a good thing.
originally posted at two bit history under CC BY-SA 4.0 by Sinclair Target