Linux, Unix, /etc
Danger Will Robinson! You are now entering a condescending Unix user zone!
Using the m4 Macro Processor
Introduction
m4 is one of the unsung heroes of Linux and Unix. Unsung? Well, for
instance, that great book Unix Power Tools makes not a single mention
of it, though m4 has been a standard part of Unix since V7. So, what
is it about m4 that makes it so useful, and yet so overlooked? m4 is a
"macro processor": a dry name that disguises a great facility. A macro
processor is basically a program that scans text looking for defined
symbols, which it replaces by other text, or by other symbols. Thus, it
is a powerful general-purpose utility that can be used to automate many
tasks people often end up doing in sed, awk, perl, or even their
favourite text editor. Even so, it is not obvious that a "macro
processor" is that big a deal. Then again, Unix developers already have
a macro processor to hand, in the form of the C pre-processor built
into their compiler. Perhaps it is this that accounts for m4's relative
neglect. Whatever the reason, this article hopes to show Linux users
the power and usefulness of this software tool.
What Is m4?
So, what is macro processing, and what is it good for? Kernighan
and Plauger, in their seminal work "Software Tools", have a succinct
definition:
"Macros are used to extend some underlying language — to perform a
translation from one language to another."
Thus, symbolic constants may be defined so that subsequent occurrences
of the name are replaced by the defining string of characters,
regardless of the contents of the definition or its context. Such a
definition is called a macro, the replacement process is called
macro expansion, and the program for doing it is called a macro
processor. So, the basic facility provided by any macro processor is
the replacement of text by other text. A macro is either defined by
the m4 program (a "built-in") or by the user. As well as doing macro
expansion, m4 has functions that include other files, do integer
arithmetic, manipulate text, and so forth. It is, that is to say,
a perfect example of the power of the Unix filter concept.
The contemporary implementation of m4 on a Linux system is GNU
m4, which follows System V Release 3 m4, with extensions. I am
aware of no other version of m4 that has been ported to Linux.
m4 implementations on BSD may differ slightly; but by and large, m4
is m4, and this article should be useful for other Unix users too.
At the time of writing, the latest version is 1.4, which was released
in October 1994.
Overview
The Scanning Process
As m4 reads its input, it separates it into "tokens". A token is
either a name, a quoted string, or any single character
that is not a part of either a name or a string. The input is then
scanned for recognised macros. This scanning process is recursive:
scanning continues until no more macros are recognised. The input
thus transformed is written to the output. Macros can be built-in,
or user-defined. A list of built-in macros follows later.
Defining Macros
The most important of the built-in macros is define(), which allows
the user to define his own macros. For example, define(author,
Paul Dunne) defines a macro "author", any occurrence of which will
be expanded to the string "Paul Dunne". m4 expands macro names into
their defining text as soon as it possibly can.
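As a small sketch of that eagerness (byline is a made-up macro name, used only for illustration), note that the unquoted author in the second definition is expanded while that definition is being read, not when byline is later used. The trailing dnl merely swallows the newline after each definition.

```m4
define(author, Paul Dunne)dnl
define(byline, Written by author)dnl
byline
```

Run through m4, this should print "Written by Paul Dunne". Redefining author afterwards would not change byline, because the name had already been replaced when byline was defined.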
Quoting
The m4 quote characters are ` (backquote) and ' (apostrophe). For example, `this is quoted'.
It is often best to quote both macro name and substitution text in
a definition. This avoids any unwanted side-effects, such as too
early expansion of another macro name. Also, since m4 uses commas
as argument separators, any definition with commas in must be quoted.
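Here is a minimal sketch of both points, assuming the default quote characters (byline is again a hypothetical name). Quoting the body keeps its comma from being read as an argument separator, and defers the expansion of author until byline is actually used.

```m4
define(`author', `Paul Dunne')dnl
define(`byline', `Written by author, 2001')dnl
byline
```

This should print "Written by Paul Dunne, 2001". Because the body was stored unexpanded, a later redefinition of author would show up in byline too.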
Arguments
As we have said, arguments to macros are delimited by commas. They are
also, as we've seen, enclosed in parentheses. A macro may also be
called with no arguments. This is common where we simply wish to
replace one string with another, as in the "author" example above.
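Within a definition, the arguments are referred to as $1, $2 and so on. The following sketch uses a made-up macro, credit, to show one macro called with arguments and another (author) called without. Note that the opening parenthesis must follow the macro name immediately, with no space between them.

```m4
define(`author', `Paul Dunne')dnl
define(`credit', `$1 by $2')dnl
credit(`Words', author)
```

This should print "Words by Paul Dunne": the bare author is expanded while the arguments of credit are being collected.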
Built-in Functions
m4 provides a small set of useful built-in functions. We may group
them under the following headings:
Flow Control Functions
m4 provides the classic "if-then" programming construct, in two
related forms.
ifdef(a,b)
expands to b if a is defined (an optional third argument is used
instead when a is not defined), and
ifelse(a,b,c,d)
compares the strings a and b. If they match, string c is returned
as the function value; if not, string d. Actually, ifelse is not limited to
four arguments; it can take any greater number, and thus provides a limited
multi-way decision capability. For example,
ifelse(a,b,c,d,e,f,g)
means that
if a matches b, then c; else if d matches e, then f; else g.
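A short sketch of both forms follows. The macro name target and the strings it is compared against are invented for the example.

```m4
define(`target', `linux')dnl
ifdef(`target', `defined', `undefined')
ifelse(target, `linux', `Tux', target, `bsd', `Beastie', `unknown')
```

With target defined as above, the first line should give "defined" and the second "Tux". Change the definition to bsd and the second line should give "Beastie"; anything else falls through to "unknown".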
Arithmetic Functions
There are three arithmetic built-ins.
incr
which increments its numeric argument by one.
decr
which decrements its numeric argument by one.
eval
which performs arbitrary integer arithmetic. Its operators are:
- unary + and -
- ** or ^ exponentiation (GNU m4 treats ^ as bitwise exclusive-or, so ** is the safer spelling)
- * / % multiplication, division, modulo
- + - addition, subtraction
- == != < <= > >= equal, not equal, less than, less than or equal to,
greater than, greater than or equal to
- ! not
- & or && logical and
- | or || logical or
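The three built-ins can be tried out with a few lines like these (a sketch assuming GNU m4; count is a hypothetical macro holding a number).

```m4
define(`count', `41')dnl
incr(count)
decr(count)
eval(2 ** 10)
eval((3 + 4) * 5 > 30)
```

The four lines should give 42, 40, 1024 and 1 respectively. A comparison in eval yields 1 for true and 0 for false, which makes it a convenient partner for ifelse.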
String Functions
len(a)
Returns the length of the string "a".
substr(s, m, n)
Returns a substring from the string "s", starting at position m
(counting from 0), and continuing for n characters.
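For instance, the following lines (a quick sketch, nothing more) show both functions at work. Leaving out the third argument of substr takes everything to the end of the string.

```m4
len(`macro')
substr(`macro processor', 6, 4)
substr(`macro processor', 6)
```

These should give 5, "proc" and "processor".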
As a more complicated example than those we've had so far, consider
this combination of ifelse, eval and substr.
define(len,`ifelse($1,,0,`eval(1+len(substr($1,1)))')')
Well now, what does that do? It is an implementation of the m4
built-in len in terms of other m4 built-ins! Note the two layers
of quotes. The outer layer prevents all initial evaluation. We want
"len" defined as exactly what's in the second argument. The inner
layer protects the eval builtin from being evaluated while the
arguments for the ifelse are collected.
translit(s, f, t)
Returns the string "s" with all occurrences of the character(s)
listed in "f" replaced by those listed in "t". It functions
as a simpler version of the Linux command tr. For example,
translit(s,abcdefghijklmnopqrstuvwxyz, nopqrstuvwxyzabcdefghijklm)
is the well-known rot13 or Caesar cipher.
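Wrapped up as a macro (rot13 is my own name for it here, not a built-in), the same call can be reused. Applying it twice should bring back the original text.

```m4
define(`rot13', `translit(`$1', `abcdefghijklmnopqrstuvwxyz', `nopqrstuvwxyzabcdefghijklm')')dnl
rot13(`hello')
rot13(rot13(`hello'))
```

The first call should give "uryyb" and the second "hello". Only lower-case letters are handled by this sketch.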
File Functions
include(filename)
includes the contents of "filename" at the point in the input
stream at which it occurs. Useful if we have a central collection
of standard m4 macros, which we can then use in another file simply
with an appropriate include macro.
divert(n)
This is used to divert text from the input stream to an internal
file number. File number -1 is equivalent to discarding the text,
file number 0 is the normal output stream, and file numbers 1 to 9
are usually used for temporary storage.
divert(-1) is most commonly used to get rid of the extraneous white
space that is often generated by m4. For example,
divert(-1)
...
definitions
...
divert
ensures that no output is performed while the various definitions
between the ellipses are performed (the ellipses are not part of
m4 syntax!). Otherwise, we would end up with a pack of newlines in
our output.
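A complete, if tiny, sketch of that pattern might look like this. The macro names and values (site, owner) are invented for the example.

```m4
divert(-1)
define(`site', `example.org')
define(`owner', `A. N. Other')
divert(0)dnl
Site: site, maintained by owner.
```

The only output should be the single line "Site: example.org, maintained by A. N. Other." The newlines after the definitions go to diversion -1 and vanish.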
dnl
Hard to categorise this one, so I've put it here. dnl is "delete
to newline". Used as a comment character in the original m4: as the
name suggests, all characters up to and including the next newline are
deleted from the output stream. GNU m4 also allows use of # as a comment
character, with the difference that such comments *are* passed to
the output stream. Any macro calls or definitions after the # are
however ignored: the input is passed to the output exactly as is.
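The difference between the two is easily seen in a two-line sketch (assuming GNU m4's default comment character; tool is a hypothetical macro).

```m4
define(`tool', `m4')dnl this remark and the newline after it vanish
tool # tool is left alone inside a comment
```

The output should be the single line "m4 # tool is left alone inside a comment": the dnl text is gone, while the # comment is copied through with its tool unexpanded.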
System Functions
esyscmd
Passes a command to the system interpreter (usually the Unix shell)
for execution. For example, esyscmd(date) returns today's date.
There are also some miscellaneous functions that have been added to
the original m4 function set.
changecom
Used to change the m4 comment character (normally #).
traceon, traceoff
Turn tracing on and off. This is useful for debugging.
Usage
A full summary of m4 usage is available through typing m4 --help.
This gives:
Usage: m4 [OPTION]... [FILE]...
Mandatory or optional arguments to long options are mandatory or optional
for short options too.
Operation modes:
--help                      display this help and exit
--version                   output version information and exit
-e, --interactive           unbuffer output, ignore interrupts
-E, --fatal-warnings        stop execution after first warning
-Q, --quiet, --silent       suppress some warnings for builtins
-P, --prefix-builtins       force a `m4_' prefix to all builtins
Preprocessor features:
-I, --include=DIRECTORY     search this directory second for includes
-D, --define=NAME[=VALUE]   enter NAME has having VALUE, or empty
-U, --undefine=NAME         delete builtin NAME
-s, --synclines             generate `#line NO "FILE"' lines
Limits control:
-G, --traditional           suppress all GNU extensions
-H, --hashsize=PRIME        set symbol lookup hash table size
-L, --nesting-limit=NUMBER  change artificial nesting limit
Frozen state files:
-F, --freeze-state=FILE     produce a frozen state on FILE at end
-R, --reload-state=FILE     reload a frozen state from FILE at start
Debugging:
-d, --debug=[FLAGS]         set debug level (no FLAGS implies `aeq')
-t, --trace=NAME            trace NAME when it will be defined
-l, --arglength=NUM         restrict macro tracing size
-o, --error-output=FILE     redirect debug and trace output
FLAGS is any of:
t trace for all macro calls, not only traceon'ed
a show actual arguments
e show expansion
q quote values as necessary, with a or e flag
c show before collect, after collect and after call
x add a unique macro call id, useful with c flag
f say current input file name
l say current input line number
p show results of path searches
i show changes in input files
V shorthand for all of the above flags
If no FILE or if FILE is `-', standard input is read.
Well, that's a formidable list of options. But we need only use a few.
In fact, most often m4 is run as just 'm4', with perhaps the -P flag
to specify that built-ins are preceded by 'm4_', e.g. m4_include
rather than include. For example, here's a line I use in a makefile to
generate my html pages:
cat $*.m4 | htmlize | m4 -P > $*.html
In Use
Example: generating HTML
I use m4, among other Linux software tools, to maintain my web pages.
Rather than mark each page up in HTML, a tiresome chore, I have
written a set of definitions that translates m4 macros into HTML.
As well as being easier on the eye and easier to write than HTML,
this has other advantages. For example, an often-seen feature on
web sites is the navigational "button bar", which has links to the
main parts of a site. Obviously, it is nicer not to have a link
from the button bar to our Linux page if that's where we already are,
for example. This can be automated using m4, so that the right HTML
code is generated. The definition I use is this:
define(
_button_bar,
<HR>
<P ALIGN="center">
ifdef(_index,[Home],_link(index.html, [Home]))
ifdef(_linux,[Linux],_link(linux.html, [Linux]))
ifdef(_writing,[Writing],_link(writing.html, [Writing]))
ifdef(_bookshop,[Bookshop],_link(bookshop/index.html, [Bookstore]))
</P>
<HR>
)
Then, in the source file for linux.html, the macro _linux is defined,
and so when _button_bar is referenced later in that file, the button
bar code generated has no link to the Linux page: the Linux link is
"grayed out".
Again, we can define our email address in the master file. Then,
if that changes, there is no need to do a global search-and-replace
through all the files that constitute the site. A simple "make"
updates everything — but that's the subject of another article.
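Both ideas can be sketched in a few lines. This is a cut-down illustration rather than my real master file: it assumes [ and ] have been made the quote characters with changequote, and the macros _link, _button and _email, the page names, and the address are all invented for the example.

```m4
changequote([,])dnl
define([_link], [<A HREF="$1">$2</A>])dnl
define([_button], [ifdef([$1], [$3], [_link([$2], [$3])])])dnl
define([_email], [webmaster@example.org])dnl
define([_linux])dnl
_button([_home], [home.html], [Home])
_button([_linux], [linux.html], [Linux])
Contact: _email
```

Since _linux is defined and _home is not, the first button should come out as a link (<A HREF="home.html">Home</A>) and the second as the plain word Linux. The last line should expand to "Contact: webmaster@example.org", so a change of address means editing one definition only.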
Example: a Linux key-map
An interesting use of m4 is for the maintenance of Linux keymap files.
I don't do this myself, since hacking an existing file was simplest
for me, but it is a good example of the imaginative uses that m4 can
be put to. We don't have the space to examine the file in any depth
here; take a look at /usr/lib/kbd/keymaps/i386/qwerty/hypermap.m4 on
your Linux system.
Example: sendmail config
Perhaps the most well-known of m4 applications is its use to
tame the fearsome complexity of sendmail configuration files.
The sendmail source distribution comes with m4 macros that are
sufficient to generate a sendmail.cf for most any site; at most, a
little tweaking of the resulting sendmail.cf file (whose syntax has
been memorably and justly compared to line noise) may be required.
For anyone who has tried to write a sendmail config file from scratch,
in the days before the m4 macros, this is a godsend.
Differences Between m4 Versions
Inevitably, there are different versions of m4. This is not an
issue for the Linux user, as they will invariably be using GNU m4.
The main difference is that SysV m4 supports multiple arguments to
defn. Since the usefulness of this is unclear to GNU m4's maintainer
(and indeed to me), this feature is not in GNU m4.
There are several other incompatibilities (which shouldn't surprise
anyone who's tried to use GNU make and then BSD's pmake, or vice
versa). None are important, so those interested can read the relevant
info page (alas, no man page provided). Again, since this article is
about m4 rather than GNU m4, I won't mention the various extensions
implemented in the GNU version.
Things to Watch
Quoting can be cantankerous on occasion. Quoting problems can
usually be solved by changequote. For example, if we want to
include one of the quote characters in a macro definition, we can
use changequote([[,]]) and then
define([[a `quoted' macro]])
will keep the quote characters in the macro definition. Note that
` and ' can't be escaped, so we have to do it this way.
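A worked sketch of the technique, with a made-up macro name (motto), might run as follows. The last line puts the default quote characters back.

```m4
changequote([[,]])dnl
define([[motto]], [[it's `quoted']])dnl
motto
changequote(`,')dnl
```

The output should be the line "it's `quoted'" with both quote characters intact, since while [[ and ]] are in force the ` and ' characters are just ordinary text.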
Another thing to watch for is that the names of m4 builtins occurring
in your text will be taken for calls to the function, and expanded.
This can be avoided by quoting, but that is inconvenient. GNU m4
offers us a better way. The -P command-line switch allows us to
preface all builtins by the string m4_, rather as in the C preprocessor
the # character is used.
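As a sketch, the following input is meant to be run with m4 -P (site is a hypothetical macro). The word "divert" in the running text is also the name of a builtin, and without -P it would silently disappear.

```m4
m4_define(`site', `example.org')m4_dnl
We divert all traffic to site.
```

With -P this should print "We divert all traffic to example.org.", because only m4_divert, and not plain divert, is now recognised as a builtin.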
Limitations
Sadly, there is no man page. There is an info page, however.
m4 is a useful tool, but it can be overstrained. Although it can be
made to do most things with ingenuity, m4 is at its best when used
for straightforward text substitution, as with our HTML example.
Kernighan and Plauger sum it up nicely in Software Tools:
"The main thing is to ensure that any operation — macro call,
definition, other built-in — can occur in the middle of any other one.
If this is possible, then in principle the macro processor is capable
of doing any computation, although it may well be hard to express."
but
"In principle, macro [i.e. m4] is capable of performing any computing
task, but it is all too easy to write incomprehensible macros."
Resources
The manual for the GNU version of m4 is on-line at
http://www.gnu.org/manual/m4.
The classic book Software Tools devotes a chapter to the development
of a macro processor based on m4.
The original paper documenting the first Bell Labs m4 is available
at the Bell Labs site, as part of the V7 Unix doc. collection—
see http://plan9.bell-labs.com/7thEdMan/index.html.
Bob Hepple has an interesting page about using m4 to generate
HTML (in fact, I got the idea from his article) — see
http://www.bit.net.au/%7Ebhepple.
Paul Dunne 2001
Copyright © 1995-2007
Paul Dunne,