The BEGIN and END rules are each executed exactly once, at
the beginning and end respectively of your awk program
(see section The BEGIN and END Special Patterns).
We (the gawk authors) once had a user who mistakenly thought that the
BEGIN rule was executed at the beginning of each data file and the
END rule was executed at the end of each data file. When informed
that this was not the case, the user requested that we add new special
patterns to gawk, named BEGIN_FILE and END_FILE, that
would have the desired behavior. He even supplied us the code to do so.
However, after a little thought, I came up with the following library program.
It arranges to call two user-supplied functions, beginfile and
endfile, at the beginning and end of each data file.
Besides solving the problem in only nine(!) lines of code, it does so
portably; this will work with any implementation of awk.
# transfile.awk
#
# Give the user a hook for filename transitions
#
# The user must supply functions beginfile() and endfile()
# that each take the name of the file being started or
# finished, respectively.
#
# Arnold Robbins, arnold@gnu.ai.mit.edu, January 1992
# Public Domain
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
This file must be loaded before the user's "main" program, so that the rule it supplies will be executed first.
This rule relies on awk's FILENAME variable that
automatically changes for each new data file. The current file name is
saved in a private variable, _oldfilename. If FILENAME does
not equal _oldfilename, then a new data file is being processed, and
it is necessary to call endfile for the old file. Since
endfile should only be called if a file has been processed, the
program first checks to make sure that _oldfilename is not the null
string. The program then assigns the current file name to
_oldfilename, and calls beginfile for the file.
Since, like all awk variables, _oldfilename will be
initialized to the null string, this rule executes correctly even for the
first data file.
The program also supplies an END rule, to do the final processing for
the last file. Since this END rule comes before any END rules
supplied in the "main" program, endfile will be called first. Once
again the value of multiple BEGIN and END rules should be clear.
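As a minimal sketch of how a main program might use this library file, the following shell script writes the rule shown above to transfile.awk, supplies sample beginfile and endfile functions, and runs both over two small data files. The names main.awk, a.txt, b.txt and count, and the bodies of the two functions, are assumptions made for illustration; they are not part of the library.

```shell
#!/bin/sh
# Sketch only: exercise transfile.awk with hypothetical user functions.
# main.awk, a.txt, b.txt and the variable "count" are illustrative names.
dir=$(mktemp -d) && cd "$dir" || exit 1

# The library file, as listed in the text.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
EOF

# A hypothetical main program: count the records in each data file.
cat > main.awk <<'EOF'
function beginfile(name) { print "begin " name; count = 0 }
function endfile(name)   { print "end " name ": " count " records" }
{ count++ }
EOF

printf 'one\ntwo\n' > a.txt
printf 'three\n'    > b.txt

# The library file is named first, so its rule runs before the main rules.
out=$(awk -f transfile.awk -f main.awk a.txt b.txt)
printf '%s\n' "$out"
```

With these two sample files, the run should report a begin line and an end line for a.txt (2 records), followed by a begin line and an end line for b.txt (1 record).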
This version has the same problem as the first version of nextfile
(see section Implementing nextfile as a Function).
If the same data file occurs twice in a row on the command line, then
endfile and beginfile will not be executed at the end of the
first pass and at the beginning of the second pass.
The following version solves the problem. Instead of comparing file names,
it tests FNR, which awk resets to one at the first record of each data file.
# ftrans.awk --- handle data file transitions
#
# user supplies beginfile() and endfile() functions
#
# Arnold Robbins, arnold@gnu.ai.mit.edu. November 1992
# Public Domain
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}

END   { endfile(_filename_) }
In section Counting Things, you will see how this library file can be used, and how it simplifies writing the main program.
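The difference between the two versions can be seen by naming the same data file twice on the command line. The following shell script is a sketch under stated assumptions: it writes both library files as listed above, and the names main.awk, a.txt and count, together with the bodies of beginfile and endfile, are hypothetical and chosen only for the demonstration.

```shell
#!/bin/sh
# Sketch only: compare the two library files when one data file is
# named twice in a row. main.awk, a.txt and "count" are illustrative.
dir=$(mktemp -d) && cd "$dir" || exit 1

# First version: detects a transition by comparing file names.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
EOF

# Second version: detects a transition by the per-file record number.
cat > ftrans.awk <<'EOF'
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}

END   { endfile(_filename_) }
EOF

# Hypothetical user functions: count the records in each data file.
cat > main.awk <<'EOF'
function beginfile(name) { print "begin " name; count = 0 }
function endfile(name)   { print "end " name ": " count " records" }
{ count++ }
EOF

printf 'one\ntwo\n' > a.txt

old=$(awk -f transfile.awk -f main.awk a.txt a.txt)
new=$(awk -f ftrans.awk -f main.awk a.txt a.txt)

echo "transfile.awk:"; printf '%s\n' "$old"
echo "ftrans.awk:";    printf '%s\n' "$new"
```

With transfile.awk the two passes are merged into a single begin/end pair that reports 4 records, while ftrans.awk reports two separate begin/end pairs of 2 records each.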