The GNU Awk User's Guide - Passwd Functions

Reading the User Database

The `/dev/user' special file (see section Special File Names in gawk) provides access to the current user's real and effective user and group id numbers, and if available, the user's supplementary group set. However, since these are numbers, they do not provide very useful information to the average user. There needs to be some way to find the user information associated with the user and group numbers. This section presents a suite of functions for retrieving information from the user database. See section Reading the Group Database, for a similar suite that retrieves information from the group database.

The POSIX standard does not define the file where user information is kept. Instead, it provides the <pwd.h> header file and several C language subroutines for obtaining user information. The primary function is getpwent, for "get password entry." The "password" comes from the original user database file, `/etc/passwd', which kept user information, along with the encrypted passwords (hence the name).

While an awk program could simply read `/etc/passwd' directly (the format is well known), because of the way password files are handled on networked systems, this file may not contain complete information about the system's set of users.

To be sure of being able to produce a readable, complete version of the user database, it is necessary to write a small C program that calls getpwent. getpwent is defined to return a pointer to a struct passwd. Each time it is called, it returns the next entry in the database. When there are no more entries, it returns NULL, the null pointer. When this happens, the C program should call endpwent to close the database. Here is pwcat, a C program that "cats" the password database.

/*
 * pwcat.c
 *
 * Generate a printable version of the password database
 *
 * Arnold Robbins
 * arnold@gnu.ai.mit.edu
 * May 1993
 * Public Domain
 */

#include <stdio.h>
#include <pwd.h>

int
main(void)
{
    struct passwd *p;

    /* uid_t and gid_t vary in width, so cast them for printf */
    while ((p = getpwent()) != NULL)
        printf("%s:%s:%ld:%ld:%s:%s:%s\n",
            p->pw_name, p->pw_passwd, (long) p->pw_uid,
            (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);

    endpwent();
    return 0;
}

If you don't understand C, don't worry about it. The output from pwcat is the user database, in the traditional `/etc/passwd' format of colon-separated fields. The fields are:

Login name: The user's login name.
Encrypted password: The user's encrypted password. This may not be available on some systems.
User-ID: The user's numeric user-id number.
Group-ID: The user's numeric group-id number.
Full name: The user's full name, and perhaps other information associated with the user.
Home directory: The user's login, or "home" directory (familiar to shell programmers as $HOME).
Login shell: The program that will be run when the user logs in. This is usually a shell, such as Bash (the GNU Bourne-Again shell).

Here are a few lines representative of pwcat's output.

$ pwcat
-| root:3Ov02d5VaUPB6:0:1:Operator:/:/bin/sh
-| nobody:*:65534:65534::/:
-| daemon:*:1:1::/:
-| sys:*:2:2::/:/bin/csh
-| bin:*:3:3::/bin:
-| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
-| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
-| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
...
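
Because every record has the same seven colon-separated fields, an awk program can take a record apart simply by using a colon as the field separator. The following is a minimal sketch run from the shell; it is not part of pwcat or of the library presented in this section. The name show_fields is a hypothetical helper introduced only for this illustration, and the input is one of the sample records shown above.

```shell
# show_fields is a hypothetical helper, used only for this illustration.
# It reads passwd-format records on standard input and labels four fields.
show_fields() {
    awk -F: '{
        # $1 = login name, $3 = user-id, $6 = home directory, $7 = login shell
        printf "login=%s uid=%d home=%s shell=%s\n", $1, $3, $6, $7
    }'
}

echo 'arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh' | show_fields
# prints: login=arnold uid=2076 home=/home/arnold shell=/bin/sh
```

An empty field, such as the missing login shell in the daemon record, simply comes out as the null string.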

With that introduction, here is a group of functions for getting user information. Several of them correspond to the C functions of the same names.

# passwd.awk --- access password file information
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May 1993

BEGIN {
    # tailor this to suit your system
    _pw_awklib = "/usr/local/libexec/awk/"
}

function _pw_init(    oldfs, oldrs, olddol0, pwcat)
{
    if (_pw_inited)
        return
    oldfs = FS
    oldrs = RS
    olddol0 = $0
    FS = ":"
    RS = "\n"
    pwcat = _pw_awklib "pwcat"
    while ((pwcat | getline) > 0) {
        _pw_byname[$1] = $0
        _pw_byuid[$3] = $0
        _pw_bycount[++_pw_total] = $0
    }
    close(pwcat)
    _pw_count = 0
    _pw_inited = 1
    FS = oldfs
    RS = oldrs
    $0 = olddol0
}

The BEGIN rule sets a private variable to the directory where pwcat is stored. Since it is used to help out an awk library routine, we have chosen to put it in `/usr/local/libexec/awk'. You might want it to be in a different directory on your system.

The function _pw_init keeps three copies of the user information in three associative arrays. The arrays are indexed by user name (_pw_byname), by user-id number (_pw_byuid), and by order of occurrence (_pw_bycount).
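
To see what the three indexes look like in practice, here is a small sketch that builds them from a few made-up records supplied in a here-document instead of from pwcat. The function name index_demo and the unprefixed array names are assumptions chosen for the illustration; the real library uses the _pw_ names described above.

```shell
# index_demo is a hypothetical name; the records below stand in for pwcat output.
index_demo() {
    awk -F: '
    {
        byname[$1] = $0          # keyed by login name
        byuid[$3] = $0           # keyed by user-id number
        bycount[++total] = $0    # keyed by order of occurrence
    }
    END {
        split(byname["miriam"], f, ":"); print f[3]   # user-id of miriam
        split(byuid[2076], g, ":");      print g[1]   # login name for uid 2076
        split(bycount[1], h, ":");       print h[1]   # first record read
    }' <<'EOF'
root:*:0:1:Operator:/:/bin/sh
arnold:*:2076:10:Arnold Robbins:/home/arnold:/bin/sh
miriam:*:112:10:Miriam Robbins:/home/miriam:/bin/sh
EOF
}

index_demo
# prints 112, arnold, and root, one per line
```

Each array holds the whole record, so a caller that wants a single field still has to split the returned line on colons.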

The variable _pw_inited is used for efficiency: every function in the suite calls _pw_init, but it needs to read the database only once.

Since this function uses getline to read information from pwcat, it first saves the values of FS, RS, and $0. Doing so is necessary, since these functions could be called from anywhere within a user's program, and the user may have his or her own values for FS and RS.

The main part of the function uses a loop to read database lines, split the line into fields, and then store the line into each array as necessary. When the loop is done, _pw_init cleans up by closing the pipeline, setting _pw_inited to one, and restoring FS, RS, and $0. The use of _pw_count will be explained below.
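
The effect of saving and restoring FS and $0 can be checked with a small stand-alone sketch. The function read_colon_line and the echo command it runs are hypothetical stand-ins for _pw_init and pwcat; only the save-and-restore pattern is taken from the code above (RS is left out to keep the example short).

```shell
# save_restore_demo and read_colon_line are hypothetical names for this sketch.
save_restore_demo() {
    echo 'alpha beta gamma delta' | awk '
    function read_colon_line(    oldfs, olddol0, cmd, first) {
        oldfs = FS                    # save the caller state
        olddol0 = $0
        FS = ":"
        cmd = "echo one:two:three"    # stands in for the pwcat pipeline
        cmd | getline                 # overwrites $0, NF, and the fields
        first = $1
        close(cmd)
        FS = oldfs                    # restore FS first ...
        $0 = olddol0                  # ... so $0 is split again with it
        return first
    }
    { x = read_colon_line(); print x, $2, NF }'
}

save_restore_demo
# prints: one beta 4
```

The order of the last two assignments matters: assigning to $0 splits the record with whatever FS is in effect at that moment, so FS has to be restored first.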

function getpwnam(name)
{
    _pw_init()
    if (name in _pw_byname)
        return _pw_byname[name]
    return ""
}

The getpwnam function takes a user name as a string argument. If that user is in the database, it returns the appropriate line. Otherwise it returns the null string.

function getpwuid(uid)
{
    _pw_init()
    if (uid in _pw_byuid)
        return _pw_byuid[uid]
    return ""
}

Similarly, the getpwuid function takes a user-id number argument. If that user number is in the database, it returns the appropriate line. Otherwise it returns the null string.

function getpwent()
{
    _pw_init()
    if (_pw_count < _pw_total)
        return _pw_bycount[++_pw_count]
    return ""
}

The getpwent function simply steps through the database, one entry at a time. It uses _pw_count to track its current position in the _pw_bycount array.

function endpwent()
{
    _pw_count = 0
}

The endpwent function resets _pw_count to zero, so that subsequent calls to getpwent will start over again.
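
The cursor behavior of getpwent and endpwent can be tried out without the user database at all. In this sketch, next_entry and rewind are hypothetical stand-ins that follow the same logic as the two functions above, working on a three-element array filled in by split instead of on _pw_bycount.

```shell
# cursor_demo, next_entry, and rewind are hypothetical names for this sketch.
cursor_demo() {
    awk '
    function next_entry() {
        if (pos < total)
            return entries[++pos]     # advance, then return that entry
        return ""                     # null string signals the end
    }
    function rewind() {
        pos = 0                       # the next call starts over
    }
    BEGIN {
        total = split("root daemon bin", entries, " ")
        while ((e = next_entry()) != "")
            printf "%s ", e
        printf "| "
        rewind()
        print next_entry()
    }'
}

cursor_demo
# prints: root daemon bin | root
```

Without the call to rewind, the final next_entry would return the null string, since the cursor would still be at the end of the array.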

A conscious design decision in this suite is that each subroutine calls _pw_init to initialize the database arrays. The overhead of running a separate process to generate the user database, and the I/O to scan it, will only be incurred if the user's main program actually calls one of these functions. If this library file is loaded along with a user's program, but none of the routines are ever called, then there is no extra run-time overhead. (The alternative would be to move the body of _pw_init into a BEGIN rule, which would always run pwcat. This simplifies the code but runs an extra process that may never be needed.)

In turn, calling _pw_init is not too expensive, since the _pw_inited variable keeps the program from reading the data more than once. If you are worried about squeezing every last cycle out of your awk program, the check of _pw_inited could be moved out of _pw_init and duplicated in all the other functions. In practice, this is not necessary, since most awk programs are I/O bound, and it would clutter up the code.
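
The following sketch shows the same lazy initialization pattern in isolation. All of the names (_demo_init, _demo_inited, _demo_loads, lookup) are hypothetical, and the echo command stands in for running pwcat. The counter is there only to show how many times the data is actually read.

```shell
# lazy_demo and the _demo_ names are hypothetical, for illustration only.
lazy_demo() {
    awk '
    function _demo_init(    cmd, line) {
        if (_demo_inited)
            return                      # already loaded; nothing to do
        cmd = "echo loaded"             # stands in for the pwcat pipeline
        while ((cmd | getline line) > 0)
            _demo_loads++               # count how often data is really read
        close(cmd)
        _demo_inited = 1
    }
    function lookup(key) {
        _demo_init()                    # cheap after the first call
        return key
    }
    BEGIN {
        lookup("a"); lookup("b"); lookup("c")
        print _demo_loads
    }'
}

lazy_demo
# prints: 1
```

Three lookups cause only one read of the data. A program that never called lookup would never run the command at all.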

The id program in section Printing Out User Information uses these functions.