Tom Russo edited this page 2026-01-12 22:10:25 -07:00

DBFAWK: How Xastir decides to render shapefiles

Given no information about a shapefile, Xastir will simply draw every line in that file as a thin black line with no label, or every point in a point shapefile as an "X" with no label. This is not particularly informative and definitely not nice to look at. However, Xastir provides a powerful tool that lets you tell it how to render any shapefile you give it. This tool is called "dbfawk."

DBFAWK combines the information ("attributes" or "fields") in the DBF file that comes with every shapefile with rules you write, and uses the two to set the rendering variables Xastir applies when displaying that shapefile. That's the "DBF" part of its name.

The rules you write look a lot like those used by the classic Unix tool, "awk". That's the second part of the name, though dbfawk is in no way directly compatible with awk itself.

DBFAWK was written for Xastir by Alan Crosswell, N2YGK, and has been part of the Xastir code base since 2003.

Previously, DBFAWK was documented only briefly in the README.MAPS file, which then referred users to some of the ".dbfawk" files that come with Xastir for rendering commonly available shapefiles. This document is an attempt to consolidate that information in one place.

Pre-rolled DBFAWK files

Xastir is provided with some ready-made dbfawk files to go with commonly available shapefiles (or files that were once commonly available and which some Xastir users still use). They are listed in the page on ESRI Shapefiles.

DBFAWK tutorial

In the early days of DBFAWK's introduction into Xastir, KM5VY dug through the code and mailing list information to figure out how to create dbfawk files for some shapefiles he needed displayed, and wrote up a tutorial documenting what he found. The result is maintained in the DBFAWK Tutorial. This is a good place to start for a relatively quick introduction into how you can craft your own dbfawk file for a shapefile that's come into your possession.

That document walks through the process of figuring out how to populate the "dbfinfo" and "dbffields" variables, what rendering variables exist, and what colors you can use for rendering.

This document will try not to duplicate too much of the information in that tutorial.

DBFAWK requirements

DBFAWK works by reading DBF files associated with shapefiles for display. As such, you must have shapelib installed per the installation instructions.

DBFAWK uses the "Perl Compatible Regular Expression" library (PCRE), and this must also be installed. The preferred version of PCRE is version 2, known in many package management systems as "pcre2" or just "pcre". There is an older version of PCRE that we continue to support, but it is oddly named in some package management systems as "PCRE3" even though it's older than PCRE2. Xastir works with either one, but it is better to use the new one if it's available.

How DBFAWK files are chosen

When Xastir attempts to render a shapefile, it first looks for a dbfawk file in the same directory with the same base name (e.g. "myshape.shp" and "myshape.dbfawk"). If one exists, that file is used to render that shapefile and only that shapefile.

If no such file is found, the "config" directory of Xastir's installation directory (typically /usr/local/share/xastir/config, but your mileage may vary) is searched for a dbfawk file whose "dbfinfo" variable matches the "signature" of the DBF file that goes with your shapefile. If one is found, it applies to all shapefiles with the same signature.
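
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the two steps, not Xastir's actual code. The function names are made up for this example, the signature is assumed to be passed in as a ready-made string, and scanning the config directory in sorted order is an assumption, since this page does not say what happens when several files match.

```python
import os
import re

# Matches a line such as:  dbfinfo="ID:NAME";
DBFINFO_RE = re.compile(r'dbfinfo\s*=\s*"([^"]*)"')


def read_dbfinfo(path):
    """Return the dbfinfo string declared in a dbfawk file, or None."""
    with open(path, encoding="latin-1") as fh:
        for line in fh:
            if line.lstrip().startswith("#"):
                continue  # ignore comment lines
            found = DBFINFO_RE.search(line)
            if found:
                return found.group(1)
    return None


def choose_dbfawk(shp_path, config_dir, signature):
    """Hypothetical helper: pick the dbfawk file for one shapefile."""
    # Step 1: a dbfawk file with the same base name, next to the shapefile.
    base, _ext = os.path.splitext(shp_path)
    sibling = base + ".dbfawk"
    if os.path.isfile(sibling):
        return sibling
    # Step 2: a file in the config directory whose dbfinfo matches the
    # signature of the DBF file (sorted order is an assumption).
    for name in sorted(os.listdir(config_dir)):
        if name.endswith(".dbfawk"):
            candidate = os.path.join(config_dir, name)
            if read_dbfinfo(candidate) == signature:
                return candidate
    return None
```

A per-file dbfawk always wins in this sketch, so dropping "myshape.dbfawk" next to "myshape.shp" overrides any signature match in the config directory.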

Structure of a DBFAWK file

The basic structure of a DBFAWK file is:

# comments start with the "#" character

# The BEGIN block must be present in every file
# This block of code is executed once per DBF file being read, and
# sets two important variables
#
# dbfinfo: defines *all* of the fields in the DBF file, in the order
#        they appear in the file.  This is the "signature" of the DBF file.
# dbffields: lists the fields we actually care about
#
# The semicolons at the end of the lines are mandatory, but there's a
# bug in the dbfawk parser that won't tell you if one is missing,
# it'll just fail to work properly.

BEGIN {
dbfinfo="...";
dbffields="...";
}

# The BEGIN_RECORD block is executed once for each record in the DBF file
# and sets default values for rendering variables
BEGIN_RECORD {
variable=value;
variable=value;
...
}

# Rules are of this form:
/regular expression/ {action; action; ... }

# These lines are almost never used, but the actions in END_RECORD
# would be executed after all processing of a record is complete
# END actions would be executed after all processing of a single field
# END {action;}
# END_RECORD {action; ...}
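
Since "dbfinfo" has to list every field in file order, it can help to read the field names straight out of the DBF header instead of typing them by hand. The Python sketch below does that using the standard dBASE header layout (header length at byte 8, 32-byte field descriptors starting at byte 32, terminated by 0x0D). Joining the names with colons is an assumption based on the dbfawk files shipped with Xastir, so compare the result with one of those files before relying on it. The function names are invented for this example.

```python
import struct


def dbf_field_names(data):
    """Return the field names found in the header of a DBF file image."""
    # Bytes 8-9 hold the total header length (little-endian).
    header_len = struct.unpack_from("<H", data, 8)[0]
    names = []
    offset = 32  # field descriptors start right after the 32-byte header
    while offset + 32 <= header_len and data[offset] != 0x0D:
        raw_name = data[offset:offset + 11]  # NUL-padded field name
        names.append(raw_name.split(b"\x00", 1)[0].decode("ascii"))
        offset += 32
    return names


def dbfinfo_signature(data):
    """Join the field names with ':' (assumed dbfinfo separator)."""
    return ":".join(dbf_field_names(data))
```

For a real file you might call it as `dbfinfo_signature(open("myshape.dbf", "rb").read())` and paste the result into the BEGIN block.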

dbfawk rules

Each field from "dbffields" is tested against all the rules in the file.

Regular expressions

The thing between the "/" characters in the rules is a regular expression to match. The strings Xastir will be passing in are of the form "dbffield=value" for each field in your "dbffields" for each record in the file.

A common pattern you might see in a dbfawk file is this:

/^fclass=roadway$/

which will match when the record has a field named "fclass" whose value is "roadway".

Another common pattern you might see is

/^name=(.+)$/

This will match any record that has a "name" field with a value of one or more characters.

The "^" and "$" in these regular expressions signify "beginning of string" and "end of string", meaning we're being explicit about an exact match in the first example. They are not required. For example:

/^fclass=road/

would match any record whose "fclass" field starts with "road".

Any regular expression recognized by PCRE is usable between the slashes. Consult the documentation for Perl regular expressions for details. The dbfawk tutorial provides some examples, as do the many ".dbfawk" files in the "config" directory of the Xastir source code.
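
If you want to try a pattern before putting it in a dbfawk file, you can test it against a few "field=value" strings in any regex tool. The Python sketch below uses Python's own "re" module rather than PCRE. The two behave the same for simple patterns like these, but that is an assumption you should not stretch to more exotic syntax. The sample strings are invented.

```python
import re

# Invented "field=value" strings of the kind Xastir passes to the rules.
candidates = [
    "fclass=roadway",
    "fclass=road",
    "fclass=roadway_link",
    "name=Main St",
    "name=",
]


def matching(pattern, strings):
    """Return the strings that the pattern matches anywhere."""
    return [s for s in strings if re.search(pattern, s)]


exact = matching(r"^fclass=roadway$", candidates)   # anchored at both ends
prefix = matching(r"^fclass=road", candidates)      # anchored at the start only
named = matching(r"^name=(.+)$", candidates)        # needs a non-empty value
```

Here "exact" holds only "fclass=roadway", "prefix" holds all three "fclass" strings, and "named" holds only "name=Main St" because the empty name has nothing for "(.+)" to match.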

Actions

If the "field=value" string matches the regular expression between the slashes, the actions between the braces are executed. Most actions are of the form "variable=value;". Two special actions exist:

  • "next;" will cause all processing of this field to terminate and tell dbfawk to go on to the next one. Use this when this rule should be the final determination of how to set variables for a field.
  • "skip;" causes all processing of this record to terminate. Use it in rules that should be the last ones that determine this record's rendering variables.
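
The difference between "next" and "skip" is easiest to see in a small simulation. The Python sketch below models the control flow as described on this page: defaults are set, each field is tested against the rules in order, "next" stops testing the current field, and "skip" stops work on the whole record. It is not Xastir's implementation. The rule representation, the function name, and the sample values are all invented for illustration.

```python
import re


def render_record(record, dbffields, defaults, rules):
    """Simulate one record. rules is a list of (pattern, assignments, control)."""
    variables = dict(defaults)  # plays the role of BEGIN_RECORD
    for field in dbffields:
        text = "%s=%s" % (field, record[field])
        for pattern, assignments, control in rules:
            if not re.search(pattern, text):
                continue
            variables.update(assignments)
            if control == "next":
                break             # done with this field, go to the next one
            if control == "skip":
                return variables  # done with this record entirely
    return variables


defaults = {"name": "", "lanes": 2}
dbffields = ["TLID", "fclass"]
rules = [
    (r"^TLID=139773160$", {"name": "Briarcliff-Peekskill Pky", "lanes": 4}, "skip"),
    (r"^fclass=motorway$", {"lanes": 4}, "next"),
    (r"^fclass=", {"lanes": 1}, None),
]

fixed = render_record({"TLID": "139773160", "fclass": "residential"},
                      dbffields, defaults, rules)
motorway = render_record({"TLID": "1", "fclass": "motorway"},
                         dbffields, defaults, rules)
street = render_record({"TLID": "2", "fclass": "residential"},
                       dbffields, defaults, rules)
```

In this run "fixed" keeps lanes=4 because "skip" prevents the later "fclass" rule from ever seeing the record, "motorway" keeps lanes=4 because "next" stops the catch-all rule from overriding it, and "street" falls through to the catch-all and ends up with lanes=1.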

positional substitutions and matches

If the regular expression contains any subexpressions in parentheses, each part of the matching string that corresponds to those subexpressions can be extracted in the actions using positional variables "$1", "$2", etc.

For example:

/^FEATURE=(.+)$/ {name="$1"; next}

means that the entire contents of the "FEATURE" field of the record will be captured by "(.+)" ("any string of one or more characters"), and that captured text will be assigned to the "name" variable.

Or for a more elaborate case:

/^FEATURE=Interstate Highway (.+)$/ {name="I-$1"; next}

would match any record whose "FEATURE" field starts with "Interstate Highway" followed by a space and at least one more character. To construct the name we'll display, we take only that last part and label the road "I-" followed by it.
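
To see what "$1" expands to for a given field value, the substitution can be mimicked in Python. This is a rough sketch of the idea only. The helper names are invented, and Python's "re" module stands in for PCRE.

```python
import re


def expand(template, match):
    """Replace $1, $2, ... in template with the matching capture groups."""
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))),
                  template)


def apply_name_rule(pattern, template, text):
    """Return the expanded template if the pattern matches, else None."""
    match = re.search(pattern, text)
    return expand(template, match) if match else None


plain = apply_name_rule(r"^FEATURE=(.+)$", "$1", "FEATURE=Main Street")
short = apply_name_rule(r"^FEATURE=Interstate Highway (.+)$", "I-$1",
                        "FEATURE=Interstate Highway 25")
miss = apply_name_rule(r"^FEATURE=Interstate Highway (.+)$", "I-$1",
                       "FEATURE=Main Street")
```

Here "plain" is "Main Street", "short" is "I-25", and "miss" is None because the second pattern does not match an ordinary street name.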

dbfawk variables

The complete list of variables you can set from a dbfawk file is given in the tutorial.

Testing dbfawk files with testdbfawk

The testdbfawk program is installed when you install Xastir. It is useful for checking how shapefiles will be processed.

testdbfawk takes the following command line syntax:

  testdbfawk --help
    Prints command line usage
  testdbfawk -D directory -d file.dbf
    Scans directory for dbfawk files with signatures matching file.dbf,
    runs the dbfawk program that matches if it's found.
  testdbfawk -f file.dbfawk -d file.dbf
    runs the dbfawk file "file.dbfawk" over the dbf file "file.dbf"

testdbfawk is rigid about the order of its command line arguments. The argument that specifies how to find the dbfawk file (-D or -f) must come first, and the -d file.dbf argument naming the DBF file to test must come last.

testdbfawk sends all of its output to the standard error stream, so if you try to pipe it to a paging program, you have to redirect the error stream, too. For the bash shell, that's done with the 2>&1 redirection, as in:

testdbfawk [arguments] 2>&1 | less

DBFAWK hints and kinks

From our old README.MAPS file, from N2YGK himself:

You have to think like an awk programmer and realize that the order that rules are listed matters, that it's important to use "next" as soon as it makes sense so other rules aren't looked at unnecessarily, and to use "skip" when you want to fix bad dbf data. For example, my county's Tiger/Line maps have several coding errors where a segment of a main highway is incorrectly labeled as a local street. This rule overrides one of those incorrect records:

# This Furnace Dock Rd segment is really Rt 9!
/^TLID=139773160$/ {name="Briarcliff-Peekskill Pky"; display_level=8192; label_level=512; color=4; lanes=4; skip;}

TLID is the Tiger/Line ID which is the unique identifier for this segment.