NAME

v.apply.census - Calculate/Import Demographics from Census STF1 Files
(GRASS Vector Program)

SYNOPSIS

v.apply.census
v.apply.census help
v.apply.census [-d] [-p] [-l] [-u]
v.apply.census input_stf1=name
[out=name] [ef=name] [formula=mapname=expression]
[name_field=name] [zone=value] [spheroid=name]

DESCRIPTION

This program reads a previously selected subset of STF1 or PL94-171 U.S. Census Bureau demographic records (see m.in.stf1.tpe and m.in.stf1.db3 ), and writes one of the following:

GRASS site list (coordinates and a value).
GRASS vector polygon attribute labels file (coordinates and a label value).
Text report to stdout in one of two formats.

Any arithmetic expression combining constants with any of the more than 900 hundred demographic (numeric) fields can be defined and will be evaluated (by the Unix bc -l utility) to compute the value for each input record.

The location coordinates for the polygon label point or the site location is obtained from certain columns (269-287) in the input records, making this program quite specific for the specified types of STF1 and PL94-171 input file records. Use with other types of input data is not advised.

OPTIONS

Flags:

-d
Output IDENTIFICATION SECTION to stdout (20 pages)
-p
Output STF1 MATRIX TABLE to stdout (30 pages)
-l
Output PL94-171 MATRIX TABLE to stdout (1 page)
-u
Output STF1 SUMLEV TABLE to stdout (4 pages)

Note:
Only the first flag given will be executed; the program exits after sending one table to stdout. Other parameters are ignored.

Parameters:

input_stf1=name
Path/name of STF1 or PL94 input file
out=name
Type of output:
site | atts | table:Lxxx | - (stdout)
default: -
att
for results to vector map attribute file
site
for results to site list
-
for results to stdout; this is the default
table
for results in table form to stdout with the column value(s) represented by the ':Lxxx:Lxxx' string preceeding the expression value instead of easting and northing. Lxxx is in same choice of representation as column designation in formula (see below).
ef=name
Path/name of text file with formula expression(s)
formula=map=expression
mapname=Computation with STF1A columns, constants and bc operators (see below). mapname of vector or site map to make is required in all cases (but ignored for '-' or 'table:' output modes).
name_field=name
field name for parsing stdin lines (-a to ignore)
default: -a
name is Keyword in stdin stream preceeding the number of a STF1 record to read from input_stf1 file. The number is a physical sequence number, not a record identification number. This parameter is appropriate only in special uses of this program. If '-a' is given, or this parameter is omitted, all records in STF.file will be processed.
zone=value
UTM zone number; default is location zone
options: 1-60
default: 10
spheroid=name
Spheroid for LL to UTM conversion; see m.gc.ll
default: clark66

NOTE: Only one of the "ef" or "formula" input fields may be used.

THE "formula" PARAMETER

The string after the '=' in this parameter almost always must be enclosed in quotes to protect it from Shell interpretation of characters such as * / ( ) and spaces (which may be used to increase readability, but are not necessary). This string is always of the form mapname=expression.

The mapname string can be any legal GRASS map name (up to 14 characters). The second '=' is required and may be preceeded and followed by a space.

In the expression, great flexibility is provided for the computation of interesting combinations of the data fields contained in the demographic records.

The usual operators used in the expression include: +, -, *, /, and %. The user should review the Unix manual entry for the bc calculator for the complete list of available functions and other operators.

Parentheses are used in the normal algebraic manner.

The operands in the expression may consist of any mixture of:

integer constants real constants functions allowed by bc numeric fields from the demographic records

Numeric fields from both the IDENTIFICATION and MATRIX SECTIONS of the demographic records may be used. Review the User and Technical Documentation for these demographic files; or use the -p option of this program to print the MATRIX SECTION document (the demographic data fields), or m.in.stf1.db3 to print a simplified list of IDENTIFICATION SECTION field names. When substituting values from the numeric fields into the expression, <SPACE> characters are replaced by zero. (Spaces, which are rare in the demographic data, are usually the result of missing values or restricted information).

The numeric fields from the demographic records may be designated in one of three ways in the expression.

  1. 'Lcccc', where 'L' is an upper case alphabetic letter which indicates the length of the numeric field in the data records: A=1, B=2, ... , I=9, J=10, O=15, etc; and 'cccc' is the starting column number for the data field of interest: 301 for 100% population count, 7221 for total number of four-room housing units, 58 for 101st Congressional District, etc. The proper specification of the Congressional District number would be 'B58' because it is a two-column field. 'I301' would indicate that the 100% poplation should be used in the calculation; it's a nine-column field.

  2. 'Pnna0nnn' or 'Hnna0nnn', where 'n' is a digit and 'a' is a digit or (rarely) upper case letter. These forms represent the MATRIX field naming schemes used in the CD-ROM "dBase 3" files. They can be used in processing STF1 records extracted from the CD-ROM or TAPE distribution media. All eight characters of the field name must be used. (Note: this form cannot be used in processing the PL94-171 records.)

  3. 'ID_NAME'. The STF1 and PL94-171 files use the same set of IDENTIFICATION SECTION field names (67 fields) for the "locational" information contained in the first 300 characters of each record. The field name, in all upper case letters, may be used if the numeric value of the field is needed in the expression. Two of the most useful fields are AREALAND and AREAWAT which allow the direct computation of population density values.

The bc calculator usually returns a value with a very large number of decimal places. Vector attributes are automatically truncated(!) to integers by the v.support program when the topology file is built. Site list descriptions are likewise truncated to integers by GRASS programs which use the descriptions as numeric values (e.g., sites to cell in s.menu ).

THE 'out=table:' PARAMETER

The default operation of v.apply.census when tabular reports are produced to stdout (when not making a sites list or vector attribute file) is to print the easting and northing coordinates and then the value resulting from the expression. Often the user wishes to have an identifier different from the coordinates. The construction out=table:field will replace the coordinates with the character string indicated by field, which may have any of the three forms used for numeric fields in the formula expression, see above. Note: a special case exists for the 66 character description field which begins in column 192; the entire field will be printed if designated by either of these two forms: out=table:ANPSADPI or out=table:Z192.

Complex tables may be produced by making multiple runs of v.apply.census, redirecting the tabular output to files, and using the Unix tools cut, paste, and colrm to combine the resulting files.

EXAMPLES

[These examples assume that STF1 records for the census tracts (SUMLEV=140) in a particular county (CNTY=037) have been extracted from the distribution source (with m.in.stf1.db3 or m.in.stf1.tpe and saved in a file named 037.140 in the current directory.]

  1. Create a site list where the label values are the percentage of females in each input record:

    v.apply.census in=037.140 out=site formula='pct.female = 100 * (I373 / I301)'

  2. Label an existing vector map (named tract.pop) of tract boundaries with the total population of each tract (run v.support and build topology after creating the attributes file):

    v.apply.census in=037.140 out=atts formula=tract.pop=P0010001 (or)
    v.apply.census in=037.140 out=atts formula=tract.pop=I301

  3. Produce a tabular report of the census tract numbers and the number of Hispanics per square kilometer:

    v.apply.census in=037.140 out=table:TRACTBNA formula="m=(P0080001/AREALAND)*1000"

  4. Produce a tabular report of the number of people per housing unit for each tract and the coordinates of the internal (label) point:

    v.apply.census in=037.140 formula="map=P0010001/H0010001"

FORMAT OF THE FORMULA TEXT FILE

Running this program with the ef=file command line parameter causes the named file to be read and each formula contained therein to be processed as if it were entered on the command line. The out= parameter may optionally be respecified in this file; each out= selection remains in effect until explicitly changed. The program exits after the last formula is processed.

The following is a sample formula file. Note the use of lines beginning with '#' as manditory formula separators and comment lines. Also note that expressions may be continued on successive lines; the lines are concatenated to a maximum of 500 characters for a single formula. Blank lines are ignored.

popden.sqkm = 1000 * P0010001 / (AREALAND+AREAWAT)
# that computes population density in people per sq. km.
#
# next do people per sq. mile as a vector attribute file
out=att
popden.sqmi = 2.59*1000 *
P0010001 / (AREALAND+
AREAWAT)
#
# next do total population as a vector attribute file
total.pop = I301
#
# output the county identification numbers as the descriptions
# in a site list
out=sites county = CNTY
#
# output the 66 char. description and FIPS Place Code as a table
out=table:ANPSADPI
map=PLACEFP
# optional ending comment line

BUGS/FEATURES

Computational errors in bc are not handled too gracefully: a warning is output and a zero result is used.

bc tends to output lots of decimal places. The user must clean this up for output sent to stdout.

The GRASS site list output format used includes the '#' before the label value to facilitate the production of raster maps with cell values representing the results of the formula execution.

If using the "name_field" and "ef" parameters, the formula file may contain only one formula.

SEE ALSO

m.in.stf1.tape is used as a preprocessor to select subsets of STF1 or PL-171 tape format records for input to this program.

m.in.stf1.db3 is used as a preprocessor to select subsets of STF1 records from the CD-ROM in "dBase 3" format. Unix utility programs such at grep or awk can also be used to select subsets of lines from the PL94-171 files, but not from the STF1 tape files due to their very long record lengths.

AUTHOR

Dr. James Hinthorne, GIS Laboratory, Central Washington University. July, 1992.