GRASS 5.7 Vector Format and API

This document describes the new format for GRASS vector files, with multi-category and optional 3D support.

You can send any comments to Radim Blazek and David D. Gray

See also:

GRASS 5.7 vector architecture features

The new GRASS 5.7 vector architecture implements the new 3D, multi-attribute and multi-layer vector features. Currently available are:

Introduction

The new format is very similar to the old GRASS 4.x (5.0/5.3) vector format.

GRASS vector maps are stored in an arc-node representation, consisting of curves called arcs. An arc is stored as a series of x,y(,z) coordinate tuples. The two endpoints of an arc are called nodes. Two consecutive coordinate tuples define an arc segment. The user specifies the type of each element on input to GRASS; GRASS does not decide it. The line type definition allows multiple types to coexist in the same map. An area is identified by an x,y,z centroid point that lies geometrically inside it and carries a category number; each centroid is assigned to the area it lies within. Such centroids are stored in the same binary 'coor' file as the other primitives. Each element may have none, one or more categories (cats). Multiple cats are distinguished by field number (field). Both single- and multi-category support is implemented at the module level. The z-coordinate is optional, and both 2D and 3D files may be written.
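The rule that a centroid is assigned to the area it lies geometrically inside can be illustrated with the classic even-odd (ray casting) point-in-polygon test. The following is a minimal sketch assuming an area outline given as a single closed ring of 2D vertices; `struct vertex` and `point_in_ring` are hypothetical names, islands and z are ignored, and this is not claimed to be how GRASS itself assigns centroids.

```c
#include <stddef.h>

/* Hypothetical 2D vertex type used only for this illustration. */
struct vertex {
    double x, y;
};

/*
 * Even-odd (ray casting) test: cast a horizontal ray from (px, py) towards
 * +x and count how many ring edges it crosses. An odd count means inside.
 * The ring has n vertices; the edge from ring[n-1] back to ring[0] closes it.
 * Returns 1 if the point is inside, 0 otherwise.
 */
int point_in_ring(const struct vertex *ring, size_t n, double px, double py)
{
    int inside = 0;
    size_t i, j;

    for (i = 0, j = n - 1; i < n; j = i++) {
        /* Does edge (j -> i) straddle the horizontal line y = py? */
        if ((ring[i].y > py) != (ring[j].y > py)) {
            /* x coordinate where the edge crosses y = py */
            double xc = ring[j].x + (py - ring[j].y) *
                        (ring[i].x - ring[j].x) / (ring[i].y - ring[j].y);
            if (px < xc)
                inside = !inside;
        }
    }
    return inside;
}
```

For example, a centroid at (5, 5) tests inside the square ring (0,0), (10,0), (10,10), (0,10), while (15, 5) does not.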

Topology general characteristics

  1. geometry and attributes are stored separately, so a module need not read both when it only needs one of them (which is usually the case)
  2. the format is topological (areas are built from boundaries)

Directory structure

The directory structure and file names have changed slightly. All vector files for one vector map are stored in one directory:
$MAPSET/vector/vector_name/
This directory contains these files:


coor Format Specification

Head

Name            Type  Number  Description
Version_Major   C     1
Version_Minor   C     1
Earliest_Major  C     1
Earliest_Minor  C     1
byte_order      C     1       little or big endian flag; files are written in machine native order, but files in both little and big endian order may be read
with_z          C     1       2D or 3D flag
size            L     1       coor file size
reserved        C     10      not used
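Using the sizes from the 'Types used in coor file' table (C = 1 byte, L = 4 bytes), the head adds up to 20 bytes: six 1-byte chars, one 4-byte long and ten reserved chars. The sketch below decodes such a head from a raw buffer, assuming the fields are packed in the listed order with no padding. `struct coor_head` and `parse_coor_head` are hypothetical; because this document does not say which byte_order value means little or big endian, the caller supplies the byte order explicitly (it can inspect buf[4] first).

```c
#include <stddef.h>
#include <stdint.h>

/* 6 x 1-byte chars + one 4-byte long + 10 reserved chars = 20 bytes
 * (assumes no padding between fields). */
#define COOR_HEAD_SIZE 20

/* Hypothetical in-memory copy of the coor head. */
struct coor_head {
    unsigned char version_major, version_minor;
    unsigned char earliest_major, earliest_minor;
    unsigned char byte_order;   /* raw flag as stored in the file */
    unsigned char with_z;       /* 2D or 3D flag */
    uint32_t size;              /* coor file size */
};

/* Decode a 4-byte unsigned integer in the requested byte order. */
static uint32_t get_u32(const unsigned char *p, int little_endian)
{
    if (little_endian)
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the head from buf. The caller chooses little_endian from the
 * byte_order flag (buf[4]); the value encoding is an assumption left to it.
 * Returns 0 on success, -1 if the buffer is too short. */
int parse_coor_head(const unsigned char *buf, size_t len,
                    int little_endian, struct coor_head *h)
{
    if (len < COOR_HEAD_SIZE)
        return -1;
    h->version_major = buf[0];
    h->version_minor = buf[1];
    h->earliest_major = buf[2];
    h->earliest_minor = buf[3];
    h->byte_order = buf[4];
    h->with_z = buf[5];
    h->size = get_u32(buf + 6, little_endian);
    /* buf[10..19] are the reserved bytes and are ignored. */
    return 0;
}
```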

Body

The body consists of line records:
Name           Type  Number  Description
record header  I     1
  • bit 0: 1 - alive, 0 - dead line
  • bit 1: 1 - categories, 0 - no categories
  • bits 2-3: type, one of GV_POINT, GV_LINE, GV_BOUNDARY, GV_CENTROID
  • bits 4-7: reserved, not used
ncats          C     1       number of categories (written only if categories exist)
field          S     ncats   category identifier; distinguishes between multiple categories attached to one line (written only if categories exist)
cat            I     ncats   category value (written only if categories exist)
ncoor          I     1       written for GV_LINE and GV_BOUNDARY only
x              D     ncoor
y              D     ncoor
z              D     ncoor   present if with_z in head is set to 1
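Each record starts with a header whose low bits are laid out as listed above. A minimal sketch of packing and unpacking those bits follows, together with the number of coordinate values a record carries (ncoor values each for x and y, plus z when with_z is set). The names are hypothetical; since the table does not spell out which 2-bit code corresponds to which GV_* type, the sketch returns the raw code rather than guessing a mapping.

```c
#include <stddef.h>

/* Hypothetical decoded form of a coor record header. */
struct rec_header {
    int alive;      /* bit 0: 1 - alive, 0 - dead */
    int has_cats;   /* bit 1: 1 - categories follow */
    int type_code;  /* bits 2-3: raw 2-bit type code, 0..3 */
};

struct rec_header decode_rec_header(unsigned int h)
{
    struct rec_header r;

    r.alive = (int)(h & 0x01u);
    r.has_cats = (int)((h >> 1) & 0x01u);
    r.type_code = (int)((h >> 2) & 0x03u);
    /* bits 4-7 are reserved and ignored */
    return r;
}

unsigned int encode_rec_header(struct rec_header r)
{
    return ((unsigned int)r.alive & 0x01u)
         | (((unsigned int)r.has_cats & 0x01u) << 1)
         | (((unsigned int)r.type_code & 0x03u) << 2);
}

/* Number of doubles stored for one record's coordinates: ncoor x values,
 * ncoor y values and, if with_z is set, ncoor z values. For points and
 * centroids ncoor is not stored; a caller would pass 1 (an assumption). */
size_t coor_value_count(size_t ncoor, int with_z)
{
    return ncoor * (with_z ? 3u : 2u);
}
```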

Types used in coor file
Type  Name    Size in Bytes
D     Double  8
L     Long    4
I     Int     4
S     Short   4
C     Char    1

head file format

The file is an unordered list of key/value entries. The key is a string separated from the value by a colon and optional whitespace. Keywords are:
ORGANIZATION
DIGIT DATE
DIGIT NAME
MAP NAME
MAP DATE
MAP SCALE
OTHER INFO
ZONE
MAP THRESH
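Because each head entry is a key, a colon, optional whitespace and a value, a single line can be split in place as sketched below. `split_head_line` is a hypothetical helper; it splits at the first colon and also trims trailing whitespace from the key and value, which is an assumption beyond the format description above.

```c
#include <ctype.h>
#include <string.h>

/*
 * Split one head line of the form "KEY: value" in place.
 * On success, *key and *value point into 'line'; whitespace after the
 * colon is skipped and trailing whitespace (including a newline) is
 * trimmed from both parts.
 * Returns 0 on success, -1 if the line contains no colon.
 */
int split_head_line(char *line, char **key, char **value)
{
    char *colon = strchr(line, ':');
    char *end;

    if (colon == NULL)
        return -1;
    *colon = '\0';

    /* Trim trailing whitespace from the key. */
    end = colon;
    while (end > line && isspace((unsigned char)end[-1]))
        *--end = '\0';
    *key = line;

    /* Skip the optional whitespace after the colon. */
    *value = colon + 1;
    while (isspace((unsigned char)**value))
        (*value)++;

    /* Trim trailing whitespace from the value. */
    end = *value + strlen(*value);
    while (end > *value && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return 0;
}
```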

topo file format

[docs missing]

Topology is written for the native format; pseudo-topology is written for PostGRASS and SHAPE-link.

/* Vector types used in memory on run time - may change */
#define GV_POINT      0x01
#define GV_LINE       0x02
#define GV_BOUNDARY   0x04
#define GV_CENTROID   0x08
#define GV_FACE       0x10
#define GV_KERNEL     0x20
#define GV_AREA       0x40
Face and kernel are 3D equivalents of boundary and centroid, but there is no support (yet) for 3D topology (volumes). Currently the only use of faces is to display vertical planes in NVIZ.
/* Topology level details */
#define GV_BUILD_NONE  0
#define GV_BUILD_BASE  1
#define GV_BUILD_AREAS  2
#define GV_BUILD_ATTACH_ISLES 3  /* Attach islands to areas */
#define GV_BUILD_CENTROIDS 4 /* Assign centroids to areas */
#define GV_BUILD_ALL GV_BUILD_CENTROIDS
GV_BOUNDARY contains geometry and it is used to build areas. GV_LINE cannot form an area.
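Because the type constants are distinct bits, several types can be combined into one mask and tested with a bitwise AND; the combined masks GV_POINTS and GV_LINES listed under 'New or Modified Constants, Structures, Functions' work this way. The sketch below copies those constants and adds a hypothetical `type_mask_from_list` that turns a type= option value such as "point,centroid" into a mask. In real module code the constants would come from the GRASS headers rather than being redefined here.

```c
#include <string.h>

/* Constants copied from this document's listings. */
#define GV_POINT      0x01
#define GV_LINE       0x02
#define GV_BOUNDARY   0x04
#define GV_CENTROID   0x08
#define GV_AREA       0x40

#define GV_POINTS (GV_POINT | GV_CENTROID)
#define GV_LINES  (GV_LINE | GV_BOUNDARY)

/* Nonzero if an element of the given type is selected by the mask. */
int type_selected(int type, int mask)
{
    return (type & mask) != 0;
}

/* Hypothetical helper: convert "point,line,..." into a bit mask.
 * Returns -1 if a name is not recognized. */
int type_mask_from_list(const char *list)
{
    static const struct { const char *name; int bit; } names[] = {
        {"point", GV_POINT}, {"line", GV_LINE}, {"boundary", GV_BOUNDARY},
        {"centroid", GV_CENTROID}, {"area", GV_AREA}
    };
    int mask = 0;
    const char *p = list;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current name */
        size_t i;
        int found = 0;

        for (i = 0; i < sizeof names / sizeof names[0]; i++) {
            if (strlen(names[i].name) == len &&
                strncmp(p, names[i].name, len) == 0) {
                mask |= names[i].bit;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        p += len;
        if (*p == ',')
            p++;
    }
    return mask;
}
```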

Topology Example 1:

A polygon may be formed by many boundaries (several connected primitives). One boundary is shared by adjacent areas.
+--1--+--5--+
|     |     |
2  A  4  B  6
|     |     |
+--3--+--7--+

1,2,3,4,5,6,7 = 7 boundaries (primitives)
A,B = 2 areas

Topology Example 2:

This is handled correctly in GRASS: A can be filled, B filled differently.
+---------+
|    A    |
+-----+   |
|  B  |   |
+-----+   |
|         |
+---------+
In GRASS, whenever an 'inner' ring touches the boundary of the outer area, even in one point, it is no longer an 'inner' ring; it is simply another area. A and B above can never be exported from GRASS as polygon A with inner ring B, because there are only two areas, A and B, and no island.

Topology Example 3:

v.in.ogr/v.clean can identify dangles and change their type from boundary to line (in TIGER data, for example). The distinction between line and boundary matters not only for dangles. Example:
+-----+-----+
|     .     |
|     .     |
+.....+.....+
|     .     |
|  x  .     |
+-----+-----+

----  road + boundary of one parcel => type boundary
....  road => type line
x     parcel centroid (identifies whole area)
Because lines are not used to build areas, there is only one area/centroid, instead of the 4 that would be necessary in TIGER.

Library

For historical reasons, there are two libraries for vector:

diglib, dig_*(), DIGLIB, libdig.a, digit library, grass3.x, 4.x
and
Vlib, Vect_*(), VECTLIB_REAL, libvect.a, vector library, grass4.x

The vector library was introduced in GRASS 4.0 to hide the vector file formats and structures. In GRASS 5.7 everything is accessed via Vect_* functions; for example, direct structure access such as
    xx = Map.Att[Map.Area[area_num].att].x;
is replaced by the new functions
    Vect_get_area_centroid()
    Vect_get_centroid_coor()


New or Modified Constants, Structures, Functions

/* types used in memory on run time - may change */
#define GV_POINT      0x01
#define GV_LINE       0x02
#define GV_BOUNDARY   0x04
#define GV_CENTROID   0x08
#define GV_FACE       0x10
#define GV_KERNEL     0x20
#define GV_AREA       0x40
#define GV_VOLUME     0x80

#define GV_POINTS (GV_POINT | GV_CENTROID )
#define GV_LINES (GV_LINE | GV_BOUNDARY )

struct line_cats
  {
      int *field;	/* pointer to array of fields */
      int *cat;		/* pointer to array of categories */
      int n_cats;	/* number of vector categories attached to element */
      int alloc_cats;	/* allocated space */
  };

int Vect_open_new (struct Map_info *, char *name, int with_z);
long Vect_write_line (struct Map_info *, int type, struct line_pnts *, struct line_cats *);
int Vect_read_next_line (struct Map_info *, struct line_pnts *, struct line_cats *);
struct line_cats *Vect_new_cats_struct (void); 
int Vect_reset_cats (struct line_cats *);
int Vect_destroy_cats_struct (struct line_cats *); 
int Vect_cat_set (struct line_cats *, int, int);
int Vect_cat_get (struct line_cats *, int, int *);
int Vect_cat_del (struct line_cats *, int);

And many, many others ...
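The struct line_cats listing above keeps fields and categories in two parallel arrays. A minimal sketch of how such arrays might be managed is shown below, assuming that a (field, cat) pair is appended only if that exact pair is not already present and that a lookup returns the first category of a field. `sketch_cat_add`, `sketch_cat_get` and `sketch_cats_free` are hypothetical and are not the library's Vect_cat_set/Vect_cat_get implementations; the struct is repeated so the example compiles on its own.

```c
#include <stdlib.h>

/* Repeated from the listing above so this sketch compiles on its own. */
struct line_cats
{
    int *field;      /* pointer to array of fields */
    int *cat;        /* pointer to array of categories */
    int n_cats;      /* number of vector categories attached to element */
    int alloc_cats;  /* allocated space */
};

/* Append (field, cat) unless exactly that pair is already present.
 * Returns 1 if added, 0 if already present, -1 if allocation fails. */
int sketch_cat_add(struct line_cats *c, int field, int cat)
{
    int i;

    for (i = 0; i < c->n_cats; i++)
        if (c->field[i] == field && c->cat[i] == cat)
            return 0;

    if (c->n_cats == c->alloc_cats) {
        /* Grow both parallel arrays together, doubling the capacity. */
        int n = c->alloc_cats > 0 ? 2 * c->alloc_cats : 4;
        int *f, *k;

        f = realloc(c->field, (size_t)n * sizeof *f);
        if (f == NULL)
            return -1;
        c->field = f;
        k = realloc(c->cat, (size_t)n * sizeof *k);
        if (k == NULL)
            return -1;
        c->cat = k;
        c->alloc_cats = n;
    }
    c->field[c->n_cats] = field;
    c->cat[c->n_cats] = cat;
    c->n_cats++;
    return 1;
}

/* Store the first category found for 'field' in *cat.
 * Returns 1 if found, 0 otherwise. */
int sketch_cat_get(const struct line_cats *c, int field, int *cat)
{
    int i;

    for (i = 0; i < c->n_cats; i++) {
        if (c->field[i] == field) {
            *cat = c->cat[i];
            return 1;
        }
    }
    return 0;
}

/* Release both arrays and reset the structure to empty. */
void sketch_cats_free(struct line_cats *c)
{
    free(c->field);
    free(c->cat);
    c->field = NULL;
    c->cat = NULL;
    c->n_cats = 0;
    c->alloc_cats = 0;
}
```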

Attributes

dig_cats files are no longer used; vector attributes are stored in an external database. The connection to the database is made through drivers based on the DBMI library (odbc, dbf, Postgres and MySQL drivers are available at this time). Records in a table are linked to vector entities by field and category number: the field identifies the table and the category identifies the record. That is, each unique combination of map+mapset+field+category corresponds to exactly one combination of driver+database+table+row.

For each map + field pair, a table, key column, database and driver must be defined. This definition must be written to the $MAPSET/DB text file. Each row in the DB file contains names separated by spaces in the following order ([] - optional):

map[@mapset] field table [key [database [driver]]]

If key, database or driver are omitted (allowed on the second and subsequent rows only), the last definition is used. A definition from the DB file in another mapset may be overridden by a definition in the current mapset if the mapset is specified with the map name.
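The row format and the inheritance rule for omitted key, database and driver can be sketched with sscanf. `struct db_rule` and `parse_db_row` are hypothetical; the 255-character limit per name is an assumption of this sketch, and wildcard and variable handling are left out.

```c
#include <stdio.h>
#include <string.h>

#define DB_NAME_LEN 256   /* assumption: max name length + 1 */

/* One parsed DB row: map[@mapset] field table [key [database [driver]]] */
struct db_rule {
    char map[DB_NAME_LEN];       /* pattern, may include "@mapset" */
    int field;
    char table[DB_NAME_LEN];
    char key[DB_NAME_LEN];
    char database[DB_NAME_LEN];
    char driver[DB_NAME_LEN];
};

/*
 * Parse one DB row into *out. Omitted key, database or driver are taken
 * from *prev, the previous row; pass prev = NULL for the first row, in
 * which case omitted names are left empty. out must not point to prev.
 * Returns 0 on success, -1 if map, field or table is missing.
 */
int parse_db_row(const char *row, const struct db_rule *prev,
                 struct db_rule *out)
{
    char key[DB_NAME_LEN] = "", db[DB_NAME_LEN] = "", drv[DB_NAME_LEN] = "";
    int n = sscanf(row, "%255s %d %255s %255s %255s %255s",
                   out->map, &out->field, out->table, key, db, drv);

    if (n < 3)
        return -1;

    /* Inherit whatever was omitted from the previous definition. */
    strcpy(out->key, n >= 4 ? key : (prev ? prev->key : ""));
    strcpy(out->database, n >= 5 ? db : (prev ? prev->database : ""));
    strcpy(out->driver, n >= 6 ? drv : (prev ? prev->driver : ""));
    return 0;
}
```

With the water* rows from the examples in this document, the second row (lakes, written without a driver) would inherit the dbf driver from the first.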

Wild cards * and ? may be used in map and mapset names.
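Matching * and ? in map and mapset names is what POSIX fnmatch(3) does, so a rule pattern of the form map[@mapset] could be checked as sketched below. `rule_matches` is hypothetical, and treating a pattern without @mapset as matching any mapset is an assumption of this sketch, not something stated above.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Return 1 if the rule pattern "map" or "map@mapset" (with * and ?
 * wildcards) applies to the given map and mapset, else 0.
 * Assumption: a pattern without "@mapset" matches any mapset.
 */
int rule_matches(const char *pattern, const char *map, const char *mapset)
{
    char map_pat[256];
    const char *at = strchr(pattern, '@');
    size_t len;

    if (at == NULL)
        return fnmatch(pattern, map, 0) == 0;

    /* Copy the part before '@' so it can be matched on its own. */
    len = (size_t)(at - pattern);
    if (len >= sizeof map_pat)
        return 0;
    memcpy(map_pat, pattern, len);
    map_pat[len] = '\0';

    return fnmatch(map_pat, map, 0) == 0 && fnmatch(at + 1, mapset, 0) == 0;
}
```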

The variables $GISDBASE, $LOCATION, $MAPSET, $MAP and $FIELD may be used in table, key, database and driver names. Note that $MAPSET is not the current mapset but the mapset of the map the rule is defined for.
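Variable substitution can be done by scanning for '$' followed by a name and replacing known names. In this hypothetical `expand_vars`, a name runs over letters, digits and underscores, so $MAP and $MAPSET are not confused; unknown names are copied unchanged. The caller supplies the values, and per the note above the value for $MAPSET should be the mapset of the map the rule is defined for.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variable table entry. */
struct var {
    const char *name;   /* without the leading '$' */
    const char *value;
};

/* Expand $NAME references in 'in' into 'out' (capacity outsz).
 * Returns 0 on success, -1 if the output buffer is too small. */
int expand_vars(const char *in, const struct var *vars, size_t nvars,
                char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    while (*in) {
        const char *rep = NULL;
        size_t rep_len = 0, skip = 1;

        if (*in == '$') {
            size_t n = 1, i;

            /* The name is the longest run of [A-Za-z0-9_] after '$'. */
            while (isalnum((unsigned char)in[n]) || in[n] == '_')
                n++;
            for (i = 0; i < nvars; i++) {
                if (strlen(vars[i].name) == n - 1 &&
                    strncmp(in + 1, vars[i].name, n - 1) == 0) {
                    rep = vars[i].value;
                    rep_len = strlen(rep);
                    skip = n;
                    break;
                }
            }
        }
        if (rep == NULL) {      /* ordinary character or unknown variable */
            rep = in;
            rep_len = 1;
        }
        if (o + rep_len >= outsz)   /* keep room for the terminator */
            return -1;
        memcpy(out + o, rep, rep_len);
        o += rep_len;
        in += skip;
    }
    out[o] = '\0';
    return 0;
}
```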

Note that features in a GRASS vector may have attributes in different tables or may have no attributes at all. Boundaries form areas, but some boundaries may not be closed (such boundaries would not appear in a polygon layer). Boundaries may have attributes. All types may be mixed in one vector.

The link to the table is permanent and is stored in the 'dbln' file in the vector directory. Tables are considered part of the vector; g.remove, for example, deletes the tables linked to the vector. Attributes must be joined with geometry.

Examples:

The examples are written mostly for the dbf driver, where the database is the full path to the directory with dbf files and the table name is the name of the dbf file without the .dbf extension.

* 1 tbl id $GISDBASE/$LOCATION/$MAPSET/vector/$MAP dbf
This definition says that entities with a category in field 1 are linked to dbf tables named tbl.dbf, stored in the vector directory of each map.

* 1 $MAP id $GISDBASE/$LOCATION/$MAPSET/dbf dbf
Similar to the above, but all dbf files are in one directory, dbf/, in the mapset, and the dbf files are named $MAP.dbf.

water* 1 rivers id /home/grass/dbf dbf
water* 2 lakes lakeid /home/guser/mydb
trans* 1 roads key basedb odbc
trans* 5 rails
These definitions define several fields for one map, i.e. features in one map may be linked to several tables. The definitions on the first 2 rows apply, for example, to maps water1, water2, ..., so that several maps may share one table.

water@PERMANENT 1 myrivers id /home/guser/mydbf dbf
This definition overrides the definition saved in PERMANENT/DB and links the map from the PERMANENT mapset to the user's table.

Modules should be written so that the database connection for each vector field is independent. It should be possible to read the attributes of the input map from one database and write them to another, even with a different driver (this should not be a problem).

There are open questions, however; for example, how to decide when a new table should be written and when not. For example, the definitions:
river 1 river id water odbc
river.backup* 1 NONE
could be used to say that tables should not be copied for backups of the map river, because the table is stored in a reliable RDBMS.


Other Information

Bounding box information was moved from coor file to topo file.


Vector ascii Format Specification

The ASCII format in the new version contains categories, the new centroid type and z-coordinates. Points and centroids are saved as one coordinate pair instead of two. The file is saved in the old dig_ascii directory, but the directory name will probably change.

Head

The head of the file is similar to the head file of the binary vector format but also contains the bounding box. Keywords are:
ORGANIZATION
DIGIT DATE
DIGIT NAME
MAP NAME
MAP DATE
MAP SCALE
OTHER INFO
ZONE
WEST EDGE
EAST EDGE
SOUTH EDGE
NORTH EDGE
MAP THRESH

Body

The body begins with the row:
VERTI:
followed by records of lines:
TYPE NUMBER_OF_COORDINATES [NUMBER_OF_CATEGORIES]
 Y X [Z]
....
 Y X [Z]
[ FIELD CATEGORY]
....
[ FIELD CATEGORY]
Everything in [] is optional. TYPE may be:

P point (dot)
p dead point (dead dot)
L line
l dead line
B(A) boundary
b(a) dead boundary
C centroid
c dead centroid

Example of records:
P 1 1
 1234 3435
 1 354
L 3 1
 4132 4534
 1453 1454
 1453 4543
 1 355
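Reading the body means parsing each record's first line, TYPE NUMBER_OF_COORDINATES [NUMBER_OF_CATEGORIES], and mapping the type letter to a type and an alive/dead flag (upper case alive, lower case dead, A/a accepted as boundary). The sketch below reuses the GV_* values listed in this document; `parse_ascii_header` and `struct ascii_rec` are hypothetical, and reading the coordinate and category lines that follow is left out.

```c
#include <ctype.h>
#include <stdio.h>

/* Values copied from this document's constant listing. */
#define GV_POINT      0x01
#define GV_LINE       0x02
#define GV_BOUNDARY   0x04
#define GV_CENTROID   0x08

/* Hypothetical parsed record header line. */
struct ascii_rec {
    int type;    /* GV_* type */
    int alive;   /* 1 for upper-case letters, 0 for dead (lower case) */
    int ncoor;   /* NUMBER_OF_COORDINATES */
    int ncats;   /* NUMBER_OF_CATEGORIES, 0 if omitted */
};

/* Map a TYPE letter to a GV_* type and alive flag. Returns 0 or -1. */
static int ascii_type(char c, int *type, int *alive)
{
    switch (c) {
    case 'P': case 'p': *type = GV_POINT; break;
    case 'L': case 'l': *type = GV_LINE; break;
    case 'B': case 'b': case 'A': case 'a': *type = GV_BOUNDARY; break;
    case 'C': case 'c': *type = GV_CENTROID; break;
    default: return -1;
    }
    *alive = isupper((unsigned char)c) ? 1 : 0;
    return 0;
}

/* Parse "TYPE NUMBER_OF_COORDINATES [NUMBER_OF_CATEGORIES]".
 * Returns 0 on success, -1 on a malformed line or unknown type. */
int parse_ascii_header(const char *line, struct ascii_rec *r)
{
    char c;
    int n;

    r->ncats = 0;   /* stays 0 when the optional count is absent */
    n = sscanf(line, " %c %d %d", &c, &r->ncoor, &r->ncats);
    if (n < 2 || ascii_type(c, &r->type, &r->alive) != 0)
        return -1;
    return 0;
}
```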

Vector modules and options

See also grass5/documents/parameter_proposal.txt

Operation

Each module that modifies and writes data must read from input= and write to output=, so that data cannot be lost. For example, v.spag in grass5.0 works directly on map=; if the program (or the system) crashes, or the threshold was specified incorrectly and the vector was not backed up, the data are lost. In such cases the map= option should be replaced by input= and output=.

Topology is always built by default if the coor file was modified.

Dimension is kept. Input 2D vector is written as 2D, 3D as 3D.

Options

-f overwrite existing files, default
-i ask user before overwriting existing files
-b do not build topo file; topo file is written by default
-q quiet
-v run verbosely
-t create new table, default ???
-u don't create new table ???
-z write 3D file (if input was 2D)

map= input vector for modules without output
input= input vector
output= output vector
type= type of elements: point,line,boundary,centroid,area
cat= category or category list (example: 1,5,9-13,35)
field= field number
where= WHERE condition of the SQL statement used to select records
col= column name (in external table)
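The cat= list syntax shown above (single values and lo-hi ranges separated by commas, e.g. 1,5,9-13,35) can be checked with strtol. `cat_in_list` is a hypothetical helper that reports whether one category is selected and rejects malformed lists; how modules actually parse this option is not described here.

```c
#include <stdlib.h>

/*
 * Return 1 if 'cat' is in a list such as "1,5,9-13,35", 0 if it is not,
 * and -1 if the list is malformed (the whole list is always validated).
 */
int cat_in_list(const char *list, int cat)
{
    const char *p = list;
    int found = 0;

    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10), hi;

        if (end == p)
            return -1;          /* expected a number */
        hi = lo;
        if (*end == '-') {      /* range "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
        }
        if (cat >= lo && cat <= hi)
            found = 1;
        if (*end == ',')
            end++;
        else if (*end != '\0')
            return -1;          /* unexpected character */
        p = end;
    }
    return found;
}
```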


Vector module upgrade status

Vector upgrade status

Vector module programming example

Vector module C programming example (slightly outdated; it is better to start from the new modules)
Document updated 2004 by M. Neteler