GIP: 1
Title: Change of the raster file storage layout in GRASS 5.1
Version: $Date$
Author: neteler@itc.it (Markus Neteler)
Status: Active
Type: Informational
Created: 20 Jan 2003

INTRODUCTION

  While the 5.1 vector architecture is now more or less settled, we may want
  to look at a raster-specific issue as well: the raster file structure.

  The 5.0 raster file structure differs from the G3D and the 5.1 vector file
  structures. G3D and 5.1 vector store maps according to the following scheme:

  /path/to/location/mapset/grid3/mapname/files
  such as
	grid3/mapname/cell
	grid3/mapname/cellhd
	grid3/mapname/range
	grid3/mapname/cats

  /path/to/location/mapset/vector/mapname/files
  such as
	vector/mapname/coor
	vector/mapname/head
	vector/mapname/sidx
	vector/mapname/topo

  whereas the 5.0 raster files are spread over various element
  subdirectories of the mapset (e.g. cell/, cellhd/, cats/, colr/) and are
  only grouped by map name within each of them.

  The proposal is to change the 5.0 raster file structure for 5.1 to an
  organization similar to the one above:
        maptype/mapname/files

  This offers the following advantages:

  - clean and "intuitive" file organization with all files of a map in one
    place, which simplifies transferring a map from one location to another
    (provided no reprojection etc. is needed), e.g. when working on a cluster

  - simplified GIS/Rast library functions, as the files are stored in one
    place. Currently a rather complex mechanism is implemented to find the
    scattered raster files belonging to one map, often in various mapsets.

  - implementing the change for 5.1 causes minimal (no?) conflicts with
    the new vector modifications. Delaying it to 5.3 is not recommended, as
    users may not want to change their data structures a second time.

  - updating an existing database to the new raster file storage scheme
    is simple, as the map files can be linked, moved or copied into the new
    raster/ directory. In contrast to the new vector format, no format
    changes are intended (only a new location for the files).

  - raster/vector/G3D maps with same names are still possible

  - during this change/cleanup the 'white space' issues could be fixed
    (especially for MS-Windows users), as the relevant functions are
    touched anyway
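  The database update mentioned in the list above (linking, moving or
  copying the map files into raster/) could look roughly like the following
  C sketch. It assumes POSIX calls and a move via rename() within one file
  system; the helper name migrate_raster_map() and the element list are
  hypothetical and illustrative only (the real 5.0 element set is larger,
  e.g. cell_misc/ and fcell/, and a linking variant would use link()).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative subset of the 5.0 raster element directories (assumption). */
static const char *const raster_elements[] = {
    "cell", "cellhd", "cats", "colr", "hist", NULL
};

/* Nonzero if an snprintf() result n fit into a buffer of size len. */
static int fits(int n, size_t len)
{
    return n >= 0 && (size_t)n < len;
}

/* Hypothetical helper: moves <mapset_dir>/<element>/<name> to
 * <mapset_dir>/raster/<name>/<element> for every element present.
 * File contents are not touched. Returns the number of elements
 * moved, or -1 on error. */
int migrate_raster_map(const char *mapset_dir, const char *name)
{
    char oldp[4096], newp[4096];
    int i, moved = 0;

    assert(mapset_dir != NULL && name != NULL);

    /* Create raster/ and raster/<name>/; an existing directory is fine. */
    if (!fits(snprintf(newp, sizeof newp, "%s/raster", mapset_dir), sizeof newp))
        return -1;
    if (mkdir(newp, 0755) != 0 && errno != EEXIST)
        return -1;
    if (!fits(snprintf(newp, sizeof newp, "%s/raster/%s", mapset_dir, name), sizeof newp))
        return -1;
    if (mkdir(newp, 0755) != 0 && errno != EEXIST)
        return -1;

    for (i = 0; raster_elements[i] != NULL; i++) {
        if (!fits(snprintf(oldp, sizeof oldp, "%s/%s/%s",
                           mapset_dir, raster_elements[i], name), sizeof oldp))
            return -1;
        if (access(oldp, F_OK) != 0)
            continue; /* this map has no file for this element */
        if (!fits(snprintf(newp, sizeof newp, "%s/raster/%s/%s",
                           mapset_dir, name, raster_elements[i]), sizeof newp))
            return -1;
        if (rename(oldp, newp) != 0) /* a plain move, no format change */
            return -1;
        moved++;
    }
    return moved;
}
```

  Calling it once per map name found in cell/ converts a whole mapset;
  running it again on an already converted map is harmless and moves
  nothing.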

  Potential disadvantages:

  - at least some raster modules which directly access files in the
    user's (current) mapset have to be modified
    Comment: with few exceptions, such modules *should* use library
             functions to access files and should be cleaned up anyway

  - the handling of the 'colr2/' directory (used when a user applies a color
    table to a map stored in another mapset) [1] and of the 'reclassed_to'
    file must be modified
    Comment: at least the 'reclassed_to' file handling was found in earlier
             discussions to have some disadvantages in its current
             implementation and might be updated/modified anyway

DISCUSSION

Glynn Clements wrote:
http://grass.itc.it/pipermail/grass5/2003-January/004579.html

  The key issue is the programming interface. All access to files within the
  GRASS database should ultimately go through a few core functions, e.g.
  G__find_file(); in that situation, the actual directory layout should be
  irrelevant to anything other than those core functions. AFAICT, the lowest
  level function should probably look like:

	G__file_name(gisdbase, location, mapset, type, name, element);

  Any higher-level interfaces should ultimately go through here.
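  A minimal C sketch of such a lowest-level function, assuming an output
  buffer is added to the proposed argument list. The name
  sketch_file_name() is hypothetical and is not an existing libgis
  function; the point is that the layout string appears only here, so the
  raster/vector/grid3 layout could change without touching any caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if s is usable as a single, non-empty path component. */
static int valid_component(const char *s)
{
    return s != NULL && s[0] != '\0' && strchr(s, '/') == NULL;
}

/* Hypothetical sketch of the proposed lowest-level call
 *   G__file_name(gisdbase, location, mapset, type, name, element)
 * with an added output buffer. It is the only function that knows the
 * layout <gisdbase>/<location>/<mapset>/<type>/<name>/<element>.
 * element may be NULL to get the map directory itself.
 * Returns 0 on success, -1 on an invalid component or overflow. */
int sketch_file_name(char *buf, size_t buflen,
                     const char *gisdbase, const char *location,
                     const char *mapset, const char *type,
                     const char *name, const char *element)
{
    int n;

    assert(buf != NULL && buflen > 0 && gisdbase != NULL);
    if (!valid_component(location) || !valid_component(mapset) ||
        !valid_component(type) || !valid_component(name) ||
        (element != NULL && !valid_component(element)))
        return -1;

    if (element != NULL)
        n = snprintf(buf, buflen, "%s/%s/%s/%s/%s/%s",
                     gisdbase, location, mapset, type, name, element);
    else
        n = snprintf(buf, buflen, "%s/%s/%s/%s/%s",
                     gisdbase, location, mapset, type, name);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

  Rejecting '/' inside a component also keeps a map name from escaping its
  own directory.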

  The most obvious higher-level interface would be one which accepts a
  combined mapset/name; this would allow e.g. changing the syntax of
  qualified names from "map@mapset" to "mapset/map", or eliminating
  mapsets altogether. Certainly, the logic of handling qualified names
  should be in one place rather than dotted around the code.
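  To illustrate keeping the qualified-name logic in one place, here is a C
  sketch of a split/join pair. Both function names are hypothetical;
  switching to a "mapset/map" syntax would then mean rewriting only these
  two functions, not every module.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The only place that knows the qualified-name syntax "map@mapset". */
#define MAPSET_SEP '@'

/* Splits fullname into name and mapset. An unqualified name yields an
 * empty mapset (meaning "use the search path"). Returns 0 or -1. */
int split_qualified_name(const char *fullname,
                         char *name, size_t namelen,
                         char *mapset, size_t mapsetlen)
{
    const char *sep;
    size_t nlen;

    assert(fullname && name && mapset && namelen > 0 && mapsetlen > 0);
    sep = strchr(fullname, MAPSET_SEP);
    nlen = sep ? (size_t)(sep - fullname) : strlen(fullname);
    if (nlen == 0 || nlen >= namelen)
        return -1;
    memcpy(name, fullname, nlen);
    name[nlen] = '\0';

    if (sep == NULL) {
        mapset[0] = '\0';
        return 0;
    }
    if (sep[1] == '\0' || strlen(sep + 1) >= mapsetlen)
        return -1;
    strcpy(mapset, sep + 1);
    return 0;
}

/* Inverse of split_qualified_name(). Returns 0, or -1 on overflow. */
int join_qualified_name(char *out, size_t outlen,
                        const char *name, const char *mapset)
{
    int n;

    assert(out && outlen > 0 && name && mapset);
    if (mapset[0] == '\0')
        n = snprintf(out, outlen, "%s", name);
    else
        n = snprintf(out, outlen, "%s%c%s", name, MAPSET_SEP, mapset);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```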

  Closely related to this is the way that modules currently handle
  qualified names. At present, modules use G_find_file() to split a
  (possibly qualified) map name into separate mapset/name components,
  then pass the components separately.

  This should be changed, IMHO; a module should treat a map name as an
  abstract identifier, and shouldn't have to even know about mapsets
  (apart from the obvious exceptions, e.g. g.mapsets).

  The main requirement here is for specific functions to generate map
  names based upon an existing name, coupled with some context. At
  present, individual modules basically perform string manipulation
  operations (concatenation, parsing) upon the strings which represent
  maps and mapsets.

  To give some concrete examples:

  1. If a module requires several output maps, it may wish to allow the
     user to just specify a "base" name; e.g. d.rgb might want to allow the
     user to enter:

	d.rgb input=foo

     instead of (at present):

	d.rgb r=foo.r g=foo.g b=foo.b

     However, for a qualified map name, entering:

	d.rgb input=foo@bar

     would need to be treated as:

	d.rgb r=foo.r@bar g=foo.g@bar b=foo.b@bar

     and *not* as:

	d.rgb r=foo@bar.r g=foo@bar.g b=foo@bar.b

     It would need to be able to do this without hard-coding the
     "map@mapset" convention into the module itself.

  2. Similarly, if a module generates multiple output maps from a single
     input map, it may wish to (by default) derive the names of all of the
     output maps from the name of the input map. In this case, the output
     names would need to be unqualified even if the input name was
     qualified. So, e.g. r.slope.aspect might wish to treat:

	r.slope.aspect elevation=foo@bar

     as equivalent to:

	r.slope.aspect elevation=foo@bar slope=foo.sl aspect=foo.as

     Again, one would wish to avoid hard-coding the "map@mapset" convention
     into the module itself.
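     Both examples above could be served by one library helper, so that
     the "map@mapset" convention stays out of d.rgb and r.slope.aspect. A
     C sketch under that assumption; derive_map_name() and its
     keep_mapset flag are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library helper: appends suffix to the map part of a
 * possibly qualified name.
 *   keep_mapset != 0: "foo@bar" + ".r"  -> "foo.r@bar"  (d.rgb inputs)
 *   keep_mapset == 0: "foo@bar" + ".sl" -> "foo.sl"     (new outputs)
 * Returns 0 on success, -1 if out is too small. */
int derive_map_name(const char *input, const char *suffix, int keep_mapset,
                    char *out, size_t outlen)
{
    const char *sep;
    size_t baselen;
    int n;

    assert(input && suffix && out && outlen > 0);
    sep = strchr(input, '@');  /* "@mapset" part, if any */
    baselen = sep ? (size_t)(sep - input) : strlen(input);

    if (sep != NULL && keep_mapset)
        n = snprintf(out, outlen, "%.*s%s%s", (int)baselen, input, suffix, sep);
    else
        n = snprintf(out, outlen, "%.*s%s", (int)baselen, input, suffix);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

     d.rgb would call it with keep_mapset=1 for each of ".r", ".g" and
     ".b"; r.slope.aspect would call it with keep_mapset=0, since its
     outputs are written to the current mapset.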

     One problem with the general concept of channeling file access through
     a few key functions is the issue of scripts. Typically, these end up
     re-implementing the libgis logic; moreover, each individual script
     ends up with its own clone of the code.

     Witness the effort involved in replacing references to $LOCATION with
     g.gisenv. A similar effort may be required to handle any changes to
     the layout below the level of the mapset directory.

     For this reason, we should also consider providing standard Bourne
     shell and/or Tcl equivalents of the libgis functionality. This could
     be a set of standard "include" scripts, which would be accessed by
     e.g.

	source "$GISBASE/scripts/library.sh"
     or:
	source $env(GISBASE)/scripts/library.tcl

     possibly in combination with some standard utilities (e.g. 
     g.file.name) which would "export" the core functions in a way that can
     be used with scripts (although, for Tcl, it may be preferable to
     provide either a customised tclsh or a loadable module).
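     One possible shape for the g.file.name utility, as a C sketch: the
     positional argument order, the function name g_file_name_main() and
     the fallback to the current mapset for unqualified names are all
     assumptions (a real implementation would search the mapset search
     path and take GISDBASE and LOCATION from the GRASS environment).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical core of a "g.file.name" utility:
 *   g.file.name GISDBASE LOCATION CURRENT_MAPSET TYPE NAME[@MAPSET] ELEMENT
 * Prints the path of one element of one map. An unqualified name is
 * resolved to the current mapset (simplification). Returns an exit status. */
int g_file_name_main(int argc, char *argv[], FILE *out)
{
    char name[256], path[4096];
    const char *at, *mapset;
    size_t nlen;
    int n;

    assert(out != NULL);
    if (argc != 7) {
        fprintf(stderr, "usage: g.file.name gisdbase location "
                        "current_mapset type name[@mapset] element\n");
        return 1;
    }

    /* Split name[@mapset] here, so that scripts never have to. */
    at = strchr(argv[5], '@');
    nlen = at ? (size_t)(at - argv[5]) : strlen(argv[5]);
    if (nlen == 0 || nlen >= sizeof name || (at && at[1] == '\0'))
        return 1;
    memcpy(name, argv[5], nlen);
    name[nlen] = '\0';
    mapset = at ? at + 1 : argv[3];

    n = snprintf(path, sizeof path, "%s/%s/%s/%s/%s/%s",
                 argv[1], argv[2], mapset, argv[4], name, argv[6]);
    if (n < 0 || (size_t)n >= sizeof path)
        return 1;
    fprintf(out, "%s\n", path);
    return 0;
}
```

     A real main() would simply return g_file_name_main(argc, argv,
     stdout); a Bourne shell or Tcl script would then capture the printed
     path instead of assembling it itself.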

     [deleted rest of the message for now, not 100% related]

NOTES
 
  [1] A suggestion from Glynn Clements related to the 'colr2/' directory:
  > Rather than having a special-case mechanism which allows an alternate
  > colour table to be "overlaid" onto an existing map (possibly in a
  > different mapset), it would be preferable, IMHO, to create a
  > "recolour" map. This would work like a reclass map; the "recolour" map
  > would exist as an actual map as far as the user is concerned, but all
  > of the data (except for the colour table) would be taken from the base
  > map.
  > 
  > There would probably be other uses for such a mechanism (e.g. category
  > labels, horizontally or vertically rescaled maps etc).