Date: Wed, 09 May 2001 22:49:03 -0600 
From: "Roger S. Miller" <rgrmill@rt66.com>
To: Markus Neteler <neteler@geog.uni-hannover.de>
Subject: r.fill.dir

Markus,

I think I finally completed all the changes I was going to make on
r.fill.dir.
I haven't had time yet to set up CVS or get a CVS account.  The modified
code is in the attached zip file, along with the little shell script
that I've been using to compile it.

There's a short description in the last half of this letter about the
methods I used to handle CELL, FCELL and DCELL maps.

The program is now entirely in C and uses the gislib functions for all
input and output.  It handles CELL, FCELL and DCELL rasters and I think
it handles nulls correctly.

I completely rewrote all of the larger code segments and the program now
runs about 3 times faster than the original fortran version.  It also
uses less memory and uses only three temp files.

I've added a flag and an option to produce a third output file,
otherwise the command line is identical to the original.  The new output
file is a map of any areas that have not been filled.  The flag "f"
instructs the program to fill single-cell pits but otherwise to just
find the undrained areas and exit. With the "f" flag set the program
writes an elevation map with just single-cell pits filled, a direction
map with unresolved problems and a map of the undrained areas that were
found but not filled.

I included these options because I found while running test cases that
filling DEMs was often not the best way to solve a drainage problem. 
The new options let the user get a partially-fixed elevation map,
identify the remaining problems and fix the problems appropriately.

I used a couple USGS 1-degree DEMS to benchmark the new code against the
old one.  The results are substantially identical except for three
cases:

1) The original code didn't consider the two outer rows and columns
around the edge of the map.  The new code only neglects the outer-most
row and column.  As a result, the two codes gave results that sometimes
differed at the edge of the map. 

2) The new code was able to solve problems that the old code would not
solve.

3) The new code may require several repeated runs (using output from one
run as input to the next run) to completely solve the elevation map.  In
the only case where this happened the old code didn't solve the problem
at all.

As a final test for the new code I used a part of the gtopo30 dem that
covers the southwestern US.  The part I extracted covered New Mexico,
about half of Colorado, large parts of Kansas, Oklahoma and Texas, and
adjoining areas in Mexico.  The modified code found more than 10,000
areas to fill, and got them all filled in 5 runs.  I didn't even try
that problem with the original code.  I doubt seriously that it would
have run it at all.

I took an approach to writing the code to handle CELL, FCELL and DCELL
maps that you might find interesting.  I don't know all of the methods
that others use, so it may not be new.  The file "tinf.c" (for Type
INdependent Functions) contains functions for executing simple
operations.  There are three versions of each function, one for CELL,
one for FCELL and one for DCELL.  The file "tinf.h" contains the
prototypes for those functions, plus the declarations for a number of
function pointers.  The file "tinf.c" also contains a function named
"set_func_pointers" that assigns the correct function to the function
pointer.  The code to read, write or process a raster file is written
using the function pointers, not the original function.

For example, tinf.c contains three functions that calculate the
difference between two values.  These are diff_c (for CELL values)
diff_f (for FCELL values) and diff_d (for DCELL values).  All three
functions have the same prototypes except for their names.  tinf.h
contains the prototypes for those functions, plus the declaration for a
function pointer "diff".  When the program runs it checks to see the
type of the input raster map, then calls set_func_pointers to assign the
correct function to the pointer "diff"; if the input is a CELL map then
"diff" points to diff_c, if it's an FCELL map then "diff" points to
diff_f and if it's a DCELL map then "diff" points to diff_d.  The rest
of the code is written using the function pointer diff, rather than the
actual function.

This method really works out well for i/o.  Look at the simple method
used in main.c to read and write the input and output maps.  Other
operations are also simplified, though the difference isn't as
outstanding.

Instead of:
   if(type==CELL)cellvalue1-=cellvalue2;
   else if(type==FCELL)fcellvalue1-=fcellvalue2;
   else if(type==DCELL)dcellvalue1-=dcellvalue2;


one can write just the single line:
   diff((void *)&value1,(void *)&value2);

This works regardless of the type of value1 and value2, as long as they
are both the same type.

With this method the program only tests once for the type of the input
map, and that's done with a call to set_func_pointers.  The program
never passes the map type to a function and the program doesn't include
duplicate code to account for different map types.  All the
type-dependent coding is already done.

The disadvantage is that most of the parameters passed to the
type-independent functions and most of the values returned by them must
be void pointers, and that leads to some cumbersome expressions.  Also,
it generally isn't possible to use "=" in any expression.  Instead you
have to use "memcpy" or something similar to move values around. 

The tinf.c and tinf.h files contain functions to handle basic raster
i/o, simple math operations, variable initializations, testing for
greater than and less than and a few other things.  A few more general
functions might also be useful, and some applications will require
specialized functions.

This might make it considerably easier to write programs that have to
read, handle and write maps of variable types.


Roger