DESCRIPTION

r.fill.gaps - Fast gap filling and interpolation of dense raster data.

The r.fill.gaps module is capable of quickly filling small "no data" areas (gaps) in large raster maps. As a rule of thumb, there should be at least as many data cells as there are "no data" cells in the input raster map. Several interpolation modes are available. By default, all values of the input raster map will be replaced with locally interpolated values in the output raster map. With dense data and small gaps, this should result in only slight deviations from the original data and produce smooth output. Alternatively, setting the -p flag will preserve the original cell values.

Available gap filling modes:

spatially weighted mean (default)
simple mean
simple median
simple mode

The spatially weighted mean is equivalent to an Inverse Distance Weighting (IDW; see also r.surf.idw) interpolation. The simple mean is equivalent to a low-pass filter. Median and mode replacements can also be achieved using r.neighbors.

Note that r.fill.gaps is highly optimized for fast processing of large raster datasets with a small interpolation distance threshold (i.e. closing small gaps). As a trade-off for speed and a small memory foot print, some spatial accuracy is lost due to the rounding of the distance threshold to the closest approximation in input raster cells and the use of a matrix of precomputed weights at cell resolution (see further down for details). Note also that processing time will increase exponentially with higher distance settings. Cells outside the distance threshold will not be interpolated, preserving the edges of the data zones in the input data.

This module is not well suited for interpolating sparse input data and closing large gaps. For such purposes solutions more appropriate methods are available, such as v.surf.idw or v.surf.rst.

Applications where the properties of r.fill.gaps are advantageous include the processing of high resolution, close range scanning and remote sensing datasets. Such datasets commonly feature densely spaced measurements that have some gaps after rasterization, due to blind spots, instrument failures, and misalignments between the GIS' raster cell grids and the original measurement locations. In these cases, r.fill.gaps should typically be run using the "weighted mean" (default) method and with a small distance setting (the default is to use a search radius of just three cells).

The images below show a gradiometer dataset with gaps and its interpolated equivalent, produced using the spatially weighted mean operator (mode="wmean").

In addition, r.fill.gaps can be useful for raster map generalization. Often, this involves removing small clumps of categorized cells and then filling the resulting data gaps without "inventing" any new values. In such cases, the "mode" or "median" interpolators should be used.

USAGE

The most critical user-provided settings are the interpolation/gap filling method (mode=) and the maximum distance, up to which r.fill.gaps will scan for given data points when attempting to fill/interpolate a value at any given location (distance=). The distance can be expressed as a number of cells (default) or in the current location's units (if the -m flag is given). The latter are typically meters, but can be any other units of a planar coordinate system.

Note that proper handling of geodetic coordinates (lat/lon) and distances is currently not implemented. For lat/lon data, the distance should therefore be specified in cells and usage of r.fill.gaps should be restricted to small areas to avoid large inaccuracies that may arise from the different extents of cells along the latitudinal and longitudinal axes.

Distances specfied in map units will be approximated as accurately as the current region's cell resolution settings allow. The program will warn if the distance cannot be expressed as whole cells at the current region's resolution. In such case, the number of cells in the search window will be rounded up. Due to the rounding effect introduced by using cells as spatial units, the actual maximum distance considered by the interpolation may be up to half a cell diagonal larger than the one specified by the user.

The interpolator type "wmean" uses a pre-computed matrix of spatial weights To speed up computation. This matrix can be examined (printed to the screen) before running the interpolation, by setting the -w flag. In mode "wmean", the power= option has the usual meaning: higher values mean that cell values in the neighborhood lose their influence on the cell to be interpolated more rapidly with increasing distance from the latter. Another way of explaining this effect is to state that larger "power" settings result in more localized interpolation, smaller ones in more globalized interpolation. The default setting is power=2.0.

The interpolators "mean", "median" and "mode" are calculated from all cell values within the search radius. No spatial weighting is applied for these methods. The "mode" of the input data may not always be unique. In such case, the mode will be the smallest value with the highest frequency.

Often, input data will contain spurious extreme measurements (spikes, outliers, noise) caused by the limits of device sensitivity, hardware defects, environmental influences, etc. If the normal, valid range of input data is known beforehand, then the minimum= and maximum= options can be used to exclude those input cells that have values below or above that range, respectively. This will prevent the influence of spikes and outliers from spreading through the interpolation.

Unless the -p (replace) flag is given, data cells of the input map will be replaced with interpolated values instead of preserving them in the output. Besides the result of the interpolation/gap filling, a second output map can be specified via the uncertainty= option. The cell values in this map represent a simple measure of how much uncertainty was involved in interpolating each cell value of the primary output map, with "0" being the lowest and "1" being the theoretic highest uncertainty. Uncertainty is measured by summing up all cells in the neighborhood (defined by the search radius distance=) that contain a value in the input map, multiplied by their weights, and dividing the result by the sum of all weights in the neighborhood. For mode=wmean, this means that interpolated output cells that were computed from many nearby input cells have very low uncertainty and vice versa. For all other modes, all weights in the neighborhood are constant "1" and the uncertainty measure is a simple measure of how many input data cells were present in the search window.

NOTES

The key to getting good gap filling results is to understand the spatial weighting scheme used in mode "wmean". The weights are precomputed and assigned per cell within the search window centered on the location at which to interpolate/gap fill all cells within the user-specified distance.

The illustration below shows the precomputed weights matrix for a search distance of four cells from the center cell:


000.00 000.01 000.04 000.07 000.09 000.07 000.04 000.01 000.00 
000.01 000.06 000.13 000.19 000.22 000.19 000.13 000.06 000.01 
000.04 000.13 000.25 000.37 000.42 000.37 000.25 000.13 000.04 
000.07 000.19 000.37 000.56 000.68 000.56 000.37 000.19 000.07 
000.09 000.22 000.42 000.68 001.00 000.68 000.42 000.22 000.09 
000.07 000.19 000.37 000.56 000.68 000.56 000.37 000.19 000.07 
000.04 000.13 000.25 000.37 000.42 000.37 000.25 000.13 000.04 
000.01 000.06 000.13 000.19 000.22 000.19 000.13 000.06 000.01 
000.00 000.01 000.04 000.07 000.09 000.07 000.04 000.01 000.00

Note that the weights in such a small window drop rapidly for the default setting of power=2.

If the distance is given in map units (flag -m), then the search window can be modeled more accurately as a circle. The illustration below shows the precomputed weights for a distance in map units that is approximately equivalent to four cells from the center cell:


...... ...... ...... 000.00 000.00 000.00 ...... ...... ...... 
...... 000.00 000.02 000.06 000.09 000.06 000.02 000.00 ...... 
...... 000.02 000.11 000.22 000.28 000.22 000.11 000.02 ...... 
000.00 000.07 000.22 000.44 000.58 000.44 000.22 000.07 000.00 
000.00 000.09 000.28 000.58 001.00 000.58 000.28 000.09 000.00 
000.00 000.07 000.22 000.44 000.58 000.44 000.22 000.07 000.00 
...... 000.02 000.11 000.22 000.28 000.22 000.11 000.02 ...... 
...... 000.00 000.02 000.06 000.09 000.06 000.02 000.00 ...... 
...... ...... ...... 000.00 000.00 000.00 ...... ...... ......

When using a small search radius, cells= must also be set to a small value. Otherwise, there may not be enough cells with data within the search radius to support interpolation.

This module can handle cells with different X and Y resolutions. However, note that the weight matrix will be skewed in such cases, with higher weights occurring close to the center and along the axis with the higher resolution. This is because weights on the lower resolution axis are less accurately calculated. The skewing effect will be stronger if the difference between the X and Y axis resolution is greater and a larger "power" setting is chosen. This property of the weights matrix directly reflects the higher information density along the higher resolution axis.

Note on printing the weights matrix (using the -w flag): the matrix cannot be printed if it is too large.

The memory estimate provided by the -m flag is a lower limit on the amount of RAM that will be needed.

If the -s flag is set, floating point type output will be saved as a "single precision" raster map, saving ~50% disk space compared to the default "double precision" output.

EXAMPLES

Gap-fill a dataset using spatially weighted mean (IDW) and a maximum search radius of 3.0 map units; also produce uncertainty estimation map:

r.fill.gaps input=measurements output=result dist=3.0 -m mode=wmean uncertainty=uncert_map

Run a fast low-pass filter (replacement all cells with mean value of neighboring cells) on the input map:

r.fill.gaps input=measurements output=result dist=10 mode=mean

Fill data gaps in a categorized raster map; preserve existing data:

r.fill.gaps input=categories output=result dist=100 -m mode=mode -p

CAVEATS

The straight-line metric used for converting distances in map units to cell numbers is only adequate for planar coordinate systems. Using this module with lat/lon input data will likely give inaccurate results, especially when interpolating over large geographical areas.

If the distance is set to a relatively large value, processing time will quickly approach and eventually exceed that of point-based interpolation modules such as v.surf.rst.

AUTHOR

Benjamin Ducke