NAME

s.kcv - Randomly partition sites into test/train sets. (GRASS Sites Program)

SYNOPSIS

s.kcv
s.kcv help
s.kcv [-dq] k=value sites=name

DESCRIPTION

s.kcv randomly divides a sites list into k sets of test/train data (for k-fold cross validation). Test partitions are mutually exclusive: each site appears in exactly one test partition and in k-1 training partitions. The program generates a random point using the selected random number generator and then finds the site closest to it. That site is removed from the candidate list (so it cannot be selected for any other test set) and saved in the first test partition file. This is repeated until enough points have been selected for the test partition. The number of sites in each test partition depends on the number of sites available and the number of partitions requested; the sizes are kept as equal as possible while ensuring that every site is chosen for testing. This process of filling a test partition is done k times.
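
The selection procedure described above can be sketched roughly as follows. This is an illustrative Python sketch, not the program's source: the function name kcv_partition, the region tuple layout (west, east, south, north), the seed argument, and the exact sizing rule (partition sizes differing by at most one) are all assumptions made for the example.

```python
import math
import random


def kcv_partition(sites, k, region, seed=None):
    """Split sites into k mutually exclusive test partitions.

    sites  : list of (x, y) tuples
    k      : number of partitions
    region : (west, east, south, north) bounds; hypothetical layout
    seed   : optional seed so a run can be repeated
    """
    rng = random.Random(seed)
    west, east, south, north = region
    candidates = list(sites)
    n = len(candidates)

    # Assumed sizing rule: sizes differ by at most one and sum to n,
    # so every site ends up in exactly one test partition.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]

    partitions = []
    for size in sizes:
        test = []
        for _ in range(size):
            # Draw a random point inside the region bounds.
            px = rng.uniform(west, east)
            py = rng.uniform(south, north)
            # Find the remaining candidate closest to that point.
            nearest = min(
                range(len(candidates)),
                key=lambda j: math.hypot(candidates[j][0] - px,
                                         candidates[j][1] - py),
            )
            # Remove it from the candidates so no other test set can take it.
            test.append(candidates.pop(nearest))
        partitions.append(test)
    return partitions
```

Because the random points are drawn inside the region bounds, sites lying far outside those bounds tend to be picked late in this sketch, which is one reason the region setting matters.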

Flags:

-d Use drand48() (default is rand()).
-q Run quietly. Don't report progress.

Parameters:

k=value Positive integer value indicating the number of partitions.
sites=name Name of the sites file to partition.

Test/train pairs are saved as sites lists using name as a basename. Test sites are saved in name-test.i while training sites are saved in name-train.i, where i ranges from zero to k.
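
To make the relationship between the two files of a pair concrete, the sketch below builds each training set as the union of all other test partitions and attaches the name-test.i and name-train.i file names. It is a hedged Python illustration only: make_pairs is a hypothetical helper, and numbering the pairs from zero with exactly k pairs is an assumption of the sketch, not a statement about how the program numbers its output.

```python
def make_pairs(partitions, basename):
    """Pair each test partition with its complementary training set.

    partitions : list of k lists of sites (the test partitions)
    basename   : string used to build the output file names
    Returns a list of (test_file, test_sites, train_file, train_sites).
    """
    pairs = []
    for i, test in enumerate(partitions):
        # Training sites are every site that is NOT in test partition i.
        train = [
            site
            for j, other in enumerate(partitions)
            if j != i
            for site in other
        ]
        pairs.append(
            (f"{basename}-test.{i}", test, f"{basename}-train.{i}", train)
        )
    return pairs
```

With k partitions, each site then appears in one test list and in k-1 training lists, matching the description above.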

NOTES

Existing files are silently overwritten.

An ideal random sites generator will follow a Poisson distribution and will only be as random as the original sites. This program simply divides sites up in a random manner.

Be warned that random number generation occurs over the intervals defined by the current region.

This program may not work properly with Lat-long data.

SEE ALSO

s.rand and g.region

AUTHOR

James Darrell McCauley, Purdue University