Sampling design is a critical part of any study involving modeling and estimation based on data that is sampled from natural resources or other phenomena occurring in the landscape. Statistical considerations related to sampling are part of a larger scenario involving theoretical knowledge, previously detected behavior and patterns of the phenomenon, costs, accessibility to sample sites, politics, and so forth. Thus, the sampling design algorithm should be flexible enough to accommodate external considerations in the design.
Currently, ArcGIS offers the following methods to construct sampling designs:
- Simple random sampling—Sites are generated independently, using the Create Random Points tool. A similar outcome can be obtained using the Create Random Raster tool and a probability cutoff value (the ArcGIS Spatial Analyst extension version of the Create Random Raster tool uses a uniform random number, whereas the Data Management toolbox version of the Create Random Raster tool supports several distributions). The method is simple and flexible, but the outcome of one realization may include areas where samples are clustered and other areas that are devoid of samples.
- Stratified random sampling—The study area is split into strata, and random samples are generated within each stratum. Strata can be adjusted based on prior knowledge of the phenomenon (for example, concentric circles can be made larger as the distance from a point source emission increases), providing some spatial structure to the sample.
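The two methods above can be sketched in a few lines of plain Python. This is an illustration of the sampling ideas only, not the implementation used by the Create Random Points or Create Random Raster tools. The function names, the rectangular extents, the two strata, and the cutoff value are all hypothetical choices made for the example.

```python
import random


def simple_random_points(n, extent, rng):
    """Draw n independent, uniformly distributed points in a rectangle.

    extent is (xmin, ymin, xmax, ymax); a rectangle is an assumption made
    to keep the sketch short (real study areas are usually polygons).
    """
    xmin, ymin, xmax, ymax = extent
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


def stratified_random_points(strata, rng):
    """Draw points independently within each stratum.

    strata maps a stratum name to (extent, number_of_points).
    """
    return {
        name: simple_random_points(n, extent, rng)
        for name, (extent, n) in strata.items()
    }


def raster_cutoff_cells(n_rows, n_cols, cutoff, rng):
    """Keep every cell whose uniform random value falls below the cutoff.

    This mimics the 'random raster plus probability cutoff' idea: each cell
    is selected independently with probability equal to the cutoff.
    """
    return [
        (row, col)
        for row in range(n_rows)
        for col in range(n_cols)
        if rng.random() < cutoff
    ]


rng = random.Random(42)  # fixed seed so one realization can be reproduced
study_area = (0.0, 0.0, 100.0, 100.0)

# Simple random sample: 20 sites anywhere in the study area.
srs = simple_random_points(20, study_area, rng)

# Stratified random sample: hypothetical west and east halves, 10 sites each.
strata = {
    "west": ((0.0, 0.0, 50.0, 100.0), 10),
    "east": ((50.0, 0.0, 100.0, 100.0), 10),
}
stratified = stratified_random_points(strata, rng)

# Random raster with a cutoff: roughly 20 percent of a 10 x 10 grid is kept.
kept_cells = raster_cutoff_cells(10, 10, 0.2, rng)
```

Plotting `srs` for several seeds tends to show the clusters and gaps described above, whereas the stratified sample always places the requested number of sites in each stratum.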
The following design methods can be generated relatively easily using simple scripts or models:
- Systematic random sampling—An initial sample site is selected at random, and all other sites are selected so that they are located according to a regular pattern (for example, on the vertices of equilateral triangles, squares, hexagons, and so forth). This method is simple and provides designs that are spatially well balanced (well distributed in space).
- Clustered random sampling—The location for a group of sites is selected at random, and sites within each group are then located relatively close to one another. This can be done by generating randomly placed centers using the Minimum Allowed Distance parameter of the Create Random Points tool and allocating additional samples within a specified distance from each center. This method is easy to implement in practice, as many samples are collected from nearby locations (unlike a simple random sample pattern, where sample sites may occur anywhere in the study area).
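As a rough illustration of such a script, the sketch below generates both designs with the Python standard library. It does not call any ArcGIS tool. The function names, the square grid pattern, the rectangular extent, and all parameter values are assumptions made for the example. The minimum distance between cluster centers plays the role that the Minimum Allowed Distance parameter plays in the description above.

```python
import math
import random


def systematic_square_grid(extent, spacing, rng):
    """Place sites on a square grid whose origin is chosen at random.

    The random offset inside the first grid cell fixes the initial site;
    every other site then follows from the regular spacing.
    """
    xmin, ymin, xmax, ymax = extent
    x0 = xmin + rng.uniform(0.0, spacing)
    y0 = ymin + rng.uniform(0.0, spacing)
    n_cols = int((xmax - x0) // spacing) + 1
    n_rows = int((ymax - y0) // spacing) + 1
    return [
        (x0 + i * spacing, y0 + j * spacing)
        for j in range(n_rows)
        for i in range(n_cols)
    ]


def clustered_points(extent, n_clusters, per_cluster, radius,
                     min_center_dist, rng, max_tries=10000):
    """Pick well-separated random centers, then scatter sites around each.

    Centers are accepted only if they lie at least min_center_dist from
    every center already chosen (simple rejection sampling). Sites are
    uniform within a disc of the given radius, so sites near the border
    may fall slightly outside the extent.
    """
    xmin, ymin, xmax, ymax = extent
    centers = []
    tries = 0
    while len(centers) < n_clusters and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(candidate, c) >= min_center_dist for c in centers):
            centers.append(candidate)

    clusters = []
    for cx, cy in centers:
        members = []
        for _ in range(per_cluster):
            # The square root keeps the points uniform over the disc area.
            r = radius * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            members.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
        clusters.append(((cx, cy), members))
    return clusters


rng = random.Random(7)  # fixed seed so the design can be reproduced
study_area = (0.0, 0.0, 100.0, 100.0)

grid_sites = systematic_square_grid(study_area, spacing=10.0, rng=rng)
clusters = clustered_points(study_area, n_clusters=5, per_cluster=4,
                            radius=3.0, min_center_dist=20.0, rng=rng)
```

A triangular or hexagonal pattern could be produced in the same way by shifting alternate rows by half the spacing and adjusting the row height.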
These methods do not easily account for variations in the probability that a site is selected (other than by splitting the study area into strata, which usually requires manual inspection of the study site and knowledge of the process under study). Also, because sites are selected at random, not all of these methods guarantee that the sampling design will be spatially balanced (that is, that the design will sample the entire population). To address these limitations, the Create Spatially Balanced Points tool is available in the Geostatistical Analyst toolbox. For an explanation of how this tool works and the publications it is based on, see How Create Spatially Balanced Points works.