# Choosing a lag size

Available with Geostatistical Analyst license.

The choice of lag size has important effects on the empirical semivariogram. If the lag size is too large, short-range autocorrelation may be masked. If the lag size is too small, there may be many empty bins, and the bins that are not empty may contain too few pairs of points to give representative averages.
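To see why lag size matters, it helps to look at how pairs of points are grouped into bins. The sketch below is a simplified, omnidirectional binning in plain Python. It assumes planar coordinates and ignores direction, so it is not the exact procedure Geostatistical Analyst uses. Each pair contributes half its squared difference in value to the bin that matches its separation distance, and each bin reports its average and its pair count. The function name `empirical_semivariogram` is hypothetical.

```python
import math


def empirical_semivariogram(points, values, lag_size, n_lags):
    """Hypothetical sketch: omnidirectional binning of point pairs by distance.

    Returns a list of (lag_center, gamma, pair_count) tuples, one per bin.
    gamma is None when a bin contains no pairs.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])  # separation distance of the pair
            k = int(h // lag_size)               # index of the bin the pair falls in
            if k < n_lags:                       # pairs beyond the last bin are ignored
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_lags):
        center = (k + 0.5) * lag_size
        gamma = sums[k] / counts[k] if counts[k] else None
        result.append((center, gamma, counts[k]))
    return result


pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
vals = [1, 2, 4, 7]
print(empirical_semivariogram(pts, vals, 1.5, 2))
print(empirical_semivariogram(pts, vals, 0.25, 8))
```

With four evenly spaced points, a lag size of 1.5 puts three pairs in the first bin and two in the second. A lag size of 0.25 with eight lags leaves seven of the eight bins empty, which is the small-lag problem described above.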

When samples are located on a sampling grid, the grid spacing is usually a good indicator of lag size. However, if the data is acquired using an irregular or random sampling scheme, selecting a suitable lag size is not so straightforward. A simple rule of thumb is that the lag size multiplied by the number of lags should be about half the largest distance among all points. Also, if the range of the fitted semivariogram model is very small relative to the extent of the empirical semivariogram, you can decrease the lag size. Conversely, if the range of the fitted semivariogram model is large relative to the extent of the empirical semivariogram, you can increase the lag size.
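The rule of thumb (lag size × number of lags ≈ half the largest distance among all points) can be turned around to suggest a lag size once you have chosen the number of lags. The following is a minimal sketch in plain Python, assuming planar coordinates. The function name `suggest_lag_size` is hypothetical, and the brute-force search over all pairs is only practical for modest numbers of points.

```python
import math
from itertools import combinations


def suggest_lag_size(points, n_lags):
    """Hypothetical helper: choose lag_size so that lag_size * n_lags is
    about half of the largest distance between any two points."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Brute-force O(n^2) search for the largest pairwise distance.
    max_dist = max(math.dist(p, q) for p, q in combinations(points, 2))
    return (max_dist / 2) / n_lags


pts = [(0, 0), (30, 40), (10, 0)]
print(suggest_lag_size(pts, 5))
```

Here the largest distance is 50 (between (0, 0) and (30, 40)), so with 5 lags the suggested lag size is 50 / 2 / 5 = 5.0.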

Another approach to determining the lag size is to use the Average Nearest Neighbor tool to find the average distance between each point and its nearest neighbor. This provides a reasonably good lag size, because every lag will then have at least a few pairs of points in it. The Average Nearest Neighbor tool is located in the Spatial Statistics Tools toolbox, in the Analyzing Patterns toolset. Only the input feature class needs to be specified; the distance method defaults to Euclidean distance.

After the tool runs, it reports an Observed Mean Distance, which can be used as the lag size for semivariogram/covariance modeling. However, if the dataset contains clustered points or samples, it may be advisable to use a smaller lag size to obtain a more accurate estimate of the nugget of the semivariogram/covariance model.
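The Observed Mean Distance is the average, over all points, of the Euclidean distance from each point to its nearest neighbor. The sketch below computes that quantity by brute force in plain Python, assuming planar coordinates. It is an illustration, not the Average Nearest Neighbor tool's implementation, and it omits the tool's other outputs (such as the expected mean distance and z-score). The function name `observed_mean_distance` is hypothetical.

```python
import math


def observed_mean_distance(points):
    """Hypothetical sketch: mean Euclidean distance from each point to its
    nearest other point (brute force, O(n^2))."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        # Distance to the closest point other than p itself.
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest
    return total / len(points)


pts = [(0, 0), (1, 0), (3, 0)]
print(observed_mean_distance(pts))
```

For these three points the nearest-neighbor distances are 1, 1, and 2, so the mean is 4/3 (about 1.33), which would be a candidate lag size.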

Learn more about fitting a model to the empirical semivariogram