How Iso Cluster works


Available with Spatial Analyst license.

The Iso Cluster tool uses a modified iterative optimization clustering procedure, also known as the migrating means technique. The algorithm separates all cells into the user-specified number of distinct unimodal groups in the multidimensional space of the input bands. This tool is most often used in preparation for unsupervised classification.

The iso prefix of the isodata clustering algorithm is an abbreviation for iterative self-organizing, which describes how the clustering is performed. During each iteration, all samples are assigned to the existing cluster centers, and the mean of every class is then recalculated. The optimal number of classes to specify is usually unknown. It is therefore advisable to enter a conservatively high number, analyze the resulting clusters, and rerun the tool with a reduced number of classes.

The iso cluster algorithm is an iterative process that assigns each candidate cell to the cluster whose mean is at the minimum Euclidean distance. The process starts with arbitrary means assigned by the software, one for each cluster (you specify the number of clusters). Every cell is assigned to the closest of these means in the multidimensional attribute space. After this first iteration, a new mean is calculated for each cluster from the attribute values of the cells that belong to it. The process is then repeated: each cell is assigned to the closest mean in multidimensional attribute space, and new means are calculated for each cluster based on the membership of cells from that iteration. You specify the number of iterations of the process through Number of iterations. This value should be large enough that, after the specified number of iterations has run, the migration of cells from one cluster to another is minimal; that is, all the clusters have become stable. When you increase the number of clusters, the number of iterations should also increase.
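The assign-and-recalculate loop described above can be illustrated with a minimal pure-Python sketch. It is not the tool's implementation: the function name `migrating_means`, the way the initial means are picked (evenly spaced samples from the input), and the returned migration counts are assumptions made for illustration, and the sketch expects at least as many cells as classes.

```python
import math

def migrating_means(cells, number_of_classes, number_of_iterations):
    """Illustrative migrating-means loop (a sketch, not Esri's implementation)."""
    # Assumption: arbitrary initial means are taken from evenly spaced cells.
    step = len(cells) // number_of_classes
    means = [list(cells[i * step]) for i in range(number_of_classes)]
    labels = [-1] * len(cells)   # -1 means "not yet assigned"
    migration_history = []       # cells that changed cluster in each iteration
    for _ in range(number_of_iterations):
        # Assign every cell to the closest mean (Euclidean distance).
        new_labels = [
            min(range(number_of_classes), key=lambda k: math.dist(cell, means[k]))
            for cell in cells
        ]
        migration_history.append(sum(a != b for a, b in zip(labels, new_labels)))
        labels = new_labels
        # Recalculate each mean from the cells currently in that cluster.
        for k in range(number_of_classes):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:  # an empty cluster keeps its previous mean
                means[k] = [sum(axis) / len(members) for axis in zip(*members)]
    return means, labels, migration_history

# Two well-separated groups of two-band cells (made-up values).
cells = [(0, 0), (1, 0), (0, 1), (100, 100), (101, 100), (100, 101)]
means, labels, history = migrating_means(cells, 2, 3)
print(labels)   # cluster index per cell
print(history)  # migrations per iteration; zeros indicate stable clusters
```

A migration count that drops to zero (or close to it) before the last iteration suggests the chosen number of iterations is large enough for this data.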

The specified Number of classes value is the maximum number of clusters that can result from the clustering process. However, the output signature file may contain fewer clusters than the specified Number of classes value. This situation occurs in the following cases:

  • The data values and the initial cluster means are not evenly distributed. In certain ranges of cell values, the frequency of occurrence may be next to none. Consequently, some of the originally predefined cluster means may not have a chance to absorb enough cell members.
  • Clusters consisting of fewer cells than the specified Minimum class size value will be eliminated at the end of the iterations.
  • Clusters merge with neighboring clusters when the statistical values are similar after the clusters become stable. Some clusters may be so close to each other and have such similar statistics that keeping them apart would be an unnecessary division of the data.
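The last two cases can be pictured with a short sketch that first drops undersized clusters and then merges clusters whose means lie close together. The source does not state the similarity test the tool applies, so the `merge_distance` threshold, the function name `prune_and_merge`, and the cell-count-weighted averaging of merged means are all hypothetical stand-ins.

```python
import math

def prune_and_merge(means, counts, min_class_size, merge_distance):
    """Drop small clusters, then merge near-identical ones (illustrative only)."""
    # Eliminate clusters with fewer cells than the minimum class size.
    kept = [(list(m), n) for m, n in zip(means, counts) if n >= min_class_size]
    merged = []
    for mean, n in kept:
        for i, (other, m) in enumerate(merged):
            # Hypothetical similarity test: Euclidean distance between means.
            if math.dist(mean, other) <= merge_distance:
                total = n + m
                # Combine the two means, weighted by their cell counts.
                combined = [(a * n + b * m) / total for a, b in zip(mean, other)]
                merged[i] = (combined, total)
                break
        else:
            merged.append((mean, n))
    return merged

# Four requested clusters (made-up values): one too small, two nearly identical.
result = prune_and_merge(
    means=[(0, 0), (1, 0), (50, 50), (90, 90)],
    counts=[100, 100, 5, 200],
    min_class_size=20,
    merge_distance=2,
)
print(len(result))  # fewer clusters than were requested
```

In this sketch the cells of an eliminated cluster are simply discarded from the counts; how the tool handles them is not described here.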

Example

The following is a sample signature file created by Iso Cluster. The file begins with a header, which is commented out, showing the values of the parameters used in performing the iso clustering.

The class names are optional and are entered with a text editor after the file is created. Each class name, if entered, must be a single string of no more than 14 alphanumeric characters.

# Signatures Produced by Clustering of 
#    Stack redlands
#    number_of_classes=6   max_iterations=20   min_class_size=20
#    sampling interval=10
#    Number of selected grids
/*           3
#    Layer-Number   Grid-name
/*           1      redlands1
/*           2      redlands2
/*           3      redlands3

# Type  Number of Classes   Number of Layers  Number of Parametric Layers
   1             4                 3                 3
# ===============================================================

# Class ID     Number of Cells      Class Name
       1              1843 
# Layers   1             2             3
# Means 
        22.8817       60.7656       34.8893
# Covariance
1      169.3975      -69.7444      179.0808
2      -69.7444      714.7072       10.7889
3      179.0808       10.7889      284.0931
# ---------------------------------------------------------------

# Class ID     Number of Cells      Class Name
       2              2495 
# Layers   1             2             3
# Means 
         38.4894      132.9775       61.8104
# Covariance
1       414.9621      -19.0732      301.0267
2       -19.0732      510.8439      102.8931
3       301.0267      102.8931      376.5450
# ---------------------------------------------------------------

# Class ID     Number of Cells      Class Name
       3              2124 
# Layers   1             2             3
# Means 
         70.3983       82.9576       89.2472
# Covariance
1       264.2680      100.6966       39.3895
2       100.6966      523.9096       75.5573
3        39.3895       75.5573      279.7387
# ---------------------------------------------------------------

# Class ID     Number of Cells      Class Name
       4              2438 
# Layers   1             2             3
# Means 
        105.8708      137.6645      130.0886
# Covariance
1       651.0465      175.1060      391.6028
2       175.1060      300.8853      143.2443
3       391.6028      143.2443      647.7345
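As a rough illustration of how the layout above can be read programmatically, the following sketch parses the class means and covariance matrices from signature text. It assumes the file follows exactly the layout of this sample (lines starting with `#` or `/*` are comments, and each class has an ID line, a means line, and one covariance row per layer); the function name `parse_signature_text` and the returned dictionary keys are hypothetical, and the embedded text is a trimmed two-class excerpt of the sample.

```python
def parse_signature_text(text):
    """Parse class statistics from signature text laid out like the sample above."""
    # Keep only data rows: skip blank lines and '#' or '/*' comment lines.
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith(("#", "/*"))
    ]
    # First data row: type, number of classes, layers, parametric layers.
    _type, n_classes, n_layers, _n_parametric = (int(v) for v in rows[0])
    classes = []
    pos = 1
    for _ in range(n_classes):
        id_row = rows[pos]
        classes.append({
            "id": int(id_row[0]),
            "cells": int(id_row[1]),
            "name": id_row[2] if len(id_row) > 2 else "",  # class name is optional
            "means": [float(v) for v in rows[pos + 1]],
            # Each covariance row starts with its layer index, which is dropped.
            "covariance": [
                [float(v) for v in rows[pos + 2 + i][1:]] for i in range(n_layers)
            ],
        })
        pos += 2 + n_layers
    return classes

# Trimmed excerpt of the sample (classes 1 and 2 only, header adjusted to match).
SAMPLE = """\
# Type  Number of Classes   Number of Layers  Number of Parametric Layers
   1             2                 3                 3
# Class ID     Number of Cells      Class Name
       1              1843
# Means
        22.8817       60.7656       34.8893
# Covariance
1      169.3975      -69.7444      179.0808
2      -69.7444      714.7072       10.7889
3      179.0808       10.7889      284.0931
# Class ID     Number of Cells      Class Name
       2              2495
# Means
         38.4894      132.9775       61.8104
# Covariance
1       414.9621      -19.0732      301.0267
2       -19.0732      510.8439      102.8931
3       301.0267      102.8931      376.5450
"""

for cls in parse_signature_text(SAMPLE):
    print(cls["id"], cls["cells"], cls["means"])
```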

References

Ball, G. H., and D. J. Hall. 1965. A Novel Method of Data Analysis and Pattern Classification. Menlo Park, California: Stanford Research Institute.

Richards, J. A. 1986. Remote Sensing Digital Image Analysis: An Introduction. Berlin: Springer-Verlag.
