Understanding multivariate classification

Available with Spatial Analyst license.

The goal of classification is to assign each cell in a study area to a class or category. Examples of a class or category include land-use type, locations preferred by bears, and avalanche potential.

There are two types of classification: supervised and unsupervised. In a supervised classification, you already know what features are present at a sample of locations. For example, you know that there is a coniferous forest in the northwest region of your study area, so you identify it by enclosing it on the map with one or more polygons. You create another polygon to encompass a wheat field, another for urban buildings, and another for water. You continue this process until you have enough features to represent each class and all classes in your data are identified. Each grouping of features is considered a class, and each polygon that encompasses part of a class is a training sample. Once you have identified your training samples, multivariate statistics are calculated on them to establish the relationships within and between the classes. The statistics are stored in a signature file.
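As a rough illustration of what the statistics for a class might contain, the sketch below computes a mean vector and a covariance matrix from the cell values of hypothetical training samples. The band values, the class names, and the `class_signature` helper are assumptions made for this example; the sketch does not reproduce the signature file format.

```python
from statistics import mean


def class_signature(samples):
    """Return (mean vector, covariance matrix) for one class.

    samples: list of cell vectors, one value per band, taken from the
    training polygons of a single class (hypothetical data).
    """
    n = len(samples)
    n_bands = len(samples[0])
    means = [mean(cell[b] for cell in samples) for b in range(n_bands)]
    # Sample covariance between every pair of bands (divides by n - 1).
    cov = [
        [
            sum((cell[i] - means[i]) * (cell[j] - means[j]) for cell in samples)
            / (n - 1)
            for j in range(n_bands)
        ]
        for i in range(n_bands)
    ]
    return means, cov


# Hypothetical two-band values sampled inside the training polygons.
training = {
    "forest": [[30.0, 80.0], [34.0, 84.0], [32.0, 82.0]],
    "water": [[10.0, 5.0], [12.0, 7.0], [11.0, 9.0]],
}
signatures = {name: class_signature(cells) for name, cells in training.items()}
```

The means describe where each class sits in the band values, and the covariances describe how the bands vary together within the class.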

In an unsupervised classification, you do not know what features are actually present at any given location, but you want to aggregate the locations into a specified number of groups or clusters. The class or cluster to which each location is assigned depends on the multivariate statistics calculated on the input bands. Each cluster is statistically separate from the other clusters, based on the values for each band of each cell within the clusters. The statistics that define the clusters are stored in a signature file.
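To make the idea of grouping locations by their band values concrete, here is a minimal sketch that uses plain k-means clustering. K-means is swapped in purely for illustration and is not necessarily the algorithm used by any particular tool; the `kmeans` function, its naive initialization, and the cell values are all assumptions.

```python
import math


def kmeans(cells, k, iterations=10):
    """Group cell vectors into k clusters with a basic k-means loop."""
    # Naive initialization (an assumption): use the first k cells as centers.
    centers = [list(cell) for cell in cells[:k]]
    labels = [0] * len(cells)
    for _ in range(iterations):
        # Assign every cell to its nearest center in attribute space.
        labels = [
            min(range(k), key=lambda c: math.dist(cell, centers[c]))
            for cell in cells
        ]
        # Move each center to the mean of the cells assigned to it.
        for c in range(k):
            members = [cell for cell, lab in zip(cells, labels) if lab == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centers


# Hypothetical two-band values for six locations.
cells = [[1.0, 2.0], [9.0, 8.0], [1.5, 1.5], [8.5, 9.0], [0.5, 2.5], [9.5, 8.5]]
labels, centers = kmeans(cells, k=2)
```

Here the number of clusters is chosen in advance, and the resulting centers play the role of the statistics that define each cluster.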

There are four steps in performing a classification:

  1. Create and analyze the input data.
  2. Produce signatures for class and cluster analysis.
  3. Evaluate and, if necessary, edit classes and clusters.
  4. Perform the classification.
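The final step can be sketched as assigning each cell to the class whose statistics it matches best. The example below uses a deliberately simple nearest-mean rule as a stand-in for whatever classifier is actually applied; the class means, the `classify` helper, and the small two-band raster are hypothetical.

```python
import math

# Hypothetical class means (band 1, band 2), as might come from signatures.
class_means = {"forest": (32.0, 82.0), "water": (11.0, 7.0)}


def classify(cell, means):
    """Return the name of the class whose mean is closest to the cell vector."""
    return min(means, key=lambda name: math.dist(cell, means[name]))


# A 2 x 2 raster in which each cell holds its vector of band values.
raster = [
    [(31.0, 80.0), (12.0, 9.0)],
    [(10.0, 6.0), (35.0, 85.0)],
]
classified = [[classify(cell, class_means) for cell in row] for row in raster]
```

A nearest-mean rule ignores the spread of each class; a classifier that also uses the covariances can separate classes that overlap in their means.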

There are two types of input to the classification: the input raster bands to analyze, and the classes or clusters into which to fit the locations. The input raster bands used in the multivariate analysis must influence, or be an underlying cause of, the categorization. For example, slope, snow depth, and solar radiation can be factors that influence avalanche potential, while soil type may have no effect.

A class corresponds to a meaningful grouping of locations. Examples of classes include forests, water bodies, fields, and residential areas. Examples of classes derived from clusters include deer preference and erosion potential.

Each location is characterized by a set, or vector, of values, with one value for each variable, or band, entered in the analysis. Each location can be visualized as a point in a multidimensional attribute space whose axes correspond to the variables represented by the input bands. A class or cluster is a grouping of points in this multidimensional attribute space. Two locations belong to the same class or cluster if their attributes (vectors of band values) are similar. Either a multiband raster or a set of individual single-band rasters can be used as the input to a multivariate statistical analysis.
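The attribute-space view can be illustrated with a short sketch that stacks single-band rasters into one vector per cell and compares cells by their Euclidean distance. The band names and values and the `cell_vectors` helper are assumptions for this example, and Euclidean distance is only one possible measure of similarity.

```python
import math

# Two hypothetical single-band rasters covering the same 2 x 2 area.
slope = [[5, 30], [12, 40]]
snow_depth = [[20, 150], [35, 170]]


def cell_vectors(bands):
    """Map each (row, column) location to its vector of band values."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return {
        (r, c): tuple(band[r][c] for band in bands)
        for r in range(rows)
        for c in range(cols)
    }


vectors = cell_vectors([slope, snow_depth])

# Distance between two cells with similar band values.
d_similar = math.dist(vectors[(0, 1)], vectors[(1, 1)])
# Distance between two cells with very different band values.
d_different = math.dist(vectors[(0, 0)], vectors[(0, 1)])
```

Cells separated by a small distance in attribute space are candidates for the same class or cluster, regardless of how far apart they are on the map.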

Locations corresponding to known classes may form clusters in attribute space if the classes can be separated, or distinguished, by their attribute values. Conversely, locations corresponding to natural clusters in attribute space can be interpreted as naturally occurring classes or strata.

Multivariate statistical analysis references

Campbell, James B. 1987. Introduction to Remote Sensing. The Guilford Press.

Jensen, John R. 1986. Introductory Digital Image Processing: A Remote Sensing Perspective. Prentice Hall.

Johnson, Richard A., and Dean W. Wichern. 1988. Applied Multivariate Statistical Analysis. Prentice Hall.

Mosteller, Frederick, and John W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley.

Richards, John A. 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag.
