Performing the classification

Available with Spatial Analyst license.

The goal of classification is to assign each cell in the study area to a known class (supervised classification) or to a cluster (unsupervised classification). In both cases, the input to classification is a signature file containing the multivariate statistics of each class or cluster. The result of each classification is a map that partitions the study area into known classes, which correspond to training samples, or naturally occurring classes, which correspond to clusters defined by clustering. Classifying locations into naturally occurring classes corresponding to clusters is also referred to as stratification.

Maximum likelihood

Cells in a class are rarely homogeneous. This is especially true with training samples taken for a supervised classification. If hardwoods in the shade, for instance, have a reflectance signature that resembles conifers in the full sun, both types of tree will end up in the same class. Any location in a training sample taken from a habitat where you would expect to find bears could contain sub-locations that bears avoid.

In the diagram below, class A represents hardwoods and class B represents softwoods. How do you classify a cell that falls in the overlap of the two classes? Should it be classified as class A or B?

Overlap of classes

For each class, the maximum likelihood classifier calculates the probability that the cell belongs to that class given its attribute values. The cell is assigned to the class with the highest probability, hence the term "maximum likelihood."

Several assumptions are necessary for the maximum likelihood classifier to work accurately:

  • The data for each band should be normally distributed.
  • Each class should have a normal distribution in multivariate attribute space.
  • The prior probabilities of the classes must be equal—that is, in the absence of any weighting of attribute values, all classes are equally likely.

If the prior probabilities are not equal for all classes in a study area, you can weight the classes. For example, when classifying a satellite image of Alaska, forest and other vegetation types could receive a higher prior probability than human housing. That is, a cell location is much less likely to contain a house than some type of vegetation. When a cell value falls in the overlapping portion of the housing and vegetation classes, the location is more likely to contain vegetation than a house, and it should be classified accordingly.

This probability and weighting logic is based on Bayesian decision rules. The actual probability values for each cell and class are determined from the means and covariance matrix for each class (stored in the signature file).
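The decision rule described above can be sketched in a few lines of Python. This is a simplified illustration for two bands, not the tool's actual implementation: the class names, means, covariance matrices, and prior values are hypothetical, and the dictionary stands in for the statistics stored in a signature file.

```python
import math

def gaussian_density(x, mean, cov):
    """Bivariate normal density for a two-band cell value x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of the 2x2 covariance matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Squared Mahalanobis distance: dx^T * inv * dx.
    m2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * m2) / (2.0 * math.pi * math.sqrt(det))

def classify(x, signatures, priors):
    """Return (best_class, posteriors) using Bayes' rule."""
    scores = {name: priors[name] * gaussian_density(x, s["mean"], s["cov"])
              for name, s in signatures.items()}
    total = sum(scores.values())
    posteriors = {name: score / total for name, score in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical signatures for two overlapping classes.
signatures = {
    "A": {"mean": (50.0, 60.0), "cov": ((100.0, 0.0), (0.0, 100.0))},
    "B": {"mean": (70.0, 60.0), "cov": ((100.0, 0.0), (0.0, 100.0))},
}
cell = (61.0, 60.0)  # in the overlap, slightly closer to class B

equal_priors = {"A": 0.5, "B": 0.5}
skewed_priors = {"A": 0.9, "B": 0.1}  # class A assumed far more common

label_equal, post_equal = classify(cell, signatures, equal_priors)
label_skewed, post_skewed = classify(cell, signatures, skewed_priors)
print(label_equal, label_skewed)
```

With equal priors the cell goes to the class whose distribution fits it best (B). With a strong prior for class A, the same cell is assigned to A, which mirrors the housing and vegetation example.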

To perform a classification, use the Maximum Likelihood Classification tool. The tool requires the input bands, which can come from multiband rasters, individual single-band rasters, or both, and the corresponding signature file. You must also specify how to weight the classes or clusters. There are three options:

  • Equal: all classes are weighted with the same prior probability.
  • Cells in samples: the prior probabilities are proportional to the number of cells in each class or cluster in the signature file.
  • File: the a priori file input control becomes active, and the prior probabilities are read from the file you specify.

You must also specify a reject fraction, which is the portion of cells that will remain unclassified because they have the lowest probability of correct assignment. The default is 0.0, so every cell is classified. An optional confidence raster can also be created. Finally, specify the name of the output raster.
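To make the three weighting options concrete, the following sketch derives prior probabilities for each one. It is an illustration only: the function name, the method keywords, the cell counts, and the use of a dictionary in place of an a priori file are all assumptions, and normalizing the file values so they sum to 1 is a choice made for this sketch rather than documented tool behavior.

```python
def prior_probabilities(cell_counts, method="equal", file_values=None):
    """Derive per-class prior probabilities.

    cell_counts: number of sample cells per class (as in a signature file).
    method: "equal", "sample" (cells in samples), or "file".
    file_values: stand-in for the contents of an a priori file.
    """
    names = list(cell_counts)
    if method == "equal":
        # Every class gets the same weight.
        return {name: 1.0 / len(names) for name in names}
    if method == "sample":
        # Weight is proportional to the number of cells in each class.
        total = sum(cell_counts.values())
        return {name: cell_counts[name] / total for name in names}
    if method == "file":
        # Weights supplied by the user, normalized here by assumption.
        total = sum(file_values[name] for name in names)
        return {name: file_values[name] / total for name in names}
    raise ValueError(f"unknown weighting method: {method}")

# Hypothetical cell counts for two classes.
counts = {"vegetation": 300, "housing": 100}

equal = prior_probabilities(counts, "equal")
sample = prior_probabilities(counts, "sample")
from_file = prior_probabilities(counts, "file", {"vegetation": 9, "housing": 1})
print(equal, sample, from_file)
```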

Class probability

Instead of assigning each cell to the single class with the highest probability, the Class Probability tool outputs probability layers, one band for each input class or cluster. The value at each location in each band stores the probability that the cell belongs to that class or cluster, based on the attributes from the original input bands.

This capability can be useful in the following situation. Imagine you are classifying an image in which one class is forest and another is wetland. After running the tool, you discover a cell that has a 60 percent chance of belonging to the forest class on the forest output band and a 30 percent chance of belonging to the wetland class on the wetland output band. Instead of classifying the cell location as forest, you may want to classify it as wet forest.
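The forest and wetland case can be expressed as a simple rule applied to the per-class probabilities of one cell. The sketch below is hypothetical: the function name and both thresholds are arbitrary values chosen for illustration, and the dictionary stands in for the band values read at a single cell location.

```python
def label_from_probabilities(probs, primary_min=0.5, secondary_min=0.25):
    """Return a single or a mixed label from per-class probabilities.

    A mixed label is returned when the most probable class is strong
    and the runner-up is still substantial. Both thresholds are
    illustrative assumptions.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (first, p1), (second, p2) = ranked[0], ranked[1]
    if p1 >= primary_min and p2 >= secondary_min:
        return f"{first}+{second}"
    return first

# Probabilities for one cell, taken from the example in the text.
mixed_cell = {"forest": 0.60, "wetland": 0.30, "water": 0.10}
clear_cell = {"forest": 0.90, "wetland": 0.05, "water": 0.05}

mixed_label = label_from_probabilities(mixed_cell)
clear_label = label_from_probabilities(clear_cell)
print(mixed_label, clear_label)
```

Here the first cell receives a combined forest and wetland label, which could be mapped to a "wet forest" class, while the second cell is simply forest.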

Review of multivariate classification

Supervised classification

The following are the steps to perform a supervised classification:

  1. Identify the input bands.
  2. Produce training samples from known locations of desired classes.
  3. Develop a signature file.
  4. View and edit the signature file if necessary.
  5. Run the classification.
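Step 3 amounts to computing multivariate statistics for each class from its training samples. The following sketch shows the idea for a single class with two bands. The sample values and the dictionary layout are hypothetical and do not reflect the actual signature file format.

```python
def class_signature(samples):
    """Compute the cell count, mean vector, and covariance matrix
    for one class from its training sample cells."""
    n = len(samples)
    bands = len(samples[0])
    mean = [sum(s[b] for s in samples) / n for b in range(bands)]
    # Sample covariance (divides by n - 1).
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(bands)]
           for i in range(bands)]
    return {"count": n, "mean": mean, "cov": cov}

# Hypothetical two-band values for cells inside one training polygon.
hardwood_samples = [(52, 61), (48, 59), (50, 60), (54, 64)]

signature = class_signature(hardwood_samples)
print(signature["mean"])
print(signature["cov"])
```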

Unsupervised classification

The following are the steps to perform an unsupervised classification:

  1. Identify the input bands.
  2. Define the number of clusters to be created.
  3. Develop a signature file.
  4. View and edit the signature file if necessary.
  5. Run the classification.
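In the unsupervised workflow, the clusters are found from the data rather than from training samples. The sketch below uses a basic k-means loop as a stand-in to show how cells can be grouped once the number of clusters is fixed. The clustering tool that produces the signature file may use a different algorithm, and the cell values and the naive initialization here are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance in attribute space."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centers)."""
    centers = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each cell to its nearest cluster center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its member cells.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(members[0])))
    return labels, centers

# Hypothetical two-band cell values forming two obvious groups.
cells = [(10, 12), (80, 82), (11, 13), (79, 85), (12, 11), (81, 80)]

labels, centers = kmeans(cells, k=2)
print(labels)
print(centers)
```

The mean and covariance of the cells in each resulting cluster would then play the same role as the statistics of a training sample in the supervised workflow.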
