How Principal Components works

Available with Spatial Analyst license.

The Principal Components tool transforms the data in the input bands from the input multivariate attribute space to a new multivariate attribute space whose axes are rotated with respect to the original space. The axes (attributes) in the new space are uncorrelated. The main reason to perform this transformation is to compress the data by eliminating redundancy.

An example of data redundancy is evident in a multiband raster comprising elevation, slope, and aspect (on a continuous scale). Since slope and aspect are usually derived from elevation, most of the variance within the study area can be explained just by the elevation.

The result of the tool is a multiband raster with the same number of bands as the specified number of components (one band per axis or component in the new multivariate space). The first principal component has the greatest variance, the second has the greatest share of the variance not described by the first, and so forth. Often, the first three or four bands of the output raster describe more than 95 percent of the variance, and the remaining bands can be dropped. Because the reduced raster contains fewer bands yet retains more than 95 percent of the variance of the original multiband raster, subsequent computations are faster with little loss of accuracy.
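
The share of variance carried by each component can be read directly from the eigenvalues. The following is a minimal sketch, not part of the tool, that uses the eigenvalues from the example statistics file later in this topic. The variable names and the 0.95 cutoff are assumptions chosen for illustration.

```python
import numpy as np

# Eigenvalues copied from the example statistics file in this topic.
eigenvalues = np.array([287.8278, 69.8781, 7.8920])

# Each eigenvalue is the variance along its component, so dividing by the
# total gives the fraction of variance that component describes.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

for i, (frac, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {frac:6.2%} of variance, {cum:6.2%} cumulative")

# Smallest number of leading components whose cumulative share reaches
# the cutoff (assumed here to be 95 percent).
threshold = 0.95
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print("Components to keep:", n_keep)
```

With these eigenvalues, the first component carries about 79 percent of the variance and the first two together carry about 98 percent, so the third band could be dropped under a 95 percent cutoff.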

Principal Components requires the following: the input bands, the number of principal components into which to transform the data, the name of the output statistics file, and the name of the output raster. The output raster contains one band per specified component, and each band depicts one component.

Principal component analysis concepts

Conceptually, using a two-band raster, the shifting and rotating of the axes and transformation of the data is accomplished as follows:

  • The data is plotted in a scatterplot.
  • An ellipse is calculated to bound the points in the scatterplot (see the figure below).
    Boundary of ellipse plotted
  • The major axis of the ellipse is determined (see the figure below). The major axis becomes the new x-axis, the first principal component (PC1). PC1 depicts the greatest variation because it is the longest transect that can be drawn through the ellipse. The direction of PC1 is given by the eigenvector, and its magnitude by the eigenvalue. The angle between the original x-axis and PC1 is the angle of rotation that is used in the transformation.
    First principal component
  • A line perpendicular (orthogonal) to PC1 is calculated. This line is the second principal component (PC2) and replaces the original y-axis (see the figure below). The new axis describes the greatest variance not described by PC1.
    Second principal component
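
The steps above can be sketched numerically for two bands. The example below is an illustration under stated assumptions, not the tool's implementation: the band values are synthetic, and the ellipse axes are obtained from an eigendecomposition of the covariance matrix using NumPy.

```python
import numpy as np

# Synthetic two-band data (assumed values, for illustration only).
rng = np.random.default_rng(0)
band1 = rng.normal(50.0, 10.0, size=1000)
band2 = 0.8 * band1 + rng.normal(0.0, 4.0, size=1000)

# One row per cell, one column per band: the points of the scatterplot.
cells = np.column_stack([band1, band2])
cov = np.cov(cells, rowvar=False)

# eigh returns eigenvalues in ascending order, so reorder them so that
# the axis with the greatest variance (PC1) comes first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

pc1 = eigenvectors[:, 0]  # direction of the major axis of the ellipse
pc2 = eigenvectors[:, 1]  # direction perpendicular to PC1

# An eigenvector's sign is arbitrary; point PC1 toward positive x so the
# rotation angle falls between -90 and 90 degrees.
if pc1[0] < 0:
    pc1 = -pc1

angle_deg = np.degrees(np.arctan2(pc1[1], pc1[0]))
print("Variance along PC1 and PC2:", eigenvalues)
print("Rotation angle from the x-axis to PC1 (degrees):", angle_deg)
```

Because the covariance matrix is symmetric, its eigenvectors are perpendicular to each other, which corresponds to PC2 being orthogonal to PC1 in the description above.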

Using the eigenvectors, the eigenvalues, and the covariance matrix calculated from the input multiband raster, a linear formula defining the shift and rotation is created. This formula is applied to transform each cell value relative to the new axes.
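
The exact formula the tool uses is not given here. As a minimal sketch, the example below assumes the standard formulation: the shift subtracts each band's mean, and the rotation projects each cell onto the eigenvectors. The function name `principal_components` and the synthetic bands are hypothetical.

```python
import numpy as np

def principal_components(bands, n_components):
    """Transform a (n_bands, rows, cols) array into n_components bands."""
    n_bands, rows, cols = bands.shape
    # One row per cell, one column per band.
    cells = bands.reshape(n_bands, -1).T.astype(float)

    mean = cells.mean(axis=0)          # the shift
    cov = np.cov(cells, rowvar=False)

    # Eigenvectors define the rotation; keep those with the largest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Linear transformation applied to every cell.
    scores = (cells - mean) @ eigenvectors
    out = scores.T.reshape(n_components, rows, cols)
    return out, eigenvalues, eigenvectors

# Synthetic three-band input (assumed values, for illustration only).
rng = np.random.default_rng(1)
elev = rng.normal(500.0, 100.0, size=(20, 20))
bands = np.stack([
    elev,
    0.5 * elev + rng.normal(0.0, 10.0, size=(20, 20)),
    rng.normal(0.0, 5.0, size=(20, 20)),
])

pcs, eigenvalues, eigenvectors = principal_components(bands, 2)

# Covariance of the output bands: off-diagonal terms should be close to
# zero and the diagonal should match the eigenvalues.
pc_cov = np.cov(pcs.reshape(2, -1))
print(np.round(pc_cov, 4))
```

In this sketch, the output bands are uncorrelated and the variance of each band equals its eigenvalue, which matches the properties of the new axes described earlier in this topic.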

Example

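The following is an example of the output data file created for three principal components. In the eigenvector table, each row corresponds to an input layer and each column holds the eigenvector for one principal component layer.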

#                    COVARIANCE MATRIX
#    Layer            1            2            3
#  -----------------------------------------------------------
1           34.1763      31.2377      51.8100
2           31.2377     212.6159      99.9540
3           51.8100      99.9540     118.8057
#  ===========================================================

#                    CORRELATION MATRIX
#    Layer            1            2            3
#  -----------------------------------------------------------
1            1.0000       0.3665       0.8131
2            0.3665       1.0000       0.6289
3            0.8131       0.6289       1.0000
#  ===========================================================

#               EIGENVALUES AND EIGENVECTORS
# Number of Input Layers     Number of Principal Component Layers
3                                3
# PC Layer            1            2            3
#  -----------------------------------------------------------
# Eigen Values
287.8278      69.8781       7.8920
# Eigen Vectors
# Input Layer
1            0.2112       0.4718       0.8560
2            0.8116      -0.5727       0.1154
3            0.5447       0.6704      -0.5039
#  ===========================================================
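
The relationships among the three tables in this file can be checked with a short calculation. The sketch below is not how the tool produces the file. It assumes only that the correlation matrix is the covariance matrix scaled by the band standard deviations, and that the eigenvalues and eigenvectors come from the covariance matrix.

```python
import numpy as np

# Covariance matrix copied from the example file above.
cov = np.array([
    [34.1763,  31.2377,  51.8100],
    [31.2377, 212.6159,  99.9540],
    [51.8100,  99.9540, 118.8057],
])

# Correlation: divide each covariance by the product of the two bands'
# standard deviations (square roots of the diagonal).
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

# Eigendecomposition of the covariance matrix, largest eigenvalue first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print("Correlation matrix:\n", np.round(corr, 4))
print("Eigenvalues:", np.round(eigenvalues, 4))
# Columns are eigenvectors; the sign of each column is arbitrary.
print("Eigenvectors (columns):\n", np.round(eigenvectors, 4))
```

The results should agree with the file to within rounding, up to the sign of each eigenvector column. The eigenvalues also sum to the trace of the covariance matrix (365.5979), which is the total variance of the three input layers.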

