What is photogrammetry?

Available with Advanced license.


Photogrammetry is the science of obtaining reliable measurements from photographs and digital imagery. Typical outputs of the photogrammetric process are orthomosaic maps, symbolic maps, GIS layers, and three-dimensional (3D) models of real-world objects or scenes. There are two general types of photogrammetry: aerial photogrammetry and close-range photogrammetry.

In aerial photogrammetry, the sensor is onboard a satellite, a manned aircraft, or a drone and is usually pointed vertically down toward the ground. Imagery collected with the sensor pointed straight down is referred to as vertical or nadir imagery. Multiple overlapping images, called stereo imagery, are collected as the sensor moves along a flight path. The imagery is processed to produce digital elevation data and orthomosaics. Imagery has perspective geometry that results in distortions that are unique to each image. Orthoimages have been geometrically corrected so that the resulting image has the geometric integrity of a map, and orthomosaics are orthoimages that have been mosaicked into a single image. Other products can also be produced, such as vector GIS layers containing roads, buildings, hydrology, and other ground features. These layers are produced using the orthoimagery as a backdrop, or from stereo image compilation in ArcGIS Pro.

In close-range photogrammetry, the sensor is usually close to the object of interest and is typically not nadir viewing; instead, it looks horizontally, obliquely, or even upward, as when mapping a bridge or a similar engineering structure. This imagery is modeled mathematically in slightly different ways, hence the need to distinguish it from aerial photogrammetry. The products are similar to those of aerial photogrammetry, such as 3D models, engineering drawings, and orthoimages, but instead of terrain and landscape features, they tend to depict objects such as buildings, engineering structures, or cell and transmission towers.

The Esri Ortho Mapping suite of tools and capabilities focuses on aerial photogrammetry products to support map generation and revision, change detection, and other feature extraction applications. These tools allow users to process aerial, drone, or satellite imagery into a variety of orthorectified products.


Orthorectification is a process that corrects for many artifacts related to remotely sensed imagery to produce a map-accurate orthoimage. Orthoimages can then be edgematched and color balanced to produce a seamless orthomosaic. This orthomosaic is accurate to a specified map scale accuracy and can be used to make measurements as well as generate and update GIS feature class layers. To accomplish this, you need imagery with known sensor positions and attitudes, a calibrated geometric model for the sensor, and a digital terrain model (DTM).

Sometimes the known positions and orientations accompany the imagery when it is delivered to the user. If not, the imagery needs to be adjusted to ground control. The adjustment process uses the sensor calibration, sensor orientation information, ground control points, tie points, and a DTM to produce accurate attitudes and positions. This in turn enables the building of map-accurate orthoimages.

Elevation data

If a suitable digital elevation model (DEM) exists, it will be used in the orthorectification process. Otherwise, the elevation datasets, such as digital surface models (DSMs) and DTMs, need to be derived from stereo imagery. Stereo imagery consists of two or more overlapping images of the same ground feature collected from different positions and therefore from different points of view. The overlapping area is suitable for generating digital elevation datasets. Generating these 3D datasets requires a collection of multiple overlapping images with no gaps in overlap, sensor calibration and orientation information, and ground control and tie points. The 3D datasets are then created automatically using a process called image matching, in which overlapping imagery is cross-correlated to generate 3D point clouds defined by geolocation (latitude, longitude) and elevation.
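The way overlapping images yield elevation can be illustrated with the textbook "normal case" of stereo: two truly vertical images taken at the same flying height. This is a simplified sketch, not the image matching algorithm used by ArcGIS Pro; the function name and the sample numbers are hypothetical.

```python
def elevation_from_parallax(parallax, base, focal_length, flying_height):
    """Elevation of a point from its x-parallax in an ideal vertical stereo pair.

    parallax and focal_length share one unit (for example, mm);
    base and flying_height share another (for example, m above the datum).
    """
    if parallax <= 0:
        raise ValueError("parallax must be positive")
    # Normal-case stereo: parallax = focal_length * base / (flying_height - elevation)
    return flying_height - base * focal_length / parallax


# Hypothetical values: 150 mm lens, 600 m air base, 1000 m flying height.
for p_mm in (90.0, 100.0):
    print(p_mm, elevation_from_parallax(p_mm, 600.0, 150.0, 1000.0))
```

A larger parallax means the point is closer to the sensor, that is, higher above the datum. Image matching measures this parallax for many points to build the point cloud.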

The need for ortho mapping

Orthorectification refers to the removal of geometric inaccuracies induced by the platform, sensor, and especially terrain displacement. Mapping refers to the edgematching, cutline generation, and color balancing of multiple images to produce an orthomosaic dataset. These combined processes are referred to as ortho mapping.

Digital aerial images, drone images, scanned aerial photographs, and satellite imagery are important in general mapping and in GIS data generation and visualization. In fact, the information contained in most maps and GIS layers was generated from imagery. First, the imagery serves as a backdrop that gives GIS layers important context from which to make geospatial associations. Second, imagery is used to create or revise maps and GIS layers by digitizing and attributing features of interest such as roads, buildings, hydrology, and vegetation.

Before this geospatial information can be digitized from imagery, the imagery needs to be corrected for different types of errors and distortions inherent in the way imagery is collected. There are two main types of distortion affecting remotely sensed imagery:

  • Radiometric distortion—The inaccurate translation of ground reflectance values to gray values or digital numbers (DNs) in the image. Radiometric error is caused by the sun’s azimuth and elevation, atmospheric conditions, and sensor limitations.
  • Geometric distortion—The inaccurate translation of scale and location in the image. Geometric error is caused by terrain displacement, the curvature of the Earth, perspective projections and instrumentation.

Each of these types of inaccuracies is removed in the orthorectification and mapping process. For a list of the common types of image inaccuracies, see the table below. Once the distortions affecting imagery are removed and individual images or scenes are mosaicked together to produce an orthomosaic, it can be used like a symbolic or thematic map to make accurate distance and angle measurements. The advantage of the orthoimage is that it contains all the information visible in the imagery, not just the features and GIS layers extracted from the image and symbolized on a map. For example, a road symbolized on a map has uniform width, whereas a road on the orthoimage has variable width and shoulders that allow emergency vehicles to navigate traffic jams or store building material and equipment.

The orthorectification process

One of the most important products generated by the photogrammetric process is an orthorectified collection of images, called an orthoimage mosaic, or simply orthomosaic. The generation of the orthoimage involves warping the source image so that distance and area are uniform in relationship to real-world measurements. This is accomplished by establishing the relationship of the x,y image coordinates to real-world ground control points (GCPs) to determine the algorithm for resampling the image. Similarly, the mathematical relationship between the ground coordinates represented by a DEM and the image is computed and used to determine the proper position of each pixel from the source image.

Thus, features measured in the orthoimages match the measurement, scale, and angle of the same features on the ground, regardless of whether they exist on steep terrain or on level ground. The resulting accuracy of the orthoimage is based on the accuracy of the triangulation, off-nadir image collection angle, resolution of the source image, and the accuracy of the elevation model.

There are several requirements to produce an orthoimage or orthomosaic from raw imagery:

  • Digital imagery—Can be in the form of a digital airborne image, scanned image, or satellite imagery.
  • Camera calibration file—Includes measurements of sensor characteristics, such as focal length, size and shape of the imaging plane, pixel size, and lens distortion parameters. In photogrammetry, the measurement of these parameters is called interior orientation (IO), and they are encapsulated in a camera model file. High-precision aerial mapping cameras are analyzed to provide camera calibration information in a report used to compute a camera model. Other consumer-grade cameras are calibrated by those operating the cameras, or they can be calibrated during the adjustment processes during orthorectification.
  • Rational Polynomial Coefficients (RPC)—Supplied by satellite imagery providers. RPCs are computed for each satellite image and describe the transformation from 3D earth surface coordinates to 2D image coordinates in a mathematical sensor model that is expressed as the ratio of two cubic polynomial expressions. The coefficients of these two rational polynomials are computed by the satellite company from the satellite's orbital position and orientation and the rigorous physical sensor model. RPCs replace the need for a rigorous camera model and are often referred to as replacement sensor models if the error covariance matrices are included.
  • Adjustment points—Composed of ground control points, image tie points, and check points.
    • Ground control points are usually from ground survey location and measurements. Secondary control points can also be derived from a map or existing orthoimage with known accuracy, as long as the known accuracy exceeds the expected accuracy of the new orthoimage by a linear factor of three to five times. These points on the ground need to be visible in the imagery.
    • Image tie points are generated in the overlap areas between adjacent images composing the mosaic. These points serve to tie together all the imagery comprising the orthoimage mosaic. These are usually computed automatically using image matching techniques in the overlap area.
    • Check points are used for assessing the accuracy of the orthorectification process. These are ground control survey points not used in computing the photogrammetric adjustment.
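The RPC requirement listed above describes a sensor model expressed as a ratio of two cubic polynomials. The sketch below shows only that mathematical form. The ordering of the 20 terms is an illustrative assumption and does not follow any vendor's coefficient order, and the offset and scale normalization that real RPC files apply to the inputs and outputs is omitted.

```python
from itertools import product

# Exponents (i, j, k) of the 20 monomials lat**i * lon**j * h**k with i + j + k <= 3.
# ASSUMPTION: this ordering is for illustration only, not a vendor's RPC term order.
EXPONENTS = [e for e in product(range(4), repeat=3) if sum(e) <= 3]


def cubic(coeffs, lat, lon, h):
    """Evaluate a 20-term cubic polynomial in (lat, lon, h)."""
    return sum(c * lat**i * lon**j * h**k
               for c, (i, j, k) in zip(coeffs, EXPONENTS))


def rpc_coordinate(numerator, denominator, lat, lon, h):
    """One image coordinate (row or column) as a ratio of two cubic polynomials."""
    return cubic(numerator, lat, lon, h) / cubic(denominator, lat, lon, h)


# Hypothetical coefficients: column = 0.5 + 2.0 * lat, with a denominator of 1.
num = [0.0] * 20
num[EXPONENTS.index((0, 0, 0))] = 0.5
num[EXPONENTS.index((1, 0, 0))] = 2.0
den = [0.0] * 20
den[EXPONENTS.index((0, 0, 0))] = 1.0
print(rpc_coordinate(num, den, 0.3, 0.1, 0.0))
```

A full RPC model uses two such ratios per image, one for the row and one for the column.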

The information above is used to compute the image orientation needed to produce a DEM and an orthorectified image mosaic from imagery. The derived image orientation parameters include the position of the sensor at the instant of image capture, in coordinates such as latitude, longitude, and height (x, y, z), and the attitude of the sensor, expressed as the rotation angles omega, phi, and kappa about the x-, y-, and z-axes (comparable to roll, pitch, and heading).
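How the sensor position and the omega, phi, and kappa angles relate a ground point to an image point can be sketched with the collinearity equations for a frame camera. The rotation order used here, kappa applied after phi after omega, is one common textbook convention and is an assumption; other software may define the angles differently. Function names are hypothetical.

```python
import math


def rotation_matrix(omega, phi, kappa):
    """Ground-to-image rotation M = R3(kappa) * R2(phi) * R1(omega), angles in radians."""
    so, co = math.sin(omega), math.cos(omega)
    sp, cp = math.sin(phi), math.cos(phi)
    sk, ck = math.sin(kappa), math.cos(kappa)
    return [
        [cp * ck, co * sk + so * sp * ck, so * sk - co * sp * ck],
        [-cp * sk, co * ck - so * sp * sk, so * ck + co * sp * sk],
        [sp, -so * cp, co * cp],
    ]


def project(ground, camera, angles, focal_length):
    """Image coordinates (x, y) of a ground point, relative to the principal point."""
    m = rotation_matrix(*angles)
    d = [g - c for g, c in zip(ground, camera)]
    u, v, w = (sum(m[r][i] * d[i] for i in range(3)) for r in range(3))
    return -focal_length * u / w, -focal_length * v / w


# Hypothetical vertical image: camera 1000 m above a point 100 m east of nadir.
print(project((100.0, 0.0, 0.0), (0.0, 0.0, 1000.0), (0.0, 0.0, 0.0), 0.15))
```

With all three angles at zero, the result reduces to a simple scaling by focal length over flying height, which is the geometry of an ideal vertical photograph.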

Orthomosaic generation

The general workflow to generate an orthomosaic is outlined in this section. ArcGIS Pro provides tools, capabilities, and guided workflows for creating DEM and orthoimage products. Specific details on how to create an orthomosaic using the Ortho Mapping tools and wizards are described in Orthomapping in ArcGIS Pro.

Image orientation

Image orientation is a prerequisite for generating DEMs and orthoimagery. It is a process of determining the spatial position and orientation of the sensor at the time each image was captured. Knowing the height of the sensor above the ground allows calculation of the overlap regions of adjacent images, which is then used to enable tie point generation. The tie point generation process will place all the images correctly into a contiguous block. It uses the interior orientation based on physical sensor characteristics and exterior orientation based on ground control and tie points between images.

Collecting tie points between multiple overlapping images can be tedious and time consuming. The Compute Tie Points tool automatically identifies coincident points in the overlap areas between images using cross-correlation techniques. These tie points are used together with ground control points, which are also visible in multiple images, to compute the exterior orientation of each image comprising the mosaic. This means that the ground control must be photo-identifiable (or visible) in the imagery. Typical photo-identifiable ground control points are persistent and readily identifiable features. They may be painted targets on a highway or the center of two intersecting streets.
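The idea behind cross-correlation matching can be shown with a small one-dimensional sketch. It is not the algorithm implemented by the Compute Tie Points tool; it only shows how a normalized cross-correlation score can locate a patch from one image inside another. The function names and pixel values are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length lists of pixel values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_match(template, search):
    """Slide the template along a search row; return (offset, score) of the best NCC."""
    scores = [(ncc(template, search[i:i + len(template)]), i)
              for i in range(len(search) - len(template) + 1)]
    score, offset = max(scores)
    return offset, score


# The same pattern appears in the second image, shifted and slightly brighter.
template = [10, 50, 20, 80, 30]
search = [5, 5, 7, 12, 52, 22, 82, 32, 6, 5]
print(best_match(template, search))
```

Because the score is normalized, a uniform brightness difference between the two images does not prevent a match, which matters when adjacent images were exposed under different conditions.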

Block adjustment

Using the ground control and tie point information, a bundle adjustment computation calculates the exterior orientation for each image, such that they are consistent with neighboring images. The orientation for the whole block of images is then adjusted to fit the ground. This block adjustment process produces the best statistical fit between images, for the whole contiguous block, minimizing errors with the tie points and ground control. The adjusted transformation for each image item comprising the block is recorded in the solutions table and stored in the workspace for the orthomosaic.

Quality assurance and quality control

When the block of images is adjusted to fit the ground, the apparent error of the adjusted points is presented in a table of residual errors. Blunders are readily identified, and points with high residual error are either deleted or, more often, manually repositioned. The adjustment is recomputed until both the overall error and the residual error of each point are acceptable.
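A review of residual errors can be sketched as follows. The root mean square error (RMSE) summarizes the overall error, and points whose residual is large relative to it are candidates for blunders. The threshold of three times the RMSE is an illustrative assumption, not a rule taken from ArcGIS Pro, and the residual values are made up.

```python
import math


def rmse(residuals):
    """Root mean square error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def flag_blunders(residuals, factor=3.0):
    """Indices of points whose absolute residual exceeds factor * RMSE."""
    limit = factor * rmse(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > limit]


# Hypothetical residuals in meters: eleven small values and one obvious blunder.
residuals = [0.1, -0.1] * 5 + [0.1, 5.0]
print(round(rmse(residuals), 3), flag_blunders(residuals))
```

After a flagged point is repositioned or removed, the adjustment is recomputed and the residuals are reviewed again.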

DEM generation

Once the block adjustment orientation is completed, an elevation dataset can be produced using the DEMs wizard. A photogrammetric point cloud is created to produce the DEM using image cross-correlation techniques. The DEM is then used in the image orthorectification process to remove terrain distortions and produce an orthomosaic.

Two types of DEMs can be produced:

  • DTM—Digital elevation of the earth, not including the elevation of any objects on it. This is also referred to as bare-earth elevation. The bare earth DTM dataset is used to produce the orthoimage and orthomosaics.
  • DSM—Digital elevation of the earth, including the elevation of objects on it such as trees and buildings. The DSM is a valuable analytical dataset used for classifying features in orthoimages, such as discriminating asphalt pavement and asphalt roofs. It should not be used for image orthorectification unless the source imagery is nadir looking, with no building or feature lean, to produce true orthoimages.


If an area is heavily wooded or has other dense vegetation cover, it is not possible to derive a DTM ground surface because the ground is not visible. The most appropriate elevation surface product for densely forested land cover is a DSM, which creates a surface depicting the top of the tree canopy.

The DEMs wizard allows you to define various parameter settings for generating the elevation point cloud and DEM. The DTM is then used in the image orthorectification process to remove terrain distortions and produce an orthomosaic.

Image orthorectification

An orthorectified image has a constant scale such that features are represented in their true positions in relation to their ground position. This enables accurate measurement of distances, angles, and areas in the orthoimage.

Orthorectification is accomplished by establishing the relationship of the x,y image coordinates to the real-world ground control points (GCPs) to determine the algorithm for resampling the image. Similarly, the mathematical relationship between the ground coordinates, represented by the DTM, and the image is computed and used to determine the proper position of each pixel in the source image.
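The resampling described above can be sketched for the simplest case: a single vertical frame image with no rotation, nearest-neighbor sampling, and the principal point assumed at the image center. This is a toy illustration under those assumptions, not the procedure used by the Orthomosaic wizard; all names and values are hypothetical.

```python
def orthorectify(image, dtm, x0, y0, cell, camera, focal_length, pixel_size):
    """Indirect resampling of an unrotated vertical frame image.

    image: 2D list of pixel values (principal point assumed at the center).
    dtm: 2D list of elevations on the output grid (row 0 is the northern edge).
    x0, y0: ground coordinates of the center of the upper-left output cell.
    camera: (Xc, Yc, Zc) position of the sensor at exposure.
    """
    rows, cols = len(image), len(image[0])
    xc, yc, zc = camera
    ortho = []
    for r, dtm_row in enumerate(dtm):
        out_row = []
        for c, z in enumerate(dtm_row):
            gx, gy = x0 + c * cell, y0 - r * cell
            # With no rotation, image coordinates scale by f / (Zc - Z).
            scale = focal_length / (zc - z)
            ix, iy = (gx - xc) * scale, (gy - yc) * scale
            col = round((cols - 1) / 2 + ix / pixel_size)
            row = round((rows - 1) / 2 - iy / pixel_size)
            inside = 0 <= row < rows and 0 <= col < cols
            out_row.append(image[row][col] if inside else None)
        ortho.append(out_row)
    return ortho


# Hypothetical 5 x 5 image whose values encode their own row and column.
image = [[10 * r + c for c in range(5)] for r in range(5)]
dtm = [[0, 0, 0], [0, 0, 50], [0, 0, 0]]  # one raised cell east of nadir
for line in orthorectify(image, dtm, -1.0, 1.0, 1.0, (0.0, 0.0, 100.0), 0.1, 0.001):
    print(line)
```

The raised cell is sampled from a source pixel farther from the image center than it would be on flat ground, which is how the elevation model compensates for terrain displacement.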

The orthomosaic is produced using the Orthomosaic wizard. The inputs include the block-adjusted items comprising the image collection and the DTM. An existing bare earth DEM can also be used. The Orthomosaic wizard allows you to define the settings for mosaicking your orthoimages such as scale and data format, seamline generation, and color balancing between orthorectified images to create a seamless orthomosaic.


Nadir high-resolution satellite imagery is not affected much by the distortion that is inherent in aerial imagery, because of the large distance between sensor and ground, the long sensor focal length (on the order of 10 meters), and the small field of view. These factors, together with accurate orientation information in the form of RPCs, mean that DEM accuracy and dense postings are less important in producing accurate orthoimages, as long as the adjusted exterior orientation and control points are adequate. Thus, the DEM generation step is often not used, and Esri's World Elevation (or existing USGS NED DEMs or SRTM DEMs), together with accurate GCPs, can produce Class I or Class II orthoimages at a scale of 1:5,000 or smaller.

If the off-nadir collection is large, or the focal length is small, a more accurate, higher resolution DEM is needed for accurate orthorectification.

Image artifacts

The types of artifacts that affect remotely sensed imagery, and that are addressed in the orthorectification process, are briefly described in the table below.

Perspective Distortion

Perspective distortion is affected by the obliquity of the look angle and the distance between the sensor and the ground target, as well as by sensor characteristics. Airborne sensors, with their short focal lengths, exhibit more perspective distortion than satellite-based sensors with long focal lengths. The viewing perspective shows the sides of buildings facing the sensor and masks the back sides of buildings.

Additionally, in perspective images, the scale of the image gets smaller as you move away from the nadir. In other words, the ground sample distance (GSD) is smaller toward the image nadir and larger toward the far edge of the image, and the pixels are trapezoidal in shape.
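For reference, the GSD at the nadir of a vertical image follows from similar triangles. The sketch below covers only that case; it does not model how the GSD grows toward the far edge of an oblique image. The function name and sample values are hypothetical.

```python
def ground_sample_distance(pixel_size, focal_length, height_above_ground):
    """GSD at nadir of a vertical image.

    pixel_size and focal_length share one unit; the result is in the
    unit of height_above_ground.
    """
    return pixel_size * height_above_ground / focal_length


# Hypothetical sensor: 4 micrometer pixels, 100 mm lens, 1000 m above ground.
print(ground_sample_distance(4e-6, 0.1, 1000.0))
```

The same relationship shows why higher terrain, which is closer to the sensor, is imaged at a smaller GSD than lower terrain in the same frame.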

Field of View (FoV)

FoV is the angular extent that is visible to the sensor during exposure. The angle is determined by the sensor size and focal length; combined with altitude, it determines the extent of ground covered. Focal length is the effective distance from the rear nodal point of the lens to the focal plane, and it determines the perspective geometry of the image. The shorter the focal length, the wider the FoV and the more perspective distortion is introduced.
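For an ideal frame sensor, these relationships can be written out directly. This is a minimal sketch assuming a simple pinhole geometry and a vertical image over flat ground; the function names and values are hypothetical.

```python
import math


def field_of_view(sensor_width, focal_length):
    """Angular field of view in radians across one sensor dimension."""
    return 2 * math.atan(sensor_width / (2 * focal_length))


def ground_footprint(sensor_width, focal_length, height_above_ground):
    """Ground distance covered across that dimension by a vertical image."""
    return 2 * height_above_ground * math.tan(field_of_view(sensor_width, focal_length) / 2)


# Hypothetical 36 mm wide sensor with an 18 mm lens, flown 100 m above ground.
print(math.degrees(field_of_view(36.0, 18.0)))
print(ground_footprint(36.0, 18.0, 100.0))
```

Halving the focal length while keeping the sensor and altitude fixed widens the angle and doubles the ground footprint.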

Lens Distortion

Lenses only approximate perspective geometry. As a result, they distort the placement and shape of objects imaged on the focal plane. Radiometrically, they also vary the amount of light reaching the focal plane. Both types of distortion are smallest at the center of the image and increase with distance toward the edge.
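The geometric part of this behavior is often described with a polynomial radial distortion model. The sketch below uses a common two-coefficient form as an assumption; it is not necessarily the model used by a particular camera calibration report, and the coefficients k1 and k2 are hypothetical.

```python
def apply_radial_distortion(x, y, k1, k2=0.0):
    """Distorted image coordinates for an ideal point (x, y).

    Coordinates are relative to the principal point. The displacement
    grows with the radial distance r from the image center.
    """
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


# Hypothetical coefficient: no shift at the center, a larger shift toward the edge.
for point in ((0.0, 0.0), (0.5, 0.0), (1.0, 0.0)):
    print(point, apply_radial_distortion(*point, k1=0.1))
```

Correcting an image requires the inverse of this mapping, which is usually solved iteratively because the polynomial has no simple closed-form inverse.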

Earth Curvature

Distortion induced by earth curvature is most prevalent in images that cover wide extents of the earth, or look out at high oblique angles from high altitude. It usually affects aerial imagery collected with a short focal length, at high altitude, with a wide FoV, or satellite imagery in strips or blocks.

Relief Displacement

Relief displacement is caused by variable elevation above or below a particular datum, which results in a shift in the object's apparent position in the image. This topographic variation, coupled with view orientation and FoV of the sensor, affects the distance and scale with which features are displayed on the imagery.
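For a vertical image, the size of this shift is given by the standard relief displacement relation d = r * h / H, where r is the radial distance of the displaced image point from the nadir point, h is the height of the object above the datum, and H is the flying height above the same datum. The sketch below assumes a truly vertical image; the function names and values are hypothetical.

```python
def relief_displacement(radial_distance, object_height, flying_height):
    """Image displacement of an elevated point in a vertical image.

    radial_distance is measured in the image from the nadir point to the
    displaced point; the result is in the same unit. object_height and
    flying_height share a unit and refer to the same datum.
    """
    return radial_distance * object_height / flying_height


def height_from_displacement(displacement, radial_distance, flying_height):
    """Object height recovered from a measured relief displacement."""
    return displacement * flying_height / radial_distance


# Hypothetical tower: imaged 80 mm from nadir, 50 m tall, flying height 1000 m.
d = relief_displacement(80.0, 50.0, 1000.0)
print(d, height_from_displacement(d, 80.0, 1000.0))
```

The displacement is zero at the nadir point and grows toward the image edges, and it shrinks as the flying height increases.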

Radial Displacement

In vertical imagery, tall objects such as radio towers appear to lean outward from the center (nadir point) of the image. Because the top of the tower is not imaged directly over its base, the effect is a form of relief displacement that acts radially from the nadir point.


When scanning aerial photography, distortions are first introduced in film processing and storage. Then additional distortions may be introduced in the scanning process due to lens or other scanning instrumentation. These errors must be largely compensated for in the orthorectification process.
