
Introduction to ortho mapping


Photogrammetry is the science of obtaining reliable measurements from photographs and digital imagery. The output of photogrammetry is often orthoimage maps, symbolic maps, GIS layers, or three-dimensional (3D) models of real-world objects or scenes. There are two general types of photogrammetry, aerial photogrammetry and close-range photogrammetry.

In aerial photogrammetry the sensor is onboard a satellite, manned aircraft, or a drone and is usually pointed vertically down toward the ground. When the sensor is pointed straight down it is referred to as vertical or nadir imagery. Multiple overlapping images are collected as the sensor flies along a flight path. The imagery is processed to produce digital elevation data and ortho imagery mosaics, which are called ortho maps. Imagery has perspective geometry that results in distortions that are unique to each image. Orthoimages have been geometrically corrected so that the resulting image has the geometric integrity of a map. Other products can be produced resulting in vector GIS layers with features such as roads, buildings, hydrology, and other ground features.

In close-range photogrammetry the sensor is often close to the object of interest and is typically not nadir viewing, but rather looking horizontally, obliquely, or even upward, as when mapping bridge structures. This imagery is modeled mathematically in slightly different ways, hence the need to distinguish it from aerial photogrammetry. The products are similar to those of aerial photogrammetry, such as 3D models, engineering drawings, and orthoimagery, but instead of terrain and landscape features, they tend to map objects such as buildings, engineering structures, or cell and transmission towers.

The tools in the Esri Ortho Mapping suite focus on aerial photogrammetry products to support map generation and revision, change detection, and other feature extraction applications. These tools allow users to process their aerial or satellite imagery to produce a variety of products.


Orthorectification is a process that corrects for geometric distortion inherent in remotely sensed imagery to produce a map-accurate orthoimage. You can then stitch a group of orthoimages together into one layer called an orthomosaic. To accomplish this, you need imagery with known sensor positions and attitudes, a calibrated geometric model for the sensor, and a digital terrain model (DTM).

Sometimes the known positions and attitudes accompany the imagery when it is delivered to the user. If not, the imagery will need to be adjusted to ground control. The adjustment processes utilize the sensor calibration, sensor orientation information, ground control points, tie points, and a DTM to produce the accurate attitudes and positions. This in turn enables the building of map-accurate orthoimages. The individual orthoimages are then edgematched and color balanced to produce a seamless orthoimage map. This orthoimage mosaic is accurate to a specified map scale accuracy and can be used to make measurements as well as generate and update GIS feature class layers.

Elevation data

If suitable digital elevation data exists, it will be used in the orthorectification process. Otherwise, the elevation datasets, such as digital surface models (DSMs) and DTMs, need to be derived from stereo imagery. Stereo imagery consists of two or more images of the same feature collected from different positions, and therefore from different points of view. The area where the images overlap is suitable for generating digital elevation datasets. The model for generating these 3D datasets requires a collection of multiple overlapping images with no gaps in overlap, sensor calibration and orientation information, and ground control and tie points. The 3D datasets are then created automatically using a process called image matching, where overlapping imagery is cross-correlated to generate 3D points defined by geolocation (latitude, longitude) and elevation.
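The way a matched point pair yields an elevation can be illustrated with the textbook "normal case" of stereo photogrammetry. The sketch below is only an illustration and is not how the Ortho Mapping tools compute elevation: it assumes two perfectly vertical images taken at the same flying height, and the function name and parameters are hypothetical.

```python
def elevation_from_parallax(baseline_m, focal_length_mm, parallax_mm, flying_height_m):
    """Elevation of a matched point in an idealized vertical stereo pair.

    baseline_m      -- distance between the two exposure stations
    focal_length_mm -- focal length of the camera
    parallax_mm     -- x-parallax of the point (x_left - x_right) on the focal plane
    flying_height_m -- height of both exposure stations above the datum
    """
    if parallax_mm <= 0:
        raise ValueError("parallax must be positive for a point in front of the cameras")
    # Distance from the cameras down to the point: larger parallax means a closer point.
    depth_m = baseline_m * focal_length_mm / parallax_mm
    return flying_height_m - depth_m


# Example: 600 m baseline, 150 mm lens, 90 mm parallax, flown 1200 m above the datum.
print(elevation_from_parallax(600, 150, 90, 1200))
```

Points higher on the terrain are closer to the sensor and show larger parallax, which is why dense matching across the overlap area produces a surface.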

The need for ortho mapping

Digital aerial images, scanned aerial images, and satellite imagery are important in general mapping and in GIS data generation and visualization. In fact, the information contained in most maps and GIS layers was generated from imagery. First, the imagery serves as a backdrop that gives GIS layers important context from which to make geospatial associations. Second, imagery is used to create or revise maps and GIS layers by digitizing and attributing features of interest such as roads, buildings, hydrology, and vegetation.

Before this geospatial information can be digitized from imagery, the imagery needs to be corrected for different types of errors and distortions inherent in the way imagery is collected. There are two main types of distortion affecting remotely sensed imagery: radiometric and geometric. Radiometric distortion is the inaccurate translation of ground reflectance values to gray values in the image, which are sometimes called digital numbers (DNs). It is induced by atmospheric influences and sensor limitations. Geometric distortions are introduced by perspective projection and instrumentation. Common kinds of distortion affecting raw remotely sensed imagery include platform and sensor errors, earth curvature, and relief displacement, as well as radiometric and sun angle effects. Each of these types of distortion is removed in the orthorectification and mapping process.

Orthorectification refers to the removal of geometric distortion induced by the platform, sensor, and especially terrain displacement. Mapping refers to the edgematching, cutline generation, and color balancing of multiple images to produce an orthomosaic dataset. These combined processes are referred to as ortho mapping.

Once the distortions affecting imagery are removed and individual images or scenes are mosaicked together to produce an orthomosaic image map, it may be used like a symbolic or thematic map to make accurate distance and angle measurements. The advantage of the orthoimage map is that it contains all the information visible in the imagery, not just the features and GIS layers extracted from the image and symbolized on a map. For example, a road symbolized on a map has uniform width, whereas a road on the orthoimage has variable width and shoulders that allow emergency vehicles to navigate traffic jams or that can store road building material and equipment.

Image distortion

The different types of distortion that affect remotely sensed imagery are briefly described below.

Perspective Distortion

Perspective distortion is affected by the look angle (obliquity) and the distance between the sensor and the ground target, as well as sensor characteristics. Airborne sensors, with their short focal lengths, exhibit more perspective distortion than satellite-based sensors, with their long focal lengths. Both will show the sides of buildings facing the sensor and mask the back sides of buildings.

Additionally, in perspective images, the scale of the image gets smaller as you move away from the nadir. In other words, the ground sample distance (GSD) is smaller toward the image nadir and larger toward the far edge of the image, and the pixels are trapezoidal in shape.

Field of View (FoV)

FoV is the angular extent that is visible to the sensor during exposure. It is determined by the sensor lens, focal length, and altitude. Focal length is the effective distance from the lens rear nodal point to the focal plane. This determines the perspective geometry of the image. The shorter the focal length, the more perspective distortion is introduced and the wider the FoV.
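The relationship between focal length, FoV, and ground sample distance can be sketched with a simple pinhole model. This is an illustrative calculation only: it assumes a vertical image over flat terrain, and the function names and parameters are hypothetical. Altitude does not change the angular FoV in this model, but it does determine how much ground that angle covers.

```python
import math


def field_of_view_deg(sensor_width_mm, focal_length_mm):
    """Angular field of view across the sensor width for a pinhole camera."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


def nadir_gsd_m(pixel_size_mm, focal_length_mm, flying_height_m):
    """Ground sample distance at nadir: one pixel scaled by height over focal length."""
    return pixel_size_mm * flying_height_m / focal_length_mm


def ground_swath_m(sensor_width_mm, focal_length_mm, flying_height_m):
    """Width of ground covered by the sensor at a given height above flat terrain."""
    return sensor_width_mm * flying_height_m / focal_length_mm


# A 36 mm wide sensor with 0.005 mm pixels, flown 1000 m above the ground.
for focal_length in (25, 50, 100):
    print(
        focal_length,
        round(field_of_view_deg(36, focal_length), 1),
        nadir_gsd_m(0.005, focal_length, 1000),
        ground_swath_m(36, focal_length, 1000),
    )
```

Halving the focal length widens the FoV and doubles both the swath and the nadir GSD, which is the trade-off between coverage and perspective distortion described above.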

Lens Distortion

Lenses only approximate perspective geometry. As a result, they distort the placement and shape of ground objects imaged on the focal plane. Radiometrically, they also vary the amount of light reaching the focal plane. Both types of distortion are smallest at the center of the image and increase with distance toward the edge.

Earth Curvature

Distortion induced by earth curvature is most prevalent in images that cover wide extents of the earth, or look out at high oblique angles from high altitude. It usually affects aerial imagery collected with a short focal length, at high altitude, with a wide FoV, or satellite imagery in strips or blocks.

Relief Displacement

Relief displacement is caused by variable elevation above or below a particular datum, which results in a shift in the object's image position. This topographic variation, coupled with tilt and FoV distortions of the sensor, affects the distance and scale with which features are displayed on the imagery.

Radial Displacement

In vertical imagery, tall objects such as radio towers appear to lean outward from the center (nadir point) of the image. Because the top of the tower is not imaged directly over its base, the effect is a form of relief displacement that acts radially from the nadir point.
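The size of this lean can be estimated with the standard relief displacement relation for a vertical image, d = r * h / H, where r is the radial distance from the nadir point to the top of the object in the image, h is the object height, and H is the flying height above the object's base. The sketch below assumes a truly vertical image; the function name is hypothetical.

```python
def relief_displacement(radial_distance, object_height_m, flying_height_m):
    """Radial image displacement of an object's top relative to its base.

    radial_distance is measured on the image from the nadir point to the top
    of the object; the result is in the same units (for example mm or pixels).
    """
    return radial_distance * object_height_m / flying_height_m


# A 50 m tower imaged 80 mm from the nadir point.
print(relief_displacement(80, 50, 1000))      # aircraft at 1,000 m
print(relief_displacement(80, 50, 500_000))   # satellite at 500 km
```

The displacement grows toward the image edge and shrinks as flying height increases, which is why the same tower leans far less in satellite imagery than in low-altitude aerial imagery.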


When scanning aerial photography, distortions are first introduced in film processing and storage. Then additional distortions may be introduced in the scanning process due to lens or other scanning instrumentation. These errors must be largely compensated for in the orthorectification process.

The orthorectification process

Orthorectification is the process of removing the effects of image distortion induced by the sensor, viewing perspective, and relief for the purpose of creating a planimetrically correct image. The resulting orthorectified images have a constant scale such that features are represented in their true positions in relation to their ground position. This enables accurate measurement of distances, angles, and areas in the orthoimage.

There are several requirements to produce an orthoimage map or orthomosaic from raw imagery:

  • Digital imagery, which can be in the form of a digital airborne image, scanned image, or satellite imagery.
  • Camera calibration file that includes measurements of sensor characteristics, such as focal length, size and shape of the imaging plane, pixel size, and lens distortion parameters. In photogrammetry, the measurement of these parameters is called interior orientation (IO), and they are encapsulated in a camera model file. High-precision aerial mapping cameras, called metric cameras, are analyzed to provide camera calibration information in a report used to compute a camera model. Consumer-grade cameras are calibrated by their operators, or they can be calibrated during the adjustment step of orthorectification.
  • Rational Polynomial Coefficients (RPCs) supplied by satellite imagery providers. RPCs are computed for each image and describe the relationship between 2D image coordinates and 3D earth surface coordinates in a mathematical sensor model in which each image coordinate is expressed as the ratio of two cubic polynomials. The coefficients of these polynomials are computed by the satellite company from the satellite's orbital position and orientation and the rigorous physical sensor model. RPCs replace the need for a rigorous camera model and are often referred to as replacement sensor models if the error covariance matrices are included.
  • Adjustment points, which are composed of ground control points, image tie points, and check points.
    • Ground control points usually come from a ground survey. Secondary control points derived from a map or an existing orthoimage of known accuracy can also be used, as long as that accuracy is three to five times better than the accuracy expected of the output. These points on the ground need to be visible in the imagery.
    • Image tie points are generated in the overlap areas between adjacent images composing the mosaic. These are usually generated automatically using image matching techniques.
    • Check points are used for assessing the accuracy of the orthorectification process. These are ground control survey points not used in computing the photogrammetric solution.
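To make the RPC requirement above more concrete, the following sketch evaluates a generic ratio of two cubic polynomials in three variables. It is a simplified illustration, not a vendor implementation: the order of the 20 terms here is arbitrary and does not follow any provider's coefficient ordering, and all function names are hypothetical. In commonly distributed RPC files, the inputs are latitude, longitude, and height normalized with offsets and scales, and the result is a normalized image line or sample.

```python
def cubic_terms(p, l, h):
    """All 20 monomials p**i * l**j * h**k with i + j + k <= 3 (illustrative order)."""
    return [
        p**i * l**j * h**k
        for i in range(4)
        for j in range(4 - i)
        for k in range(4 - i - j)
    ]


def rational_polynomial(numerator, denominator, p, l, h):
    """Ratio of two cubic polynomials, each defined by 20 coefficients."""
    terms = cubic_terms(p, l, h)
    num = sum(c * t for c, t in zip(numerator, terms))
    den = sum(c * t for c, t in zip(denominator, terms))
    return num / den


def normalize(value, offset, scale):
    """Map a ground or image coordinate into the normalized range used by the model."""
    return (value - offset) / scale


def denormalize(value, offset, scale):
    """Map a normalized model output back to a real coordinate."""
    return value * scale + offset
```

One such ratio is evaluated for the image line and another for the image sample, so a full model carries four sets of 20 coefficients plus the offsets and scales.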

The information above is used to compute an image orientation needed to produce a digital elevation model (DEM) and an orthorectified image mosaic from imagery. The derived image orientation parameters include the position of the sensor at the instant of image capture in some global reference system such as latitude, longitude, and altitude (x, y, z). The attitude of the sensor is expressed as omega, phi, and kappa (pitch, roll, heading).
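The role of these orientation parameters can be sketched with the collinearity equations, which project a ground point into the image given the sensor position and attitude. The example below is a minimal sketch under stated assumptions: it uses one common photogrammetric convention for building the rotation matrix from omega, phi, and kappa (conventions differ between software packages), it assumes ground coordinates in a projected Cartesian system rather than latitude and longitude, and the function names are hypothetical.

```python
import math


def rotation_matrix(omega, phi, kappa):
    """Ground-to-image rotation matrix from omega, phi, kappa (radians).

    Assumed convention: sequential rotations about the x, y, and z axes.
    """
    so, co = math.sin(omega), math.cos(omega)
    sp, cp = math.sin(phi), math.cos(phi)
    sk, ck = math.sin(kappa), math.cos(kappa)
    return [
        [cp * ck, so * sp * ck + co * sk, -co * sp * ck + so * sk],
        [-cp * sk, -so * sp * sk + co * ck, co * sp * sk + so * ck],
        [sp, -so * cp, co * cp],
    ]


def project(ground_xyz, camera_xyz, angles, focal_length):
    """Collinearity equations: image (x, y) of a ground point, in focal length units."""
    m = rotation_matrix(*angles)
    d = [g - c for g, c in zip(ground_xyz, camera_xyz)]
    # Rotate the camera-to-ground vector into the image coordinate system.
    u, v, w = (sum(m[i][j] * d[j] for j in range(3)) for i in range(3))
    return (-focal_length * u / w, -focal_length * v / w)


# Vertical image taken 1000 m above a point 100 m east of the nadir, 0.1 m focal length.
print(project((100.0, 0.0, 0.0), (0.0, 0.0, 1000.0), (0.0, 0.0, 0.0), 0.1))
```

Block adjustment can be thought of as solving for the camera position and angles that make projections like this agree with the measured image positions of control and tie points.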

Orthomosaic generation

The general workflow to generate an orthomosaic is outlined in this section. Specific details on how to create an orthomosaic dataset using the Ortho Mapping tools and wizards are fully described in Ortho mapping in ArcGIS Pro.

Image orientation

Aerial triangulation is a process of optimally piecing together a block of images to create an accurate image mosaic map. Image orientation is a prerequisite for generating DEMs and orthoimagery. It is a process of determining the spatial position and orientation of the images and the overlap regions of adjacent images; this is important for tie point generation. The tie point generation process will place all the images correctly into a contiguous block. It uses the interior orientation based on sensor intrinsics and exterior orientation based on ground control and tie points between images.

Collecting tie points between multiple overlapping images can be tedious and time consuming. The Compute Tie Points tool automatically identifies coincident points in the overlap areas between images using cross-correlation techniques. These tie points are used together with ground control points, which are also visible in multiple images, to compute the exterior orientation of each image comprising the mosaic. This means that the ground control must be photo-identifiable (or visible) in the imagery. Typical photo-identifiable ground control points are persistent and readily identifiable features. They may be painted targets on a highway or the center of two intersecting streets.
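The idea behind cross-correlation matching can be sketched as follows. This is not the algorithm used by the Compute Tie Points tool; it is a minimal, brute-force illustration in which a small template from one image is slid across a search area in the overlapping image, and the location with the highest normalized cross-correlation score is taken as the match. The function names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel sequences (-1 to 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0 else 0.0


def best_match(template, search):
    """Return (score, (row, col)) of the best template position in the search image."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    flat_template = [v for row in template for v in row]
    best_score, best_pos = -2.0, None
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            window = [search[r + i][c + j] for i in range(th) for j in range(tw)]
            score = ncc(flat_template, window)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos
```

Because the score is normalized by the mean and spread of each window, it tolerates overall brightness and contrast differences between the two images, which is useful when overlapping images were exposed under different conditions.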

Block adjustment

Using the ground control and tie point information, a bundle adjustment computation calculates the exterior orientation for each image, such that they are consistent with neighboring images. The orientation for the whole block of images is then adjusted to fit the ground. This block adjustment process produces the best statistical fit between images, for the whole contiguous block, minimizing errors with the ground control.

Quality assurance and quality control

When the block of images is adjusted to fit the ground, the apparent error of the adjusted points is presented in a table of residual errors. Blunders are readily identified, and the points in error are either deleted or, more often, manually repositioned. Other points having unacceptable errors are also repositioned, and the adjustment is recomputed until both the overall error and the residual error of each point are acceptable.
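A review of a residual table can be sketched as below. The example computes an overall root mean square error (RMSE) from per-point horizontal residuals and lists the points whose residual exceeds a tolerance. The point names, residual values, and tolerance are invented for illustration, and the function names are hypothetical; the acceptable error depends on the accuracy requirements of the project.

```python
import math


def horizontal_rmse(residuals):
    """Overall RMSE from a mapping of point name to (dx, dy) residuals."""
    squared = [dx * dx + dy * dy for dx, dy in residuals.values()]
    return math.sqrt(sum(squared) / len(squared))


def points_over_tolerance(residuals, tolerance):
    """Names of points whose horizontal residual magnitude exceeds the tolerance."""
    return [
        name
        for name, (dx, dy) in residuals.items()
        if math.hypot(dx, dy) > tolerance
    ]


# Illustrative residuals in metres; GCP3 behaves like a blunder.
residuals = {
    "GCP1": (0.03, -0.04),
    "GCP2": (-0.02, 0.01),
    "GCP3": (0.90, -1.20),
}
print(horizontal_rmse(residuals))
print(points_over_tolerance(residuals, tolerance=0.5))
```

After the flagged points are repositioned or removed, the adjustment is rerun and the same review is repeated.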

DEM generation

Once the orientation is completed, a digital elevation dataset is automatically produced, using image cross-correlation techniques. This digital elevation dataset is then edited to remove vegetation, buildings, and other above-ground features to produce a DTM.

Image orthorectification

Orthorectification is the process of removing the effects of image distortion induced by the sensor, viewing perspective, and relief for the purpose of creating a planimetrically correct image.

This is accomplished by establishing the relationship of the x,y image coordinates to the real-world ground control points (GCPs) to determine the algorithm for resampling the image. Similarly, the mathematical relationship between the ground coordinates represented by the DEM and the image is computed and used to determine the proper position of each pixel in the source image. The generation of the orthoimage involves warping the source image so that distance and area are uniform in relationship to real-world measurements. Thus, features measured in the orthoimages match the measurement, scale, and angle of the same features on the ground, regardless of whether they exist on steep terrain or on level ground. The resulting accuracy of the orthoimage is based on the accuracy of the triangulation, the resolution of the source image, and the accuracy of the elevation model.
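The resampling described above is often implemented as an indirect, or backward, mapping: for each cell of the output orthoimage, the ground elevation is read from the DEM, the ground point is projected into the source image, and the pixel found there is copied. The sketch below is a deliberately simplified illustration and not the ArcGIS implementation. It assumes a perfectly vertical camera, a DEM on the same grid as the output, and nearest-neighbor sampling; all names are hypothetical.

```python
def orthorectify(image, dem, origin_xy, cell_size, camera_xyz, focal_px, principal_point):
    """Backward-mapped orthoimage for a vertical pinhole camera.

    image           -- source image as a list of rows
    dem             -- elevations on the output grid, as a list of rows
    origin_xy       -- ground (x, y) of the center of the top-left output cell
    cell_size       -- output cell size in ground units
    camera_xyz      -- ground position of the camera
    focal_px        -- focal length expressed in pixels
    principal_point -- (col, row) of the principal point in the source image
    """
    x0, y0 = origin_xy
    cam_x, cam_y, cam_z = camera_xyz
    pp_col, pp_row = principal_point
    rows, cols = len(dem), len(dem[0])
    ortho = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = x0 + c * cell_size
            gy = y0 - r * cell_size  # rows increase southward
            # Image scale at this cell depends on the terrain height under it.
            scale = focal_px / (cam_z - dem[r][c])
            col = round(pp_col + (gx - cam_x) * scale)
            row = round(pp_row - (gy - cam_y) * scale)
            if 0 <= row < len(image) and 0 <= col < len(image[0]):
                ortho[r][c] = image[row][col]  # nearest-neighbor sample
    return ortho
```

Over flat terrain the projection is a uniform scaling, but where the DEM rises the same ground cell is imaged farther from the nadir, so an error in the elevation model translates directly into a position error in the orthoimage.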

If high-resolution nadir satellite imagery is used for the generation of orthomosaics, a DSM can be produced and edited. However, nadir high-resolution satellite imagery is much less affected by the distortion inherent in aerial imagery because of the large distance between sensor and ground, the long sensor focal length (on the order of 10 meters), and the small FoV. These factors, together with accurate orientation information in the form of RPCs, mean that DEM accuracy and dense postings are less important in producing accurate orthoimagery, as long as the adjusted exterior orientation and control points are adequate. Thus, the DEM generation step is often skipped, and existing USGS NED DEMs or SRTM DEMs, together with accurate GCPs, can produce Class I or Class II orthoimagery at a scale of 1:5,000 or smaller.
