Image and raster data storage and management

Image data is often processed into derived products, either on the fly or by saving the result as a new, updated version. These image datasets, and collections of them, are often large, so good data management capabilities are important. ArcGIS Pro is designed to provide them.

Image and raster data structures and storage models

There are three methods to store image and raster data: as files in a file system, in a geodatabase, or managed from the geodatabase but stored in a file system. This decision also involves determining whether to store all the data in a single dataset or in a catalog of potentially many datasets. If you store the data in a file system, you store raster datasets. A geodatabase can store either raster datasets or mosaic datasets.

Raster dataset

Most image and raster data (such as an orthoimage or DEM) is provided as a raster dataset. The term raster dataset refers to any raster model that is stored on disk or is accessible as a single image stored in cloud storage. A raster dataset is the most basic raster data storage model on which the others are built—for example, mosaic datasets manage raster datasets. It's also the output from many geoprocessing tools that process raster data.

Raster dataset example

A raster dataset is any valid image or raster format organized into one or more bands. Each band consists of an array of pixels, and each pixel has a value. An image or raster dataset has at least one band. ArcGIS Pro supports more than 70 file formats for raster datasets, including TIFF, JPEG 2000, Cloud Raster Format (CRF), and NITF.

Mosaic dataset

A mosaic dataset is a collection of raster datasets (images) that is stored as a catalog and can be viewed or accessed either as a single mosaicked image or as individual images (rasters). These collections can be very large in both total file size and number of datasets. The images in a mosaic dataset can remain in their native format on disk or exist in the geodatabase. Metadata can be managed in each image's record as well as in attributes in the attribute table. Storing metadata as attributes makes parameters such as sensor orientation data easier to manage and allows fast queries for selecting images.

Mosaic dataset diagram
Individual images and their metadata make up a mosaic dataset.

The data in a mosaic dataset does not have to be adjoining or overlapping but can exist as unconnected, discontinuous datasets. For example, you can have images that completely cover an area, or you can have many strips of images that may not join together to form a continuous image (such as along pipelines).

Continuous data coverage is shown.
Discontinuous data coverage is shown.

The data can be completely or partially overlapping but captured over multiple dates. The mosaic dataset is an ideal dataset for storing temporal data. You can query the mosaic dataset for the images you need based on time or date and use a mosaic method to display the mosaicked image according to a time or date attribute.
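The idea of querying by date and then ordering the display by a date attribute can be sketched in plain Python. This is only a conceptual illustration and not the ArcGIS API: the catalog records, the field names (name, acquired), and both helper functions are hypothetical.

```python
from datetime import date

# Hypothetical catalog: one record per image, with an acquisition date attribute.
catalog = [
    {"name": "scene_2019", "acquired": date(2019, 6, 1)},
    {"name": "scene_2020", "acquired": date(2020, 6, 3)},
    {"name": "scene_2021", "acquired": date(2021, 5, 28)},
]

def query_by_date(items, start, end):
    """Select the images acquired within [start, end], inclusive."""
    return [item for item in items if start <= item["acquired"] <= end]

def order_closest_to(items, target):
    """Order images so the one acquired closest to the target date comes first."""
    return sorted(items, key=lambda item: abs((item["acquired"] - target).days))

selected = query_by_date(catalog, date(2020, 1, 1), date(2021, 12, 31))
ordered = order_closest_to(catalog, date(2020, 7, 1))
print([item["name"] for item in selected])
print([item["name"] for item in ordered])
```

The first function corresponds to selecting images by a time query; the second mimics a display order driven by a date attribute, where the image nearest the requested date is drawn on top.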

Mosaic datasets are not limited to one particular type of image data. You can add image data from different sensor systems in different projections, resolutions, pixel depths, and numbers of bands. Overviews can be generated for the entire data collection. This allows for faster viewing of the data and allows you to quickly serve these datasets. There are also additional properties for viewing, including setting a mosaicking method, that make these datasets unique and functional in many situations. You can also query a mosaic dataset based on spatial and nonspatial query constraints. The results of that query can be a set of images that you can process one by one, or it could be a dynamically generated mosaicked image.

In addition to image data, you can store and manage lidar data in a mosaic dataset in the same way as image datasets, and even together with image datasets. The lidar data can be stored in the file system as .las files or LAS datasets, or in a geodatabase as a terrain dataset.

Note:

Mosaic datasets are dependent on the version of ArcGIS on which they were built and are compatible across the ArcGIS platform for a particular release cycle. In general, mosaic datasets created with earlier versions of ArcGIS can be read and handled with later versions of ArcGIS. However, a mosaic dataset created with a later version of ArcGIS may not be backward compatible with earlier versions. See the following table for mosaic dataset compatibility:

Table of mosaic dataset compatibility between versions

Raster data storage model comparison

Storing image datasets individually is often the best method when the images are not adjacent to each other or are rarely used in the same project. Mosaicking inputs together to form one large, single extent image data file is appropriate for many applications, but a mosaic dataset may be preferable for the following reasons:

  • The extents of the images partially or fully overlap and you want the common areas to be preserved.
  • The image datasets represent a collection of observations of the same area at different times in a time series.
  • You only want to display the study area, not the entire collection of images.
  • You want to manage a collection of images as an integrated set but keep their individual states.
  • You want to record and manage additional attribute columns that describe each image.

Compare the raster data storage models

Description

Raster dataset: A single picture of an object or a seamless image covering a spatially continuous area. This may be a single original image or the result of many merged (mosaicked) images.

Mosaic dataset: A group of image datasets stored as a collection that allows you to store, manage, view, and query collections of image and lidar data. It is viewed as a mosaicked image, but you have access to each dataset as an item in the collection.

Storage

Raster dataset: As a file on disk or in a geodatabase.

Mosaic dataset: In a geodatabase, but the source data, stored as a file on disk, can be referenced.

Homogeneous or heterogeneous data

Raster dataset: Homogeneous data: a single format, data type, and file.

Mosaic dataset: Heterogeneous data: multiple formats, data types, file sizes, and coordinate systems.

Metadata

Raster dataset: Stored once and applies to a complete dataset.

Mosaic dataset: Can be stored in the raster record and as attributes in the attribute table.

Downsampled datasets

Raster dataset: Pyramids of the entire image dataset.

Mosaic dataset: Pyramids for each image dataset, as well as overviews for the entire collection.

Geoprocessing and image analysis

Raster dataset: As a data source in many geoprocessing and analysis tools, including raster functions and RFTs.

Mosaic dataset: As a data source in many geoprocessing and analysis tools, including raster functions and RFTs.

Pros

Raster dataset:

  • Fast to display at any scale.
  • Can be moved as a single file in the geodatabase.

Mosaic dataset:

  • Manages large collections of image data.
  • Name and format of input images are maintained.
  • Fast to display at any scale.
  • No loss of data to create mosaic.
  • User has access to full content of collection.
  • Properties can be set to control the mosaicked display.
  • On-the-fly processing.

Cons

Raster dataset:

  • File geodatabase image datasets are slower to update because the entire file must be rewritten.
  • Original file name and format are lost during creation.
  • Depending on the file size and compression of the input images, the file size of the output may be much larger than the input data.
  • Creation of single files can be slow.

Mosaic dataset:

  • Overviews can take time to generate.
  • If the source images are moved to a new file location, the references in the mosaic dataset must be adjusted.

Serving

Raster dataset: Served directly as an image service.

Mosaic dataset: Served directly as an image service.

Recommendations

Raster dataset: Use raster datasets when overlaps between mosaicked images do not need to be retained and for fast display of large quantities of image data.

Mosaic dataset: Use a mosaic dataset for managing and visualizing image and lidar data. It's well suited for multidimensional data, querying, storing metadata, and overlapping data, and it provides a hybrid solution.

A comparison of the raster data storage models

Raster data storage in the geodatabase

Store raster data in the geodatabase when you want to manage images, add behavior, and control the schema; when you want to manage a well-defined set of raster datasets as part of a database management system (DBMS); and when you require a single architecture for managing all content. There are two primary types of geodatabases: enterprise and file.

The enterprise geodatabase can support multiple operations in its DBMS. File geodatabases are designed to be edited by a single user and do not support versioning. They reside in your file system directory, so they do not require a password for access. File geodatabases and enterprise geodatabases share the same basic storage schema.

Note:

The functional behavior of each geodatabase is basically the same; however, there are some exceptions for specific tools or procedures. For information about the differences in behavior by a tool or procedure, refer to the specific tool or procedure in the reference help.

Compare raster storage in file and enterprise geodatabases

Size limit

File geodatabase: 1 terabyte (TB) for each raster dataset

Enterprise geodatabase: Unlimited; limit dependent on DBMS limits

Raster dataset file format

File geodatabase: File geodatabase raster dataset

Enterprise geodatabase: Enterprise geodatabase raster dataset

Storage

File geodatabase (stored in the file system):

  • Raster dataset: Managed
  • Mosaic dataset: Unmanaged

Enterprise geodatabase (stored in an RDBMS):

  • Raster dataset: Managed
  • Mosaic dataset: Unmanaged

Compression

File geodatabase: LZ77, JPEG, JPEG 2000, or none

Enterprise geodatabase: LZ77, JPEG, JPEG 2000, or none

Pyramids

File geodatabase: Supports partial pyramids

Enterprise geodatabase: Supports partial pyramids

Mosaicking

File geodatabase: Allows you to append to a raster dataset when mosaicking

Enterprise geodatabase: Allows you to append to a raster dataset when mosaicking

Updating

File geodatabase: Allows incremental updating

Enterprise geodatabase: Allows incremental updating

Number of users

File geodatabase: Single user and small workgroups; some readers and one writer

Enterprise geodatabase: Multiuser; many users and many writers

Comparison of file geodatabases and enterprise geodatabases

Raster block table in the geodatabase

Raster data is typically much larger in size than features and requires a side table for storage. For example, a typical panchromatic orthoimage can have 20,000 rows by 20,000 columns (400 million pixel values), or more.

To optimize performance with larger raster datasets, a geodatabase raster is divided into smaller tiles (referred to as blocks) with a typical size of approximately 128 rows by 128 columns or 256 rows by 256 columns. These smaller blocks are then held in a side table for each raster. Each separate tile is held in a separate row in a block table, as shown below.

Block table view diagram

This structure means that only the blocks for an extent are retrieved when they are needed, instead of the entire image. In addition, resampled blocks that are used to build raster pyramids can be stored and managed in the same block table as additional rows.

This enables very large images to be managed in a DBMS with very fast performance. A DBMS also provides secure multiuser access.
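The arithmetic behind the block table can be sketched as follows. This is a minimal illustration, assuming that edge blocks are padded to full size (so the block count is a ceiling division); the function name block_grid is hypothetical and not part of any ArcGIS API.

```python
import math

def block_grid(rows: int, cols: int, block_size: int = 128) -> tuple[int, int]:
    """Return (blocks_down, blocks_across) needed to cover one raster band.

    Assumes partial blocks at the edges are padded to a full block.
    """
    return math.ceil(rows / block_size), math.ceil(cols / block_size)

rows, cols = 20_000, 20_000
pixels = rows * cols  # 400 million pixel values for a single band

for size in (128, 256):
    down, across = block_grid(rows, cols, size)
    print(f"{size} x {size} blocks: {down} x {across} = {down * across} rows per band")
```

For the 20,000 by 20,000 orthoimage used as the example, a request for a small extent touches only a handful of these blocks rather than all 400 million pixel values.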

File geodatabase

The storage model of the file geodatabase is similar to that of an enterprise geodatabase, which stores data in blocks.

This provides more efficient access to data, especially during the mosaic operation. When mosaicking data in a file geodatabase, only overlapping blocks are updated. If an overlapping block does not exist, a new block is inserted. Partial blocks are padded with NoData pixels. In addition, the file geodatabase storage model (like the enterprise one) uses partial pyramid updates, which saves time. Because the data structures of file and enterprise geodatabases are the same, fast copy technology is used to copy and paste data between them.

File geodatabases accept configuration keywords, but unlike enterprise geodatabases, the configuration keywords have a standard predefined value. For more information about configuration keywords, see Configuration keywords for file geodatabases.

Enterprise geodatabase

Storing raster data in an enterprise geodatabase provides enterprise-level functionality, such as security, multiuser access, and data sharing. There are three main reasons to store raster data in an enterprise geodatabase:

  • It will not be updated regularly (for example, only every two or three years or less often).
  • It will be accessed in read-only use cases (such as using it as basemap data under vector data).
  • Hundreds of users (or more) will access it as a basemap.

Because of its storage structure, the raster data is considered to be managed, or fully controlled, by the geodatabase. Enterprise geodatabases store all the raster information (pixels, spatial reference, any associated table, and other metadata) for raster datasets and raster attributes in the associated relational database. This means that all input raster information is loaded into the database; the loading step can be thought of as a format conversion.

The enterprise geodatabase evenly tiles the bands into blocks of pixels according to a user-defined dimension (the default is 128 by 128). Tiling the image band data enables efficient storage and retrieval of the raster data. The pyramid information is stored according to a declining resolution. The height of the pyramid is determined by the number of levels specified by the application or user.

The raster blocks table (the largest table and the one that stores the pixel information and pyramids) stores one row per block (tile) per band in a raster dataset and per pyramid level. For example, a three-band image divided into 12 blocks with no pyramids built will have 36 rows in the BLK table—12 separate blocks for each of the bands. The column containing the pixel data for the block is a binary large object (BLOB).
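The row count described above follows directly from the rule of one row per block, per band, per pyramid level. A minimal sketch, in which the function name and the per-level block counts for the pyramid case are illustrative assumptions:

```python
def block_table_rows(bands: int, blocks_per_level: list[int]) -> int:
    """Count block table rows: one row per block, per band, per level.

    blocks_per_level[0] is the full-resolution level; any further entries
    are pyramid levels (their block counts here are illustrative).
    """
    return bands * sum(blocks_per_level)

# The example from the text: three bands, 12 blocks, no pyramids.
print(block_table_rows(3, [12]))

# Hypothetical case: the same image with two pyramid levels of 3 and 1 blocks.
print(block_table_rows(3, [12, 3, 1]))
```

The first call reproduces the 36 rows given in the text; the second shows how each pyramid level adds rows to the same table.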

Compression, pyramids, and tile size

There are other storage structures to consider when storing and managing raster data, including compression, downsampled datasets (pyramids and overviews), and tile size.

Compression

There are two types of compression: lossless and lossy. Lossless compression leaves the pixel values in the raster dataset unchanged, whereas lossy compression alters pixel values. The amount of compression depends on the type of pixel data; the more homogeneous the image, the higher the compression ratio. Use lossless compression for data that will be used for analysis and not just for display.

The primary benefit of compressing data is that it requires less storage space; the savings depend on the compression method and the redundancy in the data. An added benefit is improved performance, because less data is transferred. For example, when image data is accessed over a low-bandwidth network, compression reduces the amount of information to be transferred, making it possible to store large, seamless image datasets and serve them quickly to a client for display.

Mosaic datasets also have a compression setting. It does not apply to the storage of the image datasets being managed; it applies to the mosaicked image that is generated for display. This also improves access over a network by reducing the amount of data that is transferred. For more details about the Allowed Compression Method property, see Mosaic dataset properties.
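The difference between lossless and lossy compression, and the effect of homogeneity on the compression ratio, can be illustrated with Python's standard zlib module. This is a sketch under stated assumptions: zlib's DEFLATE (which builds on LZ77) stands in for the geodatabase's compression, the pixel arrays are synthetic, and the quantization step is only a simple stand-in for a lossy method.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 6))

# Synthetic 256 x 256 single-band, 8-bit "images".
homogeneous = bytes([120]) * (256 * 256)          # every pixel has the same value
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(256 * 256))

# Lossless: decompressing returns exactly the original pixel values.
restored = zlib.decompress(zlib.compress(noisy))

# Lossy stand-in: quantizing to multiples of 16 alters pixel values,
# but the more regular result compresses better.
quantized = bytes((value // 16) * 16 for value in noisy)

print("homogeneous ratio:", round(compression_ratio(homogeneous), 1))
print("noisy ratio:", round(compression_ratio(noisy), 2))
print("quantized ratio:", round(compression_ratio(quantized), 2))
```

The homogeneous band compresses far better than the noisy one, and the quantized band compresses better than the original at the cost of changed pixel values, which is why lossless methods are preferred for analysis data.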

Downsampled datasets for fast display

Downsampled datasets are rasters created from the original data for either raster datasets or mosaic datasets. They are generated to improve display speed and performance. When they are created for raster datasets, they are called pyramids, and when they are created for mosaic datasets, they are called overviews.

Pyramids versus overviews

Created for

Pyramids: Raster datasets

Overviews: Mosaic datasets

Format

Pyramids: Writes .ovr files, with a few exceptions. Reads pyramids stored externally as .ovr or .rrd files or internally.

Overviews: Writes as .tif files.

Storage

Pyramids: In a single file that generally resides next to the source raster dataset using the same name.

Overviews: By default, in a folder next to the geodatabase with the *.overviews extension, or internally for enterprise geodatabases. Storage location is customizable.

Storage size

2 to 10 percent (compared to original raster datasets).

Downsampling factor

Pyramids: 2

Overviews: Default is 3. This can be edited.

Extent

Pyramids:

  • Each pyramid level covers the entire raster dataset.
  • You can specify the number of levels to generate.

Overviews:

  • Can cover part or all of a mosaic dataset.
  • Each level can consist of one or more images.

Options when building

Pyramids:

  • Number of levels to create
  • Resampling method
  • Compression method and quality

Overviews:

  • Number of levels to create
  • Tile size
  • Base pixel size
  • Resampling method
  • Compression method and quality
  • Output location
  • Overview sampling factor

Pyramids versus overviews
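The effect of the downsampling factor can be sketched by computing the dimensions of each downsampled level. This is a simplified illustration: the function name is hypothetical, rounding partial pixels up is an assumption, and for overviews a level may in practice consist of several images rather than one.

```python
import math

def downsampled_levels(rows: int, cols: int, levels: int, factor: int = 2):
    """Return the (rows, cols) of each downsampled level, finest first.

    Each level divides the previous dimensions by the factor, rounding up
    (the rounding rule is an assumption made for this sketch).
    """
    dims = []
    for _ in range(levels):
        rows = max(1, math.ceil(rows / factor))
        cols = max(1, math.ceil(cols / factor))
        dims.append((rows, cols))
    return dims

# Pyramids use a factor of 2; overviews default to a factor of 3.
print(downsampled_levels(20_000, 20_000, levels=3, factor=2))
print(downsampled_levels(20_000, 20_000, levels=2, factor=3))
```

A larger factor reaches a coarse resolution in fewer levels, which is one reason overviews for a large collection need fewer levels than pyramids built on each image.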

Tile size

In an enterprise geodatabase or a file geodatabase, raster data is stored in a structure in which the data is tiled, indexed, pyramided, and most often compressed. Because of tiling, indexing, and pyramiding, each time the raster data is queried, only the tiles that are necessary to satisfy the extent and resolution of the query are returned instead of the whole dataset. The tile size controls the number of pixels you want to store in each database memory block. This is specified as a number of pixels in x and y. The default tile size is 128 by 128 pixels, and most applications do not require changing these default values. In an enterprise geodatabase, the tiles of raster data are compressed before they are stored in the geodatabase.
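How tiling limits the data returned by a query can be sketched as an index calculation: given a requested pixel window, only the tiles that intersect it need to be read. The function and its parameter names are hypothetical, and the sketch ignores pyramid levels and the spatial index itself.

```python
def tiles_for_window(row_off: int, col_off: int, height: int, width: int,
                     tile_size: int = 128) -> list[tuple[int, int]]:
    """Return (tile_row, tile_col) indexes of the tiles that intersect a window.

    The window is given in pixel coordinates; height and width must be >= 1.
    """
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 100 x 100 pixel window starting at pixel (100, 100) straddles four tiles.
print(tiles_for_window(100, 100, 100, 100))
```

With the default 128 by 128 tile size, this window requires four tiles, no matter how large the full raster is.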
