Image data is often processed, either on the fly or by saving the output as a new, updated version. Because image datasets, and collections of them, are often large, good data management capabilities are important, and ArcGIS Pro is designed to provide them.
Image and raster data structures and storage models
There are three methods to store image and raster data: as files in a file system, in a geodatabase, or managed from the geodatabase but stored in a file system. This decision also involves determining whether to store all the data in a single dataset or in a catalog of potentially many datasets. If you store the data in a file system, you store raster datasets. A geodatabase can store either raster datasets or mosaic datasets.
Raster dataset
Most image and raster data (such as an orthoimage or DEM) is provided as a raster dataset. The term raster dataset refers to any raster model that is stored on disk or is accessible as a single image stored in cloud storage. A raster dataset is the most basic raster data storage model on which the others are built—for example, mosaic datasets manage raster datasets. It's also the output from many geoprocessing tools that process raster data.
A raster dataset is any valid image or raster format organized into one or more bands. Each band consists of an array of pixels, and each pixel has a value. An image or raster dataset has at least one band. ArcGIS Pro supports more than 70 file formats for raster datasets, including TIFF, JPEG 2000, Cloud Raster Format (CRF), and NITF.
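To make the band-and-pixel structure concrete, here is a minimal in-memory sketch. `RasterDataset` and `pixel` are hypothetical names for illustration only, not ArcGIS classes, and real raster formats store far more (spatial reference, pixel type, NoData, and so on).

```python
from dataclasses import dataclass, field
from typing import List

# A band is a 2D grid of pixel values: a list of rows, each a list of columns.
Band = List[List[int]]


@dataclass
class RasterDataset:
    """Hypothetical, illustrative model of a raster dataset (not an ArcGIS class)."""
    name: str
    bands: List[Band] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A raster dataset always has at least one band.
        if not self.bands:
            raise ValueError("a raster dataset needs at least one band")
        rows = len(self.bands[0])
        cols = len(self.bands[0][0]) if rows else 0
        # All bands cover the same pixel grid, so they must share one shape.
        for band in self.bands:
            if len(band) != rows or any(len(r) != cols for r in band):
                raise ValueError("all bands must have the same rows and columns")
        self.rows, self.cols = rows, cols

    def pixel(self, band_index: int, row: int, col: int) -> int:
        """Return the value of one pixel in one band."""
        return self.bands[band_index][row][col]


# A 2 x 3 single-band raster, such as a tiny DEM.
dem = RasterDataset("tiny_dem", [[[10, 12, 15], [11, 13, 16]]])
print(dem.rows, dem.cols, len(dem.bands), dem.pixel(0, 1, 2))  # 2 3 1 16
```

A multiband image, such as a red, green, and blue orthoimage, would simply pass three bands of the same shape.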
Mosaic dataset
A mosaic dataset is a collection of raster datasets (images) that is stored as a catalog and can be viewed or accessed either as a single mosaicked image or as individual images (rasters). These collections can be very large in both total file size and number of datasets. The images in a mosaic dataset can remain in their native format on disk or exist in the geodatabase. Metadata can be managed in each image's record as well as in attributes in the attribute table. Storing metadata as attributes makes parameters such as sensor orientation easier to manage and supports fast queries for selecting images.
The data in a mosaic dataset does not have to be adjoining or overlapping but can exist as unconnected, discontinuous datasets. For example, you can have images that completely cover an area, or you can have many strips of images that may not join together to form a continuous image (such as along pipelines).
The data can overlap completely or partially and be captured on multiple dates, which makes the mosaic dataset ideal for storing temporal data. You can query the mosaic dataset for the images you need based on time or date and use a mosaic method to display the mosaicked image according to a time or date attribute.
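The following sketch imitates a date query followed by a date-based ordering, loosely in the spirit of a mosaic method that favors the image closest to a given date. `CatalogItem`, `query_by_date`, and `order_closest_to` are hypothetical names; this is an illustration of the idea, not how ArcGIS implements its mosaic methods.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class CatalogItem:
    """Hypothetical record for one image in a mosaic dataset's catalog."""
    name: str
    acquired: date


def query_by_date(items: List[CatalogItem], start: date, end: date) -> List[CatalogItem]:
    """Keep only the items acquired between start and end (inclusive)."""
    return [item for item in items if start <= item.acquired <= end]


def order_closest_to(items: List[CatalogItem], target: date) -> List[CatalogItem]:
    """Order items so the image nearest the target date comes first.

    In this sketch, the first item is the one that would be drawn on top
    where images overlap.
    """
    return sorted(items, key=lambda item: abs((item.acquired - target).days))


catalog = [
    CatalogItem("scene_2019", date(2019, 6, 1)),
    CatalogItem("scene_2021", date(2021, 6, 1)),
    CatalogItem("scene_2023", date(2023, 6, 1)),
]
selected = query_by_date(catalog, date(2020, 1, 1), date(2024, 1, 1))
ordered = order_closest_to(selected, date(2023, 1, 1))
print([item.name for item in ordered])  # ['scene_2023', 'scene_2021']
```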
Mosaic datasets are not limited to one particular type of image data. You can add image data from different sensor systems with different projections, resolutions, pixel depths, and numbers of bands. Overviews can be generated for the entire collection, which speeds up viewing and lets you serve these datasets quickly. Additional viewing properties, such as the mosaic method, control how the mosaicked image is built from overlapping images, which makes mosaic datasets useful in many situations. You can also query a mosaic dataset using spatial and nonspatial constraints. The result of a query can be either a set of images that you process one by one or a dynamically generated mosaicked image.
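A query that combines a spatial constraint with a nonspatial one can be sketched as a bounding-box intersection test plus an attribute predicate. The item fields (`sensor`, `cloud_cover`) and the function names are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class CatalogItem:
    """Hypothetical catalog record with a footprint and two attributes."""
    name: str
    extent: Extent
    sensor: str
    cloud_cover: float  # percent


def intersects(a: Extent, b: Extent) -> bool:
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(items: List[CatalogItem], area: Extent,
          predicate: Callable[[CatalogItem], bool]) -> List[CatalogItem]:
    """Keep items that meet the spatial constraint (area) and the attribute predicate."""
    return [i for i in items if intersects(i.extent, area) and predicate(i)]


catalog = [
    CatalogItem("a", (0, 0, 10, 10), "SensorA", 5.0),
    CatalogItem("b", (8, 8, 20, 20), "SensorB", 40.0),
    CatalogItem("c", (30, 30, 40, 40), "SensorA", 0.0),
]
hits = query(catalog, (5, 5, 15, 15), lambda i: i.cloud_cover < 20)
print([i.name for i in hits])  # ['a']
```

Item "b" meets the spatial constraint but fails the attribute test, and item "c" fails the spatial test.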
In addition to image data, you can store and manage lidar data in a mosaic dataset in the same way as image datasets, and even together with image datasets. The lidar data can be stored in the file system as .las files or LAS datasets, or in a geodatabase as a terrain dataset.
Note:
Mosaic datasets are dependent on the version of ArcGIS on which they were built and are compatible across the ArcGIS platform for a particular release cycle. In general, mosaic datasets created with earlier versions of ArcGIS can be read and handled with later versions of ArcGIS. However, a mosaic dataset created with a later version of ArcGIS may not be backward compatible with earlier versions. See the following table for mosaic dataset compatibility:
Raster data storage model comparison
Storing image datasets individually is often the best method when the images are not adjacent to each other or are rarely used in the same project. Mosaicking inputs together to form one large, single-extent image data file is appropriate for many applications, but a mosaic dataset may be preferable for the following reasons:
- The extents of the images partially or fully overlap and you want the common areas to be preserved.
- The image datasets represent a collection of observations of the same area at different times in a time series.
- You only want to display the study area, not the entire collection of images.
- You want to manage a collection of images as an integrated set but keep their individual states.
- You want to record and manage additional attribute columns that describe each image.
Compare the raster data storage models
Characteristic | Raster dataset | Mosaic dataset |
---|---|---|
Description | A single picture of an object or a seamless image covering a spatially continuous area. This may be a single original image or the result of many merged (mosaicked) images. | A group of image datasets stored as a collection that allows you to store, manage, view, and query collections of image and lidar data. It is viewed as a mosaicked image, but you have access to each dataset as an item in the collection. |
Storage | As a file on disk or in a geodatabase. | In a geodatabase, but the source data, stored as a file on disk, can be referenced. |
Homogeneous or heterogeneous data | Homogeneous data: a single format, data type, and file. | Heterogeneous data: multiple formats, data types, file sizes, and coordinate systems. |
Metadata | Stored once and applies to the complete dataset. | Can be stored in the raster record and as attributes in the attribute table. |
Downsampled datasets | Pyramids of the entire image dataset. | Pyramids for each image dataset, as well as overviews for the entire collection. |
Geoprocessing and image analysis | Used as a data source in many geoprocessing and analysis tools, including raster functions and raster function templates (RFTs). | Used as a data source in many geoprocessing and analysis tools, including raster functions and raster function templates (RFTs). |
Serving | Served directly as an image service. | Served directly as an image service. |
Recommendations | Use raster datasets when overlaps between mosaicked images do not need to be retained and for fast display of large quantities of image data. | Use a mosaic dataset for managing and visualizing image and lidar data. It is well suited to multidimensional data, querying, storing metadata, and overlapping data, and it provides a hybrid solution. |
Raster data storage in the geodatabase
Store raster data in the geodatabase when you want to manage images, add behavior, and control the schema; when you want to manage a well-defined set of raster datasets as part of a database management system (DBMS); and when you require a single architecture for managing all content. There are two primary types of geodatabases: enterprise and file.
An enterprise geodatabase supports multiuser access and editing through its DBMS. File geodatabases are designed to be edited by a single user and do not support versioning. Because they reside in a folder in your file system, they do not require a password for access. File and enterprise geodatabases share the same basic storage schema.
Note:
The functional behavior of the two geodatabase types is largely the same; however, there are exceptions for some tools and procedures. For details about how a specific tool or procedure behaves differently, see its topic in the reference help.
Compare raster storage in file and enterprise geodatabases
Raster storage characteristic | File geodatabase | Enterprise geodatabase |
---|---|---|
Size limit | 1 terabyte (TB) for each raster dataset | Unlimited; limit dependent on DBMS limits |
Raster dataset file format | File geodatabase raster dataset | Enterprise geodatabase raster dataset |
Storage | Stored in the file system | Stored in an RDBMS |
Compression | LZ77, JPEG, JPEG 2000, or none | LZ77, JPEG, JPEG 2000, or none |
Pyramids | Supports partial pyramids | Supports partial pyramids |
Mosaicking | Allows you to append to a raster dataset when mosaicking | Allows you to append to a raster dataset when mosaicking |
Updating | Allows incremental updating | Allows incremental updating |
Number of users | Single user and small workgroups; some readers and one writer | Multiuser; many users and many writers |
Raster block table in the geodatabase
Raster data is typically much larger than feature data, so it requires a side table for storage. For example, a typical panchromatic orthoimage can have 20,000 rows by 20,000 columns (400 million pixel values), or more.
To optimize performance with large raster datasets, a geodatabase raster is divided into smaller tiles, referred to as blocks, typically 128 by 128 or 256 by 256 pixels. These blocks are held in a side table for each raster, with each tile stored in a separate row of the block table, as shown below.
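The arithmetic behind blocking is simple: divide each dimension by the block size and round up, because edge blocks may be only partly filled. This sketch applies it to the 20,000 by 20,000 orthoimage mentioned above; `block_grid` is a hypothetical helper name.

```python
import math


def block_grid(rows: int, cols: int, block_rows: int, block_cols: int):
    """Return (block rows, block columns) needed to tile a raster.

    Edge blocks may be partial, so each count is rounded up.
    """
    return math.ceil(rows / block_rows), math.ceil(cols / block_cols)


for size in (128, 256):
    br, bc = block_grid(20_000, 20_000, size, size)
    print(size, br, bc, br * bc)
# 128 157 157 24649
# 256 79 79 6241
```

Larger blocks mean fewer rows in the block table, but each row holds more pixels.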
This block structure means that only the blocks that cover a requested extent are retrieved when they are needed, instead of the entire image. In addition, the resampled blocks used to build raster pyramids can be stored and managed in the same block table as additional rows.
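The following sketch shows why block retrieval is efficient: for a requested pixel window, integer division by the block size gives the range of block indices to read. The function name and the window values are hypothetical, and a real DBMS query also involves indexing and pyramid level selection.

```python
def blocks_for_window(row_start: int, col_start: int,
                      n_rows: int, n_cols: int, block_size: int):
    """List the (block_row, block_col) indices that cover a pixel window.

    In this sketch, only these blocks would be read, not the whole raster.
    """
    first_r = row_start // block_size
    last_r = (row_start + n_rows - 1) // block_size
    first_c = col_start // block_size
    last_c = (col_start + n_cols - 1) // block_size
    return [(r, c)
            for r in range(first_r, last_r + 1)
            for c in range(first_c, last_c + 1)]


# A 300 x 300 pixel view starting at row 1000, column 5000 of a
# 20,000 x 20,000 image stored in 128 x 128 blocks.
needed = blocks_for_window(1000, 5000, 300, 300, 128)
print(len(needed), needed[0], needed[-1])  # 12 (7, 39) (10, 41)
```

Here 12 blocks are read out of the roughly 24,649 that make up the full-resolution image.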
Block storage allows very large images to be managed in a DBMS with fast performance. A DBMS also provides secure multiuser access.
File geodatabase
Like an enterprise geodatabase, a file geodatabase stores raster data in blocks.
Block storage provides more efficient access to data, especially during mosaicking. When you mosaic data into a file geodatabase raster, only the blocks that the new data overlaps are updated. If no block exists yet at a location the new data covers, a new block is inserted. Partially filled blocks are padded with NoData pixels. Both file and enterprise geodatabase storage models also use partial pyramid updates, which saves time. Because file and enterprise geodatabases use the same data structures, fast copy technology can be used to copy and paste data between them.
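The update-or-insert behavior described above can be illustrated with a toy block store: a dictionary keyed by block index, where existing blocks are updated in place and missing blocks are created pre-filled with NoData. This is a hypothetical sketch of the concept, not the file geodatabase's storage code; `mosaic_into`, `NODATA`, and the tiny 2 by 2 block size are all assumptions chosen for readability.

```python
from typing import Dict, List, Tuple

NODATA = -9999  # hypothetical NoData value
BlockKey = Tuple[int, int]
Block = List[List[int]]


def mosaic_into(store: Dict[BlockKey, Block], image: List[List[int]],
                row0: int, col0: int, size: int) -> List[str]:
    """Write image (placed at pixel row0, col0) into a block store.

    Existing blocks that the image overlaps are updated in place; missing
    blocks are inserted pre-filled with NoData, so any pixels the image
    does not cover stay padded with NoData. Returns a log of actions.
    """
    log: List[str] = []
    touched = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            gr, gc = row0 + r, col0 + c        # global pixel position
            key = (gr // size, gc // size)     # block that holds this pixel
            if key not in store:
                store[key] = [[NODATA] * size for _ in range(size)]
                log.append(f"insert {key}")
            elif key not in touched:
                log.append(f"update {key}")
            touched.add(key)
            store[key][gr % size][gc % size] = value
    return log


store: Dict[BlockKey, Block] = {}
print(mosaic_into(store, [[1, 1], [1, 1]], 0, 0, 2))  # ['insert (0, 0)']
# A 1 x 3 strip that overlaps block (0, 0) and spills into block (0, 1).
print(mosaic_into(store, [[5, 5, 5]], 0, 1, 2))       # ['update (0, 0)', 'insert (0, 1)']
print(store[(0, 1)])                                  # [[5, 5], [-9999, -9999]]
```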
File geodatabases accept configuration keywords; however, unlike in enterprise geodatabases, these keywords have standard predefined values. For more information about configuration keywords, see Configuration keywords for file geodatabases.
Enterprise geodatabase
When raster data is stored in an enterprise geodatabase, the geodatabase provides enterprise-level functionality, such as security, multiuser access, and data sharing. There are three main reasons to store raster data in an enterprise geodatabase:
- It will be updated infrequently (such as every two or three years or longer).
- It will be accessed in read-only use cases (such as using it as basemap data under vector data).
- Hundreds of users (or more) will access it as a basemap.
Because of its storage structure, the raster data is considered to be managed, or fully controlled, by the geodatabase. Enterprise geodatabases store all the raster information (pixels, spatial reference, any associated table, and other metadata) for raster datasets and raster attributes in the associated relational database. Loading raster data into the database therefore copies all of the input raster information, which can be thought of as a format conversion.
The enterprise geodatabase evenly tiles each band into blocks of pixels of a user-defined size (the default is 128 by 128). Tiling the band data enables efficient storage and retrieval of the raster data. Pyramid levels are stored at successively lower resolutions, and the height of the pyramid is determined by the number of levels specified by the application or user.
The raster blocks table (the largest table, which stores the pixel information and pyramids) stores one row per block (tile), per band, per pyramid level. For example, a three-band image divided into 12 blocks, with no pyramids built, has 36 rows in the BLK table: 12 blocks for each of the three bands. The column containing the pixel data for each block is a binary large object (BLOB).
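The row count in the block table follows directly from the rule above. This sketch sums blocks times bands over each level, assuming each pyramid level halves the rows and columns (a downsampling factor of 2, rounded up). `blk_rows` is a hypothetical helper, and real pyramid sizes can differ slightly from this simple rounding.

```python
import math


def blk_rows(rows: int, cols: int, bands: int, block_size: int = 128,
             pyramid_levels: int = 0, factor: int = 2) -> int:
    """Estimate rows in a raster block table: one row per block, per band,
    per level (level 0 is the full-resolution raster)."""
    total = 0
    for level in range(pyramid_levels + 1):
        scale = factor ** level
        level_rows = math.ceil(rows / scale)
        level_cols = math.ceil(cols / scale)
        blocks = math.ceil(level_rows / block_size) * math.ceil(level_cols / block_size)
        total += blocks * bands
    return total


# 384 x 512 pixels with 128 x 128 blocks is a 3 x 4 grid: 12 blocks.
print(blk_rows(384, 512, bands=3))                    # 36
print(blk_rows(384, 512, bands=3, pyramid_levels=1))  # 48
```

The first pyramid level here is 192 by 256 pixels, which needs 4 blocks per band, adding 12 rows.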
Compression, pyramids, and tile size
There are other storage structures to consider when storing and managing raster data, including compression, downsampled datasets (pyramids and overviews), and tile size.
Compression
There are two types of compression: lossless and lossy. Lossless compression leaves the pixel values in the raster dataset unchanged, whereas lossy compression alters them. The amount of compression depends on the pixel data: the more homogeneous the image, the higher the compression ratio. Store data that will be used for analysis, not just display, using lossless compression. The primary benefit of compression is reduced storage space; the savings depend on the compression method and the redundancy in the data. Compression can also significantly improve performance, because less data is transferred. For example, when image data is accessed over a low-bandwidth network, compression reduces the amount of information to transfer, making it possible to store large, seamless image datasets and serve them quickly to a client for display.
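Both points above (lossless versus lossy, and homogeneity driving the compression ratio) can be demonstrated with Python's standard `zlib` module. `zlib` implements DEFLATE, which combines LZ77 with Huffman coding; it is used here only as a stand-in and is not the geodatabase's LZ77 implementation. `lossy_quantize` is a hypothetical toy, not JPEG.

```python
import random
import zlib


def lossless_roundtrip(data: bytes) -> bool:
    """DEFLATE is lossless: decompressing restores every byte exactly."""
    return zlib.decompress(zlib.compress(data)) == data


def lossy_quantize(pixels, step):
    """Toy lossy step: snap each value down to a multiple of step."""
    return [(p // step) * step for p in pixels]


random.seed(0)
homogeneous = bytes([100] * 10_000)                            # flat area, such as water
varied = bytes(random.randrange(256) for _ in range(10_000))   # noisy texture

# The homogeneous buffer compresses to a tiny fraction of the varied one.
print(len(zlib.compress(homogeneous)), len(zlib.compress(varied)))
print(lossless_roundtrip(varied))                # True
print(lossy_quantize([100, 101, 102, 107], 4))   # [100, 100, 100, 104]
```

The quantized values are no longer the originals, which is why lossy output is unsuitable for analysis.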
Mosaic datasets also support compression, but it does not apply to the stored images the mosaic dataset manages. Instead, it applies to the mosaicked image the mosaic dataset generates for display. This also improves access over a network by reducing the size of the data that is transferred. For more details about the Allowed Compression Method property, see Mosaic dataset properties.
Downsampled datasets for fast display
Downsampled datasets are rasters created from the original data for either raster datasets or mosaic datasets. They are generated to improve display speed and performance. When they are created for raster datasets, they are called pyramids, and when they are created for mosaic datasets, they are called overviews.
Pyramids versus overviews
Characteristic | Pyramids | Overviews |
---|---|---|
Created for | Raster datasets | Mosaic datasets |
Format | Writes .ovr files, with a few exceptions. Reads pyramids stored externally as .ovr or .rrd files, or stored internally. | Writes .tif files. |
Storage | In a single file that generally resides next to the source raster dataset and uses the same name. | By default, in a folder next to the geodatabase with the *.overviews extension, or internally for enterprise geodatabases. The storage location is customizable. |
Storage size | 2 to 10 percent of the original raster dataset. | |
Downsampling factor | 2 | Default is 3. This can be edited. |
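The downsampling factor determines how many levels are needed to reach a small display size. This sketch counts levels for the pyramid factor of 2 and the default overview factor of 3 from the table above. The stopping rule (a longer side of at most 256 pixels) and the helper name `levels_needed` are hypothetical assumptions, not the rule ArcGIS uses to decide how many levels to build.

```python
import math


def levels_needed(size: int, factor: int, target: int) -> int:
    """Count downsampling steps until the longer side is at most target pixels."""
    levels = 0
    while size > target:
        size = math.ceil(size / factor)
        levels += 1
    return levels


# A 20,000-pixel-wide image, stopping once a level is at most 256 pixels wide.
print(levels_needed(20_000, 2, 256))  # 7 (pyramid factor of 2)
print(levels_needed(20_000, 3, 256))  # 4 (default overview factor of 3)
```

A larger factor reaches a coarse resolution in fewer levels, at the cost of bigger jumps in resolution between levels.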
Tile size
In an enterprise geodatabase or a file geodatabase, raster data is stored in a structure in which the data is tiled, indexed, pyramided, and usually compressed. Because of tiling, indexing, and pyramiding, each query returns only the tiles needed to satisfy its extent and resolution, instead of the whole dataset. The tile size, specified as a number of pixels in x and y, sets how many pixels are stored in each database block. The default tile size is 128 by 128 pixels, and most applications do not require changing it. In an enterprise geodatabase, tiles of raster data are compressed before they are stored in the geodatabase.