
Data quality and validation

Available with Data Reviewer license.

Data that your organization uses for visualization, analysis, compilation, and sharing should meet a defined quality standard. Data quality requirements vary from project to project and from organization to organization. How accurate or complete a dataset needs to be depends on how the data will be used, and is driven by a combination of technical, product, and client needs. ArcGIS Pro provides data quality capabilities that support quality assurance and quality control (QA/QC) workflows and the management of errors through a defined life cycle process.

Validate your data

Data validation is an iterative process that uses formal methods to evaluate a dataset's adherence to a defined quality standard. ArcGIS Pro provides tools that simplify data quality assurance (QA) and quality control (QC) through automated and semi-automated workflows. These tools help detect anomalies in the features, attributes, and spatial relationships in your data.

Automated review

Automated review is the ability to evaluate a feature's quality without human intervention. This includes capabilities such as Reviewer checks that assess data integrity by performing spatial, geometric, and attribute validation. Checks are configured to validate data based on specific conditions. Some checks search for conditions, such as polygon slivers or cutbacks, while others search for features with specific spatial relationships. For more information, see Reviewer rule design.
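To make the idea of a configured check concrete, the following is a minimal sketch of a sliver test written in plain Python. It is not the Data Reviewer implementation and does not use any ArcGIS API. The function names, the thinness ratio formula (4 x pi x area / perimeter squared), and the 0.1 threshold are all assumptions chosen for illustration.

```python
import math

def shoelace_area(ring):
    """Absolute area of a simple polygon given as [(x, y), ...] (ring not closed)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def perimeter(ring):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))

def thinness_ratio(ring):
    """1.0 for a circle, approaching 0 for long, narrow shapes."""
    p = perimeter(ring)
    return 4 * math.pi * shoelace_area(ring) / (p * p) if p else 0.0

def find_slivers(features, max_ratio=0.1):
    """Return the IDs of polygons whose thinness ratio falls below max_ratio.

    The threshold is a hypothetical check parameter, not a product default.
    """
    return [fid for fid, ring in features.items()
            if thinness_ratio(ring) < max_ratio]

# Hypothetical sample data: one square parcel and one long, thin strip.
features = {
    "parcel_1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "parcel_2": [(0, 0), (100, 0), (100, 0.5), (0, 0.5)],
}

slivers = find_slivers(features)
print(slivers)
```

In this sample only the thin strip is flagged, because its ratio is far below the assumed threshold while the square's ratio is roughly 0.79.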

Semi-automated review

Semi-automated review consists of guided workflows that require some form of human interaction and input. Examples include visual review workflows that are employed to discover missing, misplaced, or miscoded features. Data Reviewer provides a series of simple-to-use tools to streamline workflows required to detect errors that cannot be found using automated methods. For more information, see Identify errors on existing features and Identify missing features.

Compile the results

Errors detected during data validation are organized into sessions. Sessions represent validation and quality control transactions performed by data checks or manual review. Sessions can be identified by user-defined names and are stored in a file or enterprise geodatabase.

Error results

An error result represents an object in your GIS, such as a feature, table row, or metadata element, that has been marked as an anomaly by automated validation (using data checks) or manual inspection. Error results include information about the source of the error, detection method, date and time, severity, and life cycle phase. An error result can have a geometry that identifies the feature (or the portion of a feature) that a data check has classified as an error. This geometry allows you to quickly navigate to the specific area in which the error was detected. Errors found in stand-alone tables are recorded as error results without geometry.
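The information carried by an error result can be pictured as a simple record. The sketch below is a hypothetical Python data structure based only on the properties listed above. The class name, field names, and sample values are assumptions and do not reflect the actual Reviewer table schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple

@dataclass
class ErrorResult:
    """Illustrative record only; field names are not the Reviewer schema."""
    source: str                 # where the error was found, e.g. dataset and object ID
    detection_method: str       # "automated check" or "manual review"
    severity: int               # 1 (highest) to 5 (lowest)
    phase: str = "Review"       # life cycle phase
    detected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Simplified to a single (x, y) location; None for stand-alone table rows.
    geometry: Optional[Tuple[float, float]] = None

    @property
    def has_geometry(self) -> bool:
        return self.geometry is not None

# Hypothetical examples: a feature error with a location and a table row error without one.
feature_error = ErrorResult("Buildings/OID 42", "automated check", 1,
                            geometry=(-117.19, 34.05))
table_error = ErrorResult("Owners/OID 7", "manual review", 3)

print(feature_error.has_geometry, table_error.has_geometry)
```

A record without geometry, like the table row example, can still be tracked and sorted by its other properties. It just cannot be used to navigate to a location on the map.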

An error's severity rating indicates its relative importance compared to other errors. Severity is set within data checks or tools and is automatically applied to errors created during the data review process. Severity is represented on a scale of values, from 1 (highest) to 5 (lowest). For example, when using an automated check to detect high-priority errors, such as buildings that overlap lake features, you can assign the check a severity rating of 1. Any error result created by this check will have a severity of 1.
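The way a check's severity carries over to every error it creates can be sketched as follows. This is a plain Python illustration under stated assumptions: run_check, the check names, and the sample features are hypothetical and are not part of any ArcGIS API.

```python
def run_check(name, severity, features, condition):
    """Apply a condition to each feature and stamp every hit with the check's severity."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 (highest) and 5 (lowest)")
    return [
        {"check": name, "feature": f["id"], "severity": severity}
        for f in features
        if condition(f)
    ]

# Hypothetical sample features.
buildings = [
    {"id": "b1", "overlaps_lake": True},
    {"id": "b2", "overlaps_lake": False},
    {"id": "b3", "overlaps_lake": True},
]
roads = [
    {"id": "r1", "name": ""},
    {"id": "r2", "name": "Main St"},
]

# A low-priority check (severity 4) and a high-priority check (severity 1).
errors = (
    run_check("Road missing name", 4, roads, lambda f: not f["name"])
    + run_check("Building overlaps lake", 1, buildings, lambda f: f["overlaps_lake"])
)

# Lower numbers are more important, so an ascending sort puts them first.
triage = sorted(errors, key=lambda e: e["severity"])
for e in triage:
    print(e["severity"], e["check"], e["feature"])
```

Because 1 is the highest severity, sorting in ascending order lists the building overlaps ahead of the unnamed road.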

Track life cycle status

Errors detected during the data review process are tracked through a defined life cycle process. This process includes three life cycle phases: Review, Correction, and Verification.

Reviewer life cycle statuses

Life cycle phases describe the who, what, and when in the process of correcting and verifying an error. Each phase contains one or more status values that describe the actions taken as the error progresses from one phase to another. For more information, see Results and life cycle phases.
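The three life cycle phases named above (Review, Correction, and Verification) can be modeled as a small state machine. The sketch below assumes a strictly linear progression, which is a simplification made for illustration. It does not model the status values within each phase, and an actual workflow may allow other transitions, such as returning an error for further correction.

```python
from enum import Enum

class Phase(Enum):
    REVIEW = "Review"
    CORRECTION = "Correction"
    VERIFICATION = "Verification"

# Assumed order of progression; a simplification, not the product's rule set.
_ORDER = [Phase.REVIEW, Phase.CORRECTION, Phase.VERIFICATION]

def next_phase(current: Phase) -> Phase:
    """Return the phase that follows current, or raise if it is the last one."""
    index = _ORDER.index(current)
    if index == len(_ORDER) - 1:
        raise ValueError(f"{current.value} is the final phase")
    return _ORDER[index + 1]

# Walk a hypothetical error through all three phases.
phase = Phase.REVIEW
history = [phase.value]
while phase is not Phase.VERIFICATION:
    phase = next_phase(phase)
    history.append(phase.value)

print(" -> ".join(history))
```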