Identify data quality requirements

Available with Data Reviewer license.

One of the challenges in implementing data quality control processes is the identification of technical data quality requirements for the organization. It is important to identify and understand the business requirements for your data before translating those into technical data quality requirements that define good-quality data.

An effective data quality control process is based on an understanding of how data and information products are used within and outside the organization. Each organization defines good-quality data differently and bases this definition on the intended purpose and use of the data. The following diagram illustrates a variety of sources for data quality requirements that may be applicable to your organization.

Sources and data quality requirements

Data quality elements

Each data quality element describes an aspect of quality that a dataset must satisfy to be usable and accurate. The quality of GIS data has several components. As defined by the International Organization for Standardization (ISO), these components include the following:

  • Completeness
  • Logical consistency
  • Spatial accuracy
  • Thematic accuracy
  • Temporal quality
  • Data usability

Completeness

The presence or absence of features, their attributes, and relationships in a data model.

A neighborhood with missing building footprint.
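
As a minimal sketch of a completeness check, assuming a trusted reference list of feature IDs is available, the comparison below reports omissions (expected features missing from the dataset) and commissions (features present that the reference does not expect). The function name and the ID values are hypothetical and are not part of any ArcGIS API.

```python
def completeness_report(expected_ids, actual_ids):
    """Compare the feature IDs in a dataset against a reference list."""
    expected, actual = set(expected_ids), set(actual_ids)
    return {
        "omissions": sorted(expected - actual),    # expected but missing
        "commissions": sorted(actual - expected),  # present but not expected
    }


# Hypothetical building footprint IDs for one neighborhood.
reference = ["B-101", "B-102", "B-103", "B-104"]
captured = ["B-101", "B-102", "B-104", "B-999"]

report = completeness_report(reference, captured)
print(report)
```

Here the footprint `B-103` would be reported as an omission and `B-999` as a commission.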

Logical consistency

The degree of adherence to preestablished rules for a data model's structure, attribution, and relationships, as defined by an organization or industry. Many industries follow standards that are reflected in a geospatial data model as value domains, data formats, and topological consistency rules for how the data is stored.

A highway with road surface-type gravel.
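
The highway example can be expressed as an attribute rule. The sketch below assumes a hypothetical business rule that restricts the surface types allowed for each road class; the field names, class names, and allowed values are illustrative only.

```python
# Hypothetical business rule: allowed surface types per road class.
ALLOWED_SURFACES = {
    "highway": {"asphalt", "concrete"},
    "local": {"asphalt", "concrete", "gravel"},
}


def find_rule_violations(roads):
    """Return the roads whose surface type is not allowed for their class.

    A road class that has no entry in ALLOWED_SURFACES is also reported.
    """
    violations = []
    for road in roads:
        allowed = ALLOWED_SURFACES.get(road["road_class"], set())
        if road["surface"] not in allowed:
            violations.append(road)
    return violations


roads = [
    {"id": 1, "road_class": "highway", "surface": "asphalt"},
    {"id": 2, "road_class": "highway", "surface": "gravel"},
    {"id": 3, "road_class": "local", "surface": "gravel"},
]

violations = find_rule_violations(roads)
print([road["id"] for road in violations])
```

Only road 2, the gravel highway, violates the assumed rule; the gravel local road is valid.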

Spatial accuracy

The accuracy of the position of features in relation to Earth.

A lake feature that has been shifted.
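
One common way to quantify positional error is to compare captured coordinates with surveyed control points. The following sketch assumes projected coordinates in meters and a project tolerance of 10 m; both the sample coordinates and the tolerance are hypothetical.

```python
import math


def positional_errors(pairs):
    """Horizontal distance between each captured point and its control point."""
    return [math.hypot(cx - rx, cy - ry) for (cx, cy), (rx, ry) in pairs]


def rmse(errors):
    """Root mean square of a list of distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical (captured, control) coordinate pairs in meters.
pairs = [
    ((100.0, 200.0), (100.0, 200.0)),  # exact match
    ((303.0, 404.0), (300.0, 400.0)),  # shifted by 5 m
    ((500.0, 612.0), (500.0, 600.0)),  # shifted by 12 m
]
TOLERANCE_M = 10.0  # assumed project tolerance

errors = positional_errors(pairs)
out_of_tolerance = [i for i, e in enumerate(errors) if e > TOLERANCE_M]
print(errors, out_of_tolerance, round(rmse(errors), 2))
```

Reporting both the individual outliers and an aggregate such as the root mean square error lets a requirement state a per-feature tolerance, a dataset-wide target, or both.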

Thematic accuracy

The accuracy of attributes within features and their appropriate relationships.

A swimming pool captured as wetland.

Temporal quality

The quality of temporal attributes and the temporal relationships of features.

An outdated chart with open runway.
An updated chart with closed runway.
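
A temporal relationship that is often checked is whether the validity periods recorded for the same feature overlap. The sketch below assumes each record carries a start and end date and treats each range as half-open; the record layout and runway identifiers are hypothetical.

```python
from datetime import date
from itertools import combinations


def overlapping_periods(records):
    """Return ID pairs of records for the same feature whose date ranges overlap.

    Ranges are treated as half-open: [start, end).
    """
    overlaps = []
    for a, b in combinations(records, 2):
        same_feature = a["feature"] == b["feature"]
        if same_feature and a["start"] < b["end"] and b["start"] < a["end"]:
            overlaps.append((a["id"], b["id"]))
    return overlaps


# Hypothetical status records for two runways.
records = [
    {"id": 1, "feature": "RWY-09", "status": "open",
     "start": date(2010, 1, 1), "end": date(2015, 6, 1)},
    {"id": 2, "feature": "RWY-09", "status": "closed",
     "start": date(2015, 1, 1), "end": date(2020, 1, 1)},
    {"id": 3, "feature": "RWY-27", "status": "open",
     "start": date(2010, 1, 1), "end": date(2020, 1, 1)},
]

overlaps = overlapping_periods(records)
print(overlaps)
```

Records 1 and 2 describe the same runway as both open and closed during the first half of 2015, so they are flagged for review.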

Data usability

Adherence of a dataset to a specific set of requirements related to a use case.

This is used to route emergency vehicles.
This is used to map national parks.

Quality requirement documentation

A quality assurance (QA) plan is a document that identifies the quality standards relevant to a project and the methods used to achieve them. It is a living document that changes as the organization identifies new quality requirements. Developing the plan is also an opportunity to bring together key stakeholders to build a common picture of what constitutes good-quality data and of the business processes that drive those requirements.

The following are documentation techniques and standards that can be useful when identifying data quality requirements:

  • ISO/TC 211 Geographic information/Geomatics: The International Organization for Standardization (ISO) series of standards for geographic information. It defines methods, tools, and services for data management, including acquiring, processing, analyzing, accessing, presenting, and transferring such data in digital form among users, systems, and locations.
  • Requirements Traceability Matrix: A document created to manage and track business requirements to ensure that they are met during project implementation. It correlates the business requirements collected for the project with the capabilities of a software product.
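
A Requirements Traceability Matrix can also be kept as structured data so that it can be grouped and checked for gaps. The sketch below is one possible representation, not a prescribed format: the `Requirement` class, its field names, and the placeholder capability value are assumptions, and the requirement descriptions are paraphrased from the sample matrix in this topic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Requirement:
    number: str            # for example "D001"
    description: str
    category: str          # for example "Functional Requirement"
    capability: str = ""   # product capability, filled in once identified


def group_by_category(requirements):
    """Map each requirement category to its requirement numbers."""
    groups = defaultdict(list)
    for req in requirements:
        groups[req.category].append(req.number)
    return dict(groups)


def unmapped(requirements):
    """Requirement numbers that have no product capability assigned yet."""
    return [req.number for req in requirements if not req.capability]


matrix = [
    Requirement("F001", "Run queries on segments edited by a user",
                "Functional Requirement"),
    Requirement("D001", "Production data model complies with schema standard",
                "Data Requirement: Logical consistency",
                capability="placeholder capability"),  # hypothetical value
    Requirement("D002", "Migrated data has appropriate domains",
                "Data Requirement: Logical consistency"),
]

print(group_by_category(matrix))
print(unmapped(matrix))
```

Listing the unmapped requirements makes it easy to see which ones still need to be correlated with a software capability.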

The Requirement category column in the following table shows examples of collected requirements that reference some of the data quality elements outlined above. After you organize and categorize your requirements, the next step is to correlate the data quality requirements with the corresponding capabilities in ArcGIS.

| ID | Requirement | Requirement number | Requirement category | Product capability |
| --- | --- | --- | --- | --- |
| 1 | Ability to run queries based on number of segments edited by an individual user | F001 | Functional Requirement | |
| 2 | Ability to ensure the production data model is compliant with industry schema standard | D001 | Data Requirement: Logical consistency | |
| 3 | As geodatabase administrator, ability to restrict POST privileges to the DEFAULT version of a small set of admin users | F002 | Functional Requirement | |
| 4 | Ability to produce ad hoc reports indicating gaps in data for any attributes selected | F003 | Functional Requirement | |
| 5 | Ability to ensure that source data will be migrated into the production database and have appropriate domains and relationships | D002 | Data Requirement: Logical consistency | |
| 6 | Ability to ensure that source data is accurate according to the defined standards | D003 | Data Requirement: Spatial accuracy | |
| 7 | Ability to ensure that production data is for mobile collectors and is attribute accurate | D004 | Data Requirement: Thematic accuracy | |
| 8 | Ability to ensure that there is no overlap between event measures during the project period of 2010–2020 | D005 | Data Requirement: Temporal quality | |
| 9 | Ability to hyperlink a validation error with a violated business rule and provide a description | F004 | Functional Requirement | |
| 10 | Ability to identify the number of cells that are not populated (NULL) for each required attribute field | D006 | Data Requirement: Thematic accuracy | |
| 11 | Ability to identify parcels that have no overlaying building footprint features | D007 | Data Requirement: Logical consistency | |
| 12 | Ability to create error reports, generate Excel files, and save them to a local drive | F005 | Functional Requirement | |
| 13 | Ability to validate a unique ID attribute linking a parcel to matching building footprint features | D008 | Data Requirement: Logical consistency | |
| 14 | Ability to confirm all features are compliant with metadata standards | D009 | Data Requirement: Data completeness | |
| 15 | Ability to identify existing features as an error | F006 | Data Requirement: Thematic accuracy | |
| 16 | Ability to indicate the location of missing features as an error | F007 | Data Requirement: Data completeness | |

Sample Requirements Traceability Matrix
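
To make one of these requirements concrete, requirement D006 (counting unpopulated cells for each required attribute field) could be prototyped as follows. This is a sketch that assumes the attribute table has already been read into a list of dictionaries; the field names and sample rows are hypothetical.

```python
def count_nulls(rows, required_fields):
    """Count the rows in which each required field is missing or None."""
    return {
        field: sum(1 for row in rows if row.get(field) is None)
        for field in required_fields
    }


# Hypothetical parcel attribute rows.
rows = [
    {"parcel_id": "P-1", "owner": "A. Smith", "land_use": "residential"},
    {"parcel_id": "P-2", "owner": None, "land_use": "commercial"},
    {"parcel_id": "P-3", "owner": None, "land_use": None},
]

null_counts = count_nulls(rows, ["parcel_id", "owner", "land_use"])
print(null_counts)
```

A field that is absent from a row is counted the same way as an explicit `None`, which suits a check on required fields.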
