Available with Data Reviewer license.
One of the challenges in implementing data quality control processes is the identification of technical data quality requirements for the organization. It is important to identify and understand the business requirements for your data before translating those into technical requirements that define good-quality data.
An effective data quality control process is based on the understanding of how data and information products are used within and outside the organization. Each organization defines quality differently and bases this definition on the intended purpose and use of the data. The following diagram illustrates a variety of sources for quality requirements that may be applicable to your organization.
Data quality elements
Each data quality element describes one aspect of the quality a dataset needs in order to be usable and accurate. GIS data quality has several components. As defined by the International Organization for Standardization (ISO), these components include the following:
- Completeness
- Logical consistency
- Spatial accuracy
- Thematic accuracy
- Temporal quality
- Data usability
Completeness
The presence or absence of features, their attributes, and relationships in a data model.
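One common way to measure feature completeness is to compare a dataset against an authoritative reference list. Missing features are omissions, and extra features with no reference counterpart are commissions. The sketch below assumes that features can be matched by a shared ID. `completeness_report` is a hypothetical helper for illustration and is not an ArcGIS or Data Reviewer API.

```python
def completeness_report(dataset_ids, reference_ids):
    """Compare feature IDs in a dataset with an authoritative reference list.

    Assumes features are matched by a shared ID (hypothetical setup).
    Omission: reference features missing from the dataset.
    Commission: dataset features with no reference counterpart.
    """
    dataset = set(dataset_ids)
    reference = set(reference_ids)
    return {
        "omission": sorted(reference - dataset),
        "commission": sorted(dataset - reference),
    }


report = completeness_report(["P1", "P2", "P4"], ["P1", "P2", "P3"])
print(report)  # {'omission': ['P3'], 'commission': ['P4']}
```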
Logical consistency
The degree of adherence to preestablished rules for a data model's structure, attribution, and relationships, as defined by an organization or industry. Many industries follow standards that are reflected in a geospatial data model as value domains, data formats, and rules for the topological consistency of the stored data.
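Value domains are one of the simplest logical consistency rules to check. The following minimal sketch flags attribute values that fall outside a coded-value domain. The field names, domain values, and the `domain_violations` helper are hypothetical examples, not part of any ArcGIS schema or API.

```python
# Hypothetical coded-value domains, keyed by field name.
DOMAINS = {
    "material": {"PVC", "DI", "CI"},
    "status": {"Active", "Abandoned"},
}


def domain_violations(features, domains):
    """Return (feature_id, field, value) for each value outside its domain.

    A missing field yields None, which is never in a domain, so it is
    also reported as a violation.
    """
    violations = []
    for feature in features:
        for field, allowed in domains.items():
            value = feature.get(field)
            if value not in allowed:
                violations.append((feature["id"], field, value))
    return violations


features = [
    {"id": 1, "material": "PVC", "status": "Active"},
    {"id": 2, "material": "Steel", "status": "Active"},
]
print(domain_violations(features, DOMAINS))  # [(2, 'material', 'Steel')]
```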
Spatial accuracy
The accuracy of feature positions relative to their true locations on the earth.
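Positional accuracy is often quantified as the root mean square error (RMSE) between dataset coordinates and independent check points of higher accuracy. This sketch computes horizontal RMSE. It assumes that both point lists are already paired one-to-one and use the same projected coordinate system with linear units. `horizontal_rmse` is a hypothetical helper.

```python
import math


def horizontal_rmse(measured, reference):
    """Horizontal RMSE between paired (x, y) points.

    Assumes both lists are in the same projected coordinate system and that
    measured[i] corresponds to reference[i].
    """
    if len(measured) != len(reference) or not measured:
        raise ValueError("need equal-length, non-empty point lists")
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


measured = [(100.0, 200.0), (150.0, 250.0)]
reference = [(103.0, 204.0), (150.0, 250.0)]
print(horizontal_rmse(measured, reference))  # sqrt((25 + 0) / 2), about 3.54
```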
Thematic accuracy
The accuracy of attributes within features and their appropriate relationships.
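Thematic accuracy can be estimated by comparing attribute values against a field-verified reference sample. The sketch below reports the share of sampled features whose value matches the reference. The `landuse` field, the sample values, and the `attribute_accuracy` helper are hypothetical.

```python
def attribute_accuracy(dataset, reference, field):
    """Share of sampled features whose `field` value matches the reference.

    Both arguments map feature ID -> attribute dict. Only IDs present in
    both the reference sample and the dataset are scored.
    """
    checked = [fid for fid in reference if fid in dataset]
    if not checked:
        return None  # nothing to score
    correct = sum(
        1 for fid in checked
        if dataset[fid].get(field) == reference[fid].get(field)
    )
    return correct / len(checked)


dataset = {
    1: {"landuse": "Residential"},
    2: {"landuse": "Commercial"},
    3: {"landuse": "Residential"},
    4: {"landuse": "Park"},
}
reference = {  # hypothetical field-verified sample
    1: {"landuse": "Residential"},
    2: {"landuse": "Industrial"},
    3: {"landuse": "Residential"},
    4: {"landuse": "Park"},
}
print(attribute_accuracy(dataset, reference, "landuse"))  # 0.75
```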
Temporal quality
The quality of the temporal attributes and temporal relationships of features.
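A typical temporal quality rule is that records describing the same feature must not have overlapping validity periods. The following minimal sketch assumes that each event is a tuple `(event_id, feature_id, start, end)` and that its date range is half-open, so an event ending on the day the next begins does not count as an overlap. The tuple layout and the `overlapping_events` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date


def overlapping_events(events):
    """Return (event_id, event_id) pairs whose date ranges overlap.

    Each event is (event_id, feature_id, start, end), treated as [start, end).
    Only events that share a feature_id are compared.
    """
    by_feature = defaultdict(list)
    for event in events:
        by_feature[event[1]].append(event)

    overlaps = []
    for feature_events in by_feature.values():
        feature_events.sort(key=lambda e: e[2])  # sort by start date
        for i, current in enumerate(feature_events):
            for later in feature_events[i + 1:]:
                if later[2] >= current[3]:
                    break  # sorted by start, so no later event overlaps `current`
                overlaps.append((current[0], later[0]))
    return overlaps


events = [
    ("E1", "Route-7", date(2012, 1, 1), date(2015, 1, 1)),
    ("E2", "Route-7", date(2014, 6, 1), date(2016, 1, 1)),
    ("E3", "Route-7", date(2016, 1, 1), date(2018, 1, 1)),
    ("E4", "Route-9", date(2012, 1, 1), date(2020, 1, 1)),
]
print(overlapping_events(events))  # [('E1', 'E2')]
```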
Data usability
The degree to which a dataset adheres to a specific set of requirements for a particular use case.
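Because usability is judged against a particular use case, one way to express it is as a set of thresholds that measured quality metrics must meet. This sketch treats every threshold as a maximum. The metric names, threshold values, and the `usability_check` helper are hypothetical examples for a single use case.

```python
def usability_check(metrics, requirements):
    """Return the names of use-case requirements that the metrics fail.

    Each requirement is a maximum allowed value. A metric that was not
    measured counts as a failure.
    """
    failures = []
    for name, maximum in requirements.items():
        value = metrics.get(name)
        if value is None or value > maximum:
            failures.append(name)
    return failures


# Hypothetical thresholds for one use case and hypothetical measured values.
use_case = {"omission_rate": 0.02, "horizontal_rmse_m": 1.0}
measured = {"omission_rate": 0.01, "horizontal_rmse_m": 3.54}
print(usability_check(measured, use_case))  # ['horizontal_rmse_m']
```

The same dataset could pass for a different use case with looser thresholds, which is why usability is evaluated per use case instead of once for the whole dataset.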
Quality requirement documentation
A quality assurance (QA) plan is a document that identifies the quality standards relevant to a project and the methods for meeting them. A QA plan is a living document that changes as the organization identifies new quality requirements. Developing it is also an opportunity to bring key stakeholders together to build a shared understanding of what constitutes good-quality data and of the business processes that drive those requirements.
The following are documentation techniques and standards that can be useful when identifying data quality requirements:
- ISO/TC 211 Geographic information/Geomatics—International Organization for Standardization (ISO) series of standards for geographic information to define methods, tools, and services for data management for acquiring, processing, analyzing, accessing, presenting, and transferring such data in digital form among users, systems, and locations.
- Requirements Traceability Matrix—A document created to manage and track business requirements to ensure they are met during a project implementation. This document correlates business requirements collected for the project and capabilities of a software product.
The Requirement category column in the following table shows an example of collected requirements, several of which reference the data quality elements outlined above. After organizing and categorizing your requirements, the next step is to correlate the data quality requirements with the corresponding capabilities in ArcGIS.
ID | Requirement | Requirement number | Requirement category | Product capability |
---|---|---|---|---|
1 | Ability to run queries based on the number of segments edited by an individual user | F001 | Functional Requirement | |
2 | Ability to ensure the production data model is compliant with industry schema standard | D001 | Data Requirement—Logical consistency | |
3 | As geodatabase administrator, ability to restrict posting to the DEFAULT version to a small set of admin users | F002 | Functional Requirement | |
4 | Ability to produce ad hoc reports indicating gaps in data for any attributes selected | F003 | Functional Requirement | |
5 | Ability to ensure that source data will be migrated into the production database and have appropriate domains and relationships | D002 | Data Requirement—Logical consistency | |
6 | Ability to ensure that source data is accurate according to the defined standards | D003 | Data Requirement—Spatial accuracy | |
7 | Ability to ensure that production data used by mobile collectors is attribute accurate | D004 | Data Requirement—Thematic accuracy | |
8 | Ability to ensure that there is no overlap between event measures during the project period of 2010–2020 | D005 | Data Requirement—Temporal quality | |
9 | Ability to hyperlink a validation error with a violated business rule and provide a description | F004 | Functional Requirement | |
10 | Ability to identify the number of cells that are not populated (NULL) for each required attribute field | D006 | Data Requirement—Thematic accuracy | |
11 | Ability to identify parcels that have no overlaying building footprint features | D007 | Data Requirement—Logical consistency | |
12 | Ability to create error reports, generate Excel files, and save them to a local drive | F005 | Functional Requirement | |
13 | Ability to validate a unique ID attribute linking a parcel to matching building footprint features | D008 | Data Requirement—Logical consistency | |
14 | Ability to confirm all features are compliant with metadata standards | D009 | Data Requirement—Data completeness | |
15 | Ability to identify existing features as an error | F006 | Data Requirement—Thematic accuracy | |
16 | Ability to indicate the location of missing features as an error | F007 | Data Requirement—Data completeness |
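A requirements traceability matrix is usually kept in a spreadsheet, but its structure can be sketched as a list of records. The records below transcribe a few rows from the table above. Each category is split into a requirement type and a quality element, and `product_capability` stays empty until a requirement is correlated with an ArcGIS capability. The record layout and both helper functions are hypothetical.

```python
from collections import defaultdict

# A few rows transcribed from the requirements table. "element" is None for
# functional requirements; product_capability is None until mapped.
requirements = [
    {"number": "F001", "type": "Functional", "element": None, "product_capability": None},
    {"number": "D001", "type": "Data", "element": "Logical consistency", "product_capability": None},
    {"number": "D003", "type": "Data", "element": "Spatial accuracy", "product_capability": None},
    {"number": "D005", "type": "Data", "element": "Temporal quality", "product_capability": None},
    {"number": "D007", "type": "Data", "element": "Logical consistency", "product_capability": None},
]


def group_by_element(rows):
    """Group data requirement numbers by the quality element they reference."""
    groups = defaultdict(list)
    for row in rows:
        if row["type"] == "Data":
            groups[row["element"]].append(row["number"])
    return dict(groups)


def untraced(rows):
    """Requirement numbers not yet correlated with a product capability."""
    return [row["number"] for row in rows if not row["product_capability"]]


print(group_by_element(requirements))
# {'Logical consistency': ['D001', 'D007'], 'Spatial accuracy': ['D003'], 'Temporal quality': ['D005']}
print(untraced(requirements))
# ['F001', 'D001', 'D003', 'D005', 'D007']
```

Grouping by element shows which quality elements the collected requirements cover, and the untraced list shrinks as each requirement is mapped to a capability.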