Prepare data

You can use data engineering tools to clean and prepare your data. A subset of geoprocessing tools is available in the Data Engineering view to help you prepare your data for use in a map or an analysis. These tools are grouped into the following categories:

  • Clean: Remove or correct problems in the data. For example, you can remove unnecessary rows or fields. You can also modify the fields or fill missing values.
  • Construct: Create fields that are derived from existing fields or properties of the layer. For example, you can add and calculate a new field; standardize, transform, or reduce the dimensions of existing fields; and add a field based on the input layer’s geometry.
  • Integrate: Add data from another data source to the input table or feature class. For example, you can join fields or add fields by enriching the data.
  • Format: Change the format of the fields or reorganize the fields in the table or feature class. For example, you can convert time fields, encode categorical fields, or reclassify field values.

Note:

Some geoprocessing tools in the Data Engineering view are not available for a noneditable layer. In this case, make an editable copy of the layer and open a new Data Engineering view for the copy.

You can access these groups and tools in the Data Engineering view by doing one of the following:

Data Engineering ribbon

When the Data Engineering view is active, a contextual ribbon appears at the top of the application. The ribbon provides access to commands and tools for exploring and preparing data.

The Data group on the ribbon provides access to the fields view and attribute table of the layer associated with the active Data Engineering view. The Tools group offers four tool galleries: Clean, Construct, Integrate, and Format. Each tool gallery contains a subset of geoprocessing tools for the respective data engineering task. By default, the layer associated with the active Data Engineering view is used to automatically populate the input features parameter of these tools.

Data Engineering tools

The following tables describe all of the tools on the Data Engineering ribbon.

Note:

Some of the geoprocessing tools are not available for nonspatial data such as stand-alone tables.

Clean

The following tools are available in the Clean category:

Fill Missing Values

Replaces missing (null) values with estimated values based on spatial neighbors, space-time neighbors, or time-series values.
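
As a rough illustration of filling from spatial neighbors, the following Python sketch replaces each missing value with the unweighted mean of its k nearest features that have a known value, using straight-line distance. The function name `fill_from_neighbors`, the default `k=2`, and the plain mean are assumptions for this example; it does not reproduce the tool's own neighbor definitions or its space-time and time-series options.

```python
import math


def fill_from_neighbors(points, values, k=2):
    """Hypothetical helper: replace None entries in `values` with the mean of
    the k nearest features (by straight-line distance) that have a value."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Rank the features with known values by distance to feature i.
        nearest = sorted(known, key=lambda j: math.dist(points[i], points[j]))[:k]
        if nearest:  # leave the value missing if nothing is known
            filled[i] = sum(values[j] for j in nearest) / len(nearest)
    return filled


points = [(0, 0), (1, 0), (2, 0), (10, 0)]
values = [1.0, None, 3.0, 100.0]
print(fill_from_neighbors(points, values))  # [1.0, 2.0, 3.0, 100.0]
```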

Delete Field

Deletes one or more fields from a table, feature class, feature layer, or raster dataset.

Spatial Outlier Detection

Identifies spatial outliers in point features by calculating the local outlier factor (LOF) of each feature. Spatial outliers are features in locations that are abnormally isolated, and the LOF is a measurement that describes how isolated a location is from its local neighbors.
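
The local outlier factor compares the density around a feature with the density around its neighbors. The following Python sketch computes a textbook LOF for 2D points from k nearest neighbors and reachability distances; values near 1 indicate typical density, and values well above 1 indicate isolation. The function name `local_outlier_factor` and the simplified tie handling (exactly k neighbors, tied neighbors not added) are assumptions; the tool's own neighbor settings and outputs may differ.

```python
import math


def local_outlier_factor(points, k=2):
    """Textbook LOF for a list of (x, y) points (hypothetical helper).
    Assumes more than k points and no duplicate locations."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point (excluding itself) and its k-distance.
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance of j, distance from i to j).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print([round(s, 2) for s in local_outlier_factor(pts, k=2)])
# [1.0, 1.0, 1.0, 1.0, 13.09]
```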

Project

Projects spatial data from one coordinate system to another.
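
For a sense of what a projection does mathematically, the following Python sketch applies the forward spherical Mercator formulas used by the WGS 1984 Web Mercator (auxiliary sphere) coordinate system, converting longitude and latitude in degrees to x and y in meters. It is only an illustration: it does not perform datum (geographic) transformations or handle any other coordinate system, which the Project tool manages for you, and the function name `lonlat_to_web_mercator` is hypothetical.

```python
import math

RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS 84 semimajor axis)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator: geographic degrees -> projected meters."""
    if abs(lat_deg) >= 90:
        raise ValueError("Mercator is undefined at the poles")
    x = RADIUS_M * math.radians(lon_deg)
    y = RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x, y = lonlat_to_web_mercator(-180.0, 0.0)
print(round(x, 2))  # about -20037508.34, the western edge of the Web Mercator extent
```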

Construct

The following tools are available in the Construct category:

Calculate Geometry Attributes

Adds information to a feature's attribute fields representing the spatial or geometric characteristics and location of each feature, such as length or area and x-, y-, z-, and m-coordinates.
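
Area and length are typical geometry attributes. The following Python sketch computes planar polygon area with the shoelace formula and perimeter as the sum of edge lengths, in the units of the input coordinates. The helper names are hypothetical, and the sketch ignores geodesic measurement, curves, multipart shapes, and holes, all of which a full implementation has to consider.

```python
import math


def polygon_area(ring):
    """Planar area via the shoelace formula. `ring` lists (x, y) vertices
    in order, without repeating the first vertex at the end."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def polygon_perimeter(ring):
    """Planar perimeter: sum of edge lengths, including the closing edge."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_area(triangle), polygon_perimeter(triangle))  # 6.0 12.0
```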

Calculate Field

Calculates the values of a field for a feature class, feature layer, or raster.

Transform Field

Transforms continuous values in one or more fields by applying mathematical functions to each value and changing the shape of the distribution. The transformation methods in the tool include log, square root, Box-Cox, multiplicative inverse, square, exponential, and inverse Box-Cox.
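
The Box-Cox transformation of a positive value x is (x^λ - 1) / λ when λ is not 0, and ln(x) when λ is 0; the inverse Box-Cox transformation reverses it. The following Python sketch implements these definitions alongside log and square root. Here λ is supplied by the caller as an assumption of the sketch; whether and how the tool estimates λ is not shown, and the function names are hypothetical.

```python
import math


def log_transform(x):
    return math.log(x)          # requires x > 0


def sqrt_transform(x):
    return math.sqrt(x)         # requires x >= 0


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inverse_box_cox(y, lam):
    """Undo box_cox: returns the original x for the same lam."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


skewed = [1.0, 2.0, 4.0, 8.0, 100.0]
transformed = [box_cox(v, 0.5) for v in skewed]
restored = [inverse_box_cox(t, 0.5) for t in transformed]  # back to the originals
```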

Standardize Field

Standardizes values in fields by converting them to values that follow a specified scale. Standardization methods include z-score, minimum-maximum, absolute maximum, and robust standardization.
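
The four standardization methods reduce to simple formulas: z-score is (x - mean) / standard deviation, minimum-maximum is (x - min) / (max - min), absolute maximum is x / max|x|, and robust standardization is (x - median) / interquartile range. The following Python sketch applies them to a list of values. Using the population standard deviation, a 0 to 1 output range for minimum-maximum, and the default quartile method of `statistics.quantiles` are assumptions that may not match the tool's exact conventions.

```python
import statistics

# All functions assume the values are not all identical (no zero spread).


def z_score(values):
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)                  # population standard deviation
    return [(v - mean) / sd for v in values]


def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]   # rescale to [0, 1]


def absolute_max(values):
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]               # rescale to [-1, 1]


def robust(values):
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    median = statistics.median(values)
    return [(v - median) / (q3 - q1) for v in values]


print(min_max([1, 2, 3]))                    # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9])[0])  # -1.5
```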

Add Field

Adds a new field to a table or the table of a feature class or feature layer, as well as to rasters with attribute tables.

Dimension Reduction

Reduces the number of dimensions of a set of continuous variables by aggregating the highest possible amount of variance into fewer components using Principal Component Analysis (PCA) or Reduced-Rank Linear Discriminant Analysis (LDA).
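
Principal Component Analysis looks for the direction of greatest variance. With two variables, the covariance matrix is 2 x 2 and its eigenvalues have a closed form, so the following Python sketch can compute the first principal component without a linear algebra library. It is limited to two variables, does not scale them to unit variance, and does not cover LDA; the function name `first_component_2d` is hypothetical. The returned ratio is the share of total variance captured by the first component.

```python
import math


def first_component_2d(xs, ys):
    """PCA for two variables via the closed-form eigen decomposition of the
    2 x 2 sample covariance matrix. Returns (scores, explained_ratio).
    Assumes at least two records and nonzero total variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    sxx = sum(a * a for a in cx) / (n - 1)
    syy = sum(b * b for b in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)

    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from the trace and determinant.
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(half_trace * half_trace - det, 0.0))
    l1, l2 = half_trace + disc, half_trace - disc

    # Eigenvector for the largest eigenvalue l1.
    if abs(sxy) > 1e-12:
        vx, vy = l1 - syy, sxy
    else:  # variables already uncorrelated: pick the higher-variance axis
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Project the centered data onto the component (the new single variable).
    scores = [a * vx + b * vy for a, b in zip(cx, cy)]
    return scores, l1 / (l1 + l2)


scores, ratio = first_component_2d([1, 2, 3, 4], [2, 4, 6, 8])
# Perfectly correlated inputs: one component keeps essentially all the variance.
```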

Time Series Smoothing

Smooths one or more time series of a numeric variable to reduce short-term fluctuations and expose long-term trends and cycles. Smoothing methods include centered, forward, and backward moving averages, as well as an adaptive method based on local linear regression.
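
The three moving-average variants differ only in which time steps fall in the window. The following Python sketch computes a backward (current and preceding steps), forward (current and following steps), or centered window mean and returns `None` where the window would run past either end of the series. The function name and this edge handling are assumptions; the tool's own edge options and its adaptive local linear regression method are not shown.

```python
def moving_average(values, window, method="centered"):
    """Moving average over `window` time steps.
    backward: current and preceding steps; forward: current and following
    steps; centered: steps on both sides (window must be odd).
    Positions whose window extends past the series get None."""
    if method == "centered" and window % 2 == 0:
        raise ValueError("a centered window must have an odd length")
    n, half = len(values), window // 2
    smoothed = []
    for i in range(n):
        if method == "backward":
            lo, hi = i - window + 1, i + 1
        elif method == "forward":
            lo, hi = i, i + window
        elif method == "centered":
            lo, hi = i - half, i + half + 1
        else:
            raise ValueError(f"unknown method: {method}")
        smoothed.append(sum(values[lo:hi]) / window if lo >= 0 and hi <= n else None)
    return smoothed


series = [1, 2, 3, 4, 5]
print(moving_average(series, 3, "backward"))  # [None, None, 2.0, 3.0, 4.0]
print(moving_average(series, 3, "forward"))   # [2.0, 3.0, 4.0, None, None]
print(moving_average(series, 3, "centered"))  # [None, 2.0, 3.0, 4.0, None]
```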

Integrate

The following tools are available in the Integrate category:

Join Field

Joins the contents of a table to another table based on a common attribute field. The input table is updated to contain the fields from the join table. You can select which fields from the join table will be added to the input table.
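
At its core, an attribute join looks up each input record's key in the join table and copies the selected fields. The following Python sketch does this with rows represented as dictionaries. The function name `join_field`, the rule that the first join record for a key wins, and the use of `None` for unmatched records are assumptions made for this example; the field names in the sample data are hypothetical.

```python
def join_field(in_rows, in_key, join_rows, join_key, fields):
    """Add `fields` from join_rows to each dict in in_rows, matching on key.
    Updates in_rows in place, mirroring how the input table is modified."""
    lookup = {}
    for row in join_rows:
        lookup.setdefault(row[join_key], row)   # keep the first match per key
    for row in in_rows:
        match = lookup.get(row[in_key])
        for field in fields:
            row[field] = match[field] if match is not None else None
    return in_rows


parcels = [{"PIN": "A1"}, {"PIN": "B2"}, {"PIN": "C3"}]
owners = [{"PARCEL_ID": "A1", "OWNER": "Lee", "ZONE": "R1"},
          {"PARCEL_ID": "B2", "OWNER": "Diaz", "ZONE": "C2"}]
join_field(parcels, "PIN", owners, "PARCEL_ID", ["OWNER"])
print(parcels)
# [{'PIN': 'A1', 'OWNER': 'Lee'}, {'PIN': 'B2', 'OWNER': 'Diaz'}, {'PIN': 'C3', 'OWNER': None}]
```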

Enrich

Enriches data by adding demographic and landscape facts about the people and places that surround or are inside data locations. The output is a duplicate of the input with additional attribute fields. This tool requires an ArcGIS Online organizational account or a locally installed Business Analyst dataset.

Near

Calculates distance and additional proximity information between the input features and the closest feature in another layer or feature class.
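
Conceptually, Near finds, for each input feature, the closest feature in the near layer and records its identifier and distance. The following Python sketch does this for points by brute force with planar straight-line distance. It is a hypothetical helper that ignores lines and polygons, geodesic distance, search radius, and the additional proximity outputs (such as angle) that the tool can write.

```python
import math


def near(in_points, near_features):
    """For each (x, y) in in_points, return (near_id, near_dist) of the closest
    feature in near_features, a dict of id -> (x, y). Brute force, planar."""
    results = []
    for pt in in_points:
        fid, xy = min(near_features.items(), key=lambda item: math.dist(pt, item[1]))
        results.append((fid, math.dist(pt, xy)))
    return results


stations = {"S1": (0.0, 0.0), "S2": (10.0, 0.0)}
print(near([(3.0, 4.0), (9.0, 0.0)], stations))  # [('S1', 5.0), ('S2', 1.0)]
```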

Spatial Join

Joins attributes from one feature to another based on the spatial relationship. The target features and the joined attributes from the join features are written to the output feature class.
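
A common spatial relationship is point-in-polygon. The following Python sketch joins a polygon attribute to each point using the ray-casting test, keeping the first polygon that contains the point and `None` when no polygon does. The helper names and the one-to-one, first-match rule are assumptions; points exactly on a boundary may be classified either way, and the tool's other match options are not modeled.

```python
def point_in_polygon(pt, ring):
    """Ray casting: count crossings of a ray from pt toward +x with the ring edges."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):                    # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons, field):
    """Copy `field` from the first polygon containing each point (or None)."""
    joined = []
    for pt in points:
        match = next((p for p in polygons if point_in_polygon(pt["xy"], p["ring"])), None)
        joined.append({**pt, field: match[field] if match is not None else None})
    return joined


zones = [{"ZONE": "A", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
         {"ZONE": "B", "ring": [(10, 0), (20, 0), (20, 10), (10, 10)]}]
wells = [{"ID": 1, "xy": (5, 5)}, {"ID": 2, "xy": (15, 5)}, {"ID": 3, "xy": (25, 5)}]
for row in spatial_join(wells, zones, "ZONE"):
    print(row["ID"], row["ZONE"])
# 1 A
# 2 B
# 3 None
```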

Summarize Within

Overlays a polygon layer with another layer to summarize the number of points, length of the lines, or area of the polygons within each polygon, and calculate attribute field statistics about those features within the polygons.

Sample

Creates a table or a point feature class that shows the values of cells from a raster, or a set of rasters, for defined locations. The locations are defined by raster cells, points, polylines, or polygons.
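
Sampling a raster at a point amounts to converting the point's map coordinates to a row and column index. The following Python sketch assumes a north-up raster stored as a list of rows, with a known upper-left corner and square cells, and returns the value of the cell that contains each point, without interpolation. The function name and raster representation are assumptions; the tool's support for multiple rasters, resampling, and line or polygon locations is not shown.

```python
import math


def sample_raster(grid, x_min, y_max, cell_size, locations):
    """Return the cell value under each (x, y), or None outside the raster.
    grid[row][col]; row 0 is the top (north) row."""
    n_rows, n_cols = len(grid), len(grid[0])
    samples = []
    for x, y in locations:
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y_max - y) / cell_size)
        inside = 0 <= row < n_rows and 0 <= col < n_cols
        samples.append(grid[row][col] if inside else None)
    return samples


elevation = [[10, 12],
             [14, 16]]
print(sample_raster(elevation, 0.0, 20.0, 10.0, [(5, 15), (15, 5), (25, 5)]))
# [10, 16, None]
```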

Summarize Nearby

Finds features that are within a specified distance of features in the input layer and calculates statistics for the nearby features.
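
With straight-line distance, summarizing nearby features reduces to a distance filter followed by statistics. The following Python sketch counts the nearby points within a given distance of each input point and sums one of their attributes. The function name and data layout are assumptions, and the sketch covers only planar straight-line distance between points.

```python
import math


def summarize_nearby(input_points, nearby, distance, field):
    """For each (x, y) in input_points, count the features in `nearby`
    (dicts with an "xy" key) within `distance` and sum their `field`."""
    results = []
    for pt in input_points:
        hits = [f[field] for f in nearby if math.dist(pt, f["xy"]) <= distance]
        results.append({"count": len(hits), "sum": sum(hits)})
    return results


stores = [{"xy": (1, 0), "sales": 50}, {"xy": (0, 2), "sales": 30}, {"xy": (9, 9), "sales": 70}]
print(summarize_nearby([(0, 0)], stores, 2.5, "sales"))  # [{'count': 2, 'sum': 80}]
```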

Apportion Polygon

Summarizes the attributes of an input polygon layer based on the spatial overlay of a target polygon layer and assigns the summarized attributes to the target polygons. The target polygons have summed numeric attributes that are derived from the input polygons that each target overlaps.
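
One way to apportion is by area weighting: each input polygon contributes to a target polygon in proportion to the fraction of the input's area that the target overlaps. The following Python sketch applies that rule to overlap areas that are assumed to have been computed already (for example, by an overlay), since polygon intersection is out of scope here. The function name and dictionary layout are hypothetical, and the tool may offer other weighting options.

```python
from collections import defaultdict


def apportion_by_area(source_values, source_areas, overlaps):
    """source_values: source id -> numeric attribute (for example, population)
    source_areas:  source id -> total area of the source polygon
    overlaps:      (source id, target id) -> area of their intersection
    Returns target id -> apportioned (summed) value."""
    totals = defaultdict(float)
    for (src, tgt), area in overlaps.items():
        totals[tgt] += source_values[src] * area / source_areas[src]
    return dict(totals)


population = {"S1": 1000, "S2": 500}
areas = {"S1": 100.0, "S2": 50.0}
overlaps = {("S1", "T1"): 40.0, ("S1", "T2"): 60.0, ("S2", "T2"): 50.0}
print(apportion_by_area(population, areas, overlaps))  # {'T1': 400.0, 'T2': 1100.0}
```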

Format

The following tools are available in the Format category:

Encode Field

Converts categorical values (string, integer, or date) into multiple numerical fields, each representing a category. The encoded numerical fields can be used in most data science and statistical workflows including regression models.
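
One-hot encoding, one common way to turn categories into numeric fields, creates one 0/1 field per category. The following Python sketch shows the idea. The function name, the sorted category order, and returning lists rather than new fields are assumptions, and the sketch does not show which encoding methods the tool itself offers.

```python
def one_hot(values):
    """Return (categories, rows), where each row has a 1 in the column of
    that record's category and 0 elsewhere."""
    categories = sorted(set(values), key=str)
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


cats, encoded = one_hot(["oak", "pine", "oak", "elm"])
print(cats)     # ['elm', 'oak', 'pine']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```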

Convert Time Field

Converts time values stored in a string or numeric field to a date field. The tool can also be used to convert time values stored in string, numeric, or date fields into custom formats such as day of the week and month of the year.
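
Converting a string to a date requires knowing the string's format. The following Python sketch parses a value with a `datetime.strptime` format string and derives the day of the week and the month of the year. The function name and the `%Y-%m-%d` default format are assumptions; a numeric value such as 20240315 can first be converted with `str()` and parsed with the format `%Y%m%d`.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def convert_time_value(text, fmt="%Y-%m-%d"):
    """Parse `text` with `fmt` and return the date plus two derived fields."""
    dt = datetime.strptime(text, fmt)
    return {"date": dt,
            "day_of_week": DAY_NAMES[dt.weekday()],   # weekday(): Monday == 0
            "month": MONTH_NAMES[dt.month - 1]}


print(convert_time_value("2024-03-15")["day_of_week"])       # Friday
print(convert_time_value(str(20240315), "%Y%m%d")["month"])  # March
```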

Transpose Fields

Switches data stored in fields or columns to rows in a new table or feature class.
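
Transposing fields turns each of several value fields into its own row. The following Python sketch does this for rows stored as dictionaries; the function name, the sample field names, and the default output field names `field` and `value` are hypothetical.

```python
def transpose_fields(rows, id_field, value_fields, name_field="field", value_field="value"):
    """Turn each listed field of each input row into a separate output row."""
    out = []
    for row in rows:
        for field in value_fields:
            out.append({id_field: row[id_field], name_field: field, value_field: row[field]})
    return out


wide = [{"ID": 1, "Y2010": 100, "Y2020": 120}]
for r in transpose_fields(wide, "ID", ["Y2010", "Y2020"]):
    print(r)
# {'ID': 1, 'field': 'Y2010', 'value': 100}
# {'ID': 1, 'field': 'Y2020', 'value': 120}
```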

Convert Time Zone

Converts time values recorded in a date field from one time zone to another time zone.
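
The following Python sketch converts a naive date-time value between two fixed UTC offsets using the standard `datetime` module. Fixed offsets are a simplification assumed for this example: real time zones follow daylight saving rules, which a time zone database (such as the one used by Python's `zoneinfo` module) accounts for. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def convert_time_zone(value, from_utc_offset_hours, to_utc_offset_hours):
    """Convert a naive datetime recorded at one fixed UTC offset to another."""
    source = timezone(timedelta(hours=from_utc_offset_hours))
    target = timezone(timedelta(hours=to_utc_offset_hours))
    # Attach the source offset, convert, then drop tzinfo to store a naive value.
    return value.replace(tzinfo=source).astimezone(target).replace(tzinfo=None)


print(convert_time_zone(datetime(2024, 1, 15, 2, 30), 0, -5))  # 2024-01-14 21:30:00
```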

Reclassify Field

Reclassifies values in a numerical or text field into classes based on bounds defined manually or using a reclassification method.
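
With manually defined bounds, reclassification is a lookup of the interval each value falls in. The following Python sketch uses binary search over sorted upper bounds and also computes equal interval bounds as one example of a reclassification method. Treating each upper bound as inclusive, numbering classes from 1, and returning `None` above the last bound are assumptions of this sketch, and the function names are hypothetical.

```python
from bisect import bisect_left


def reclassify(values, upper_bounds):
    """Map each value to a 1-based class: class 1 holds values up to
    upper_bounds[0], class 2 values above that up to upper_bounds[1], and so
    on. Values above the last bound get None. Bounds must be sorted ascending."""
    classes = []
    for v in values:
        i = bisect_left(upper_bounds, v)      # index of the first bound >= v
        classes.append(i + 1 if i < len(upper_bounds) else None)
    return classes


def equal_interval_bounds(values, n_classes):
    """Split the data range into n_classes intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    bounds = [lo + width * (k + 1) for k in range(n_classes)]
    bounds[-1] = hi        # pin the top bound to the maximum despite rounding
    return bounds


print(reclassify([5, 10, 10.5, 25, 40], [10, 20, 30]))  # [1, 1, 2, 3, None]
data = [2, 4, 7, 9, 12]
print(reclassify(data, equal_interval_bounds(data, 2)))  # [1, 1, 1, 2, 2]
```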

Pivot Table

Creates a table from the input table by reducing redundancy in records and flattening one-to-many relationships.
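
Pivoting produces one output row per unique value of an input field, with one new field per unique value of a pivot field. The following Python sketch shows this for rows stored as dictionaries. The function name, the sample data, the use of `None` for missing combinations, and letting a later duplicate overwrite an earlier one are assumptions.

```python
def pivot_table(rows, input_field, pivot_field, value_field):
    """One output row per unique input_field value; one column per pivot value."""
    pivot_values = sorted({r[pivot_field] for r in rows}, key=str)
    grouped = {}
    for r in rows:
        key = r[input_field]
        if key not in grouped:
            grouped[key] = {input_field: key, **{p: None for p in pivot_values}}
        grouped[key][r[pivot_field]] = r[value_field]   # later duplicates overwrite
    return list(grouped.values())


inspections = [{"SITE": "A", "YEAR": "2021", "SCORE": 7},
               {"SITE": "A", "YEAR": "2022", "SCORE": 9},
               {"SITE": "B", "YEAR": "2022", "SCORE": 4}]
print(pivot_table(inspections, "SITE", "YEAR", "SCORE"))
# [{'SITE': 'A', '2021': 7, '2022': 9}, {'SITE': 'B', '2021': None, '2022': 4}]
```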

Note:

Most geoprocessing operations that modify the input data cannot be undone.
