You can use data engineering tools to clean and prepare your data. A subset of geoprocessing tools is available in the Data Engineering view to help you prepare your data for use in a map or an analysis. These tools are grouped into the following categories:
- Clean—Clean the data. For example, you can remove unnecessary rows or fields. You can also modify the fields or fill missing values.
- Construct—Create fields that are derived from existing fields or properties of the layer. For example, you can add and calculate a new field; standardize, transform, or reclassify an existing field; and add a field based on the input layer’s geometry.
- Integrate—Integrate or add data from another data source to the input table or feature class. For example, you can join fields or add fields by enriching the data.
- Format—Change the format of the fields or reorganize the fields in the table or feature class. For example, you can convert time fields, encode categorical fields, or reduce the dimensions of existing fields.
Some geoprocessing tools in the Data Engineering view are not available for a noneditable layer. In this case, make an editable copy of the layer and open a new Data Engineering view for the copy.
You can access these categories and their tools in the Data Engineering view in any of the following ways:
- Right-click a field in the fields panel and choose a tool from the context menu.
- Right-click a field in the statistics panel and choose a tool from the context menu.
- Click the tool on the Data Engineering ribbon.
Data Engineering ribbon
When the Data Engineering view is active, a contextual ribbon appears at the top of the application. The ribbon provides access to commands and tools for exploring and preparing data.
The Data group on the ribbon provides access to the fields view and attribute table of the layer associated with the active Data Engineering view. The Tools group offers four tool galleries: Clean, Construct, Integrate, and Format. Each tool gallery contains a subset of geoprocessing tools for the respective data engineering task. By default, the layer associated with the active Data Engineering view is used to automatically populate the input features parameter of these tools.
Data Engineering tools
The following sections describe all of the tools on the Data Engineering ribbon, grouped by category.
Some of the geoprocessing tools are not available for nonspatial data such as stand-alone tables.
The following tools are available in the Clean category:
- Fill Missing Values: Replaces missing (null) values with estimated values based on spatial neighbors, space-time neighbors, or time-series values.
- Delete Field: Deletes one or more fields from a table, feature class, feature layer, or raster dataset.
- Calculate Local Outlier Factor: Identifies spatial outliers in point features by calculating the local outlier factor (LOF) of each feature. Spatial outliers are features in locations that are abnormally isolated, and the LOF is a measurement that describes how isolated a location is from its local neighbors.
- Project: Projects spatial data from one coordinate system to another.
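To make the local outlier factor described above more concrete, here is a minimal pure-Python sketch of the textbook LOF calculation on planar point coordinates. It is an illustration under stated assumptions (exactly k neighbors with ties broken by index, distinct point locations, Euclidean distance), not the tool's implementation, and the function name `local_outlier_factor` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def local_outlier_factor(points: List[Point], k: int) -> List[float]:
    """Return a textbook LOF score for each point.

    Scores near 1 mean a point is about as dense as its neighbors;
    scores well above 1 mean it is more isolated than its neighbors.
    Assumptions of this sketch: exactly k nearest neighbors (ties broken
    by index), planar Euclidean distance, and distinct point locations.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and len(points) - 1")

    # k nearest neighbors of every point, as (distance, index) pairs (brute force).
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(others[:k])

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_distance = [nbrs[-1][0] for nbrs in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for nbrs in neighbors:
        mean_reach = sum(max(k_distance[j], d) for d, j in nbrs) / k
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of each neighbor's density to the point's own density.
    return [
        sum(lrd[j] for _, j in nbrs) / (k * lrd[i])
        for i, nbrs in enumerate(neighbors)
    ]
```

For four points on the corners of a unit square plus one point at (10, 10) and k = 2, the four corner points score exactly 1.0 and the distant point scores about 13.1, which matches the intuition of an abnormally isolated location.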
The following tools are available in the Construct category:
- Add Geometry Attributes: Adds information to a feature's attribute fields representing the spatial or geometric characteristics and location of each feature, such as length or area and x-, y-, z-, and m-coordinates.
- Calculate Field: Calculates the values of a field for a feature class, feature layer, or raster.
- Transform Field: Transforms continuous values in one or more fields by applying mathematical functions to each value and changing the shape of the distribution. The transformation methods in the tool include log, square root, Box-Cox, multiplicative inverse, square, exponential, and inverse Box-Cox.
- Standardize Field: Standardizes values in fields by converting them to values that follow a specified scale. Standardization methods include z-score, minimum-maximum, absolute maximum, and robust standardization.
- Add Field: Adds a new field to a table or the table of a feature class or feature layer, as well as to rasters with attribute tables.
- Reduce Dimensions: Reduces the number of dimensions of a set of continuous variables by aggregating the highest possible amount of variance into fewer components using Principal Component Analysis (PCA) or Reduced-Rank Linear Discriminant Analysis (LDA).
- Time Series Smoothing: Smooths a numeric variable of one or more time series, which helps account for short-term fluctuations and expose long-term trends and cycles. Smoothing methods include centered, forward, and backward moving averages, as well as an adaptive method based on local linear regression.
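The standardization and Box-Cox formulas referred to above can be sketched in plain Python. This uses common textbook definitions; the tool's exact choices (for example, population versus sample standard deviation, or how quartiles are interpolated for robust standardization) may differ, and the function names `standardize` and `box_cox` are hypothetical.

```python
import math
import statistics
from typing import List


def standardize(values: List[float], method: str = "z-score") -> List[float]:
    """Rescale values with one of four common standardization formulas."""
    if method == "z-score":
        # Center on the mean and divide by the population standard deviation.
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        return [(v - mean) / sd for v in values]
    if method == "minmax":
        # Map the smallest value to 0 and the largest value to 1.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "absolute-max":
        # Divide by the largest absolute value so results fall in [-1, 1].
        peak = max(abs(v) for v in values)
        return [v / peak for v in values]
    if method == "robust":
        # Center on the median and divide by the interquartile range,
        # which limits the influence of extreme values.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return [(v - median) / (q3 - q1) for v in values]
    raise ValueError(f"unknown method: {method}")


def box_cox(values: List[float], lam: float) -> List[float]:
    """Box-Cox transformation; every value must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda = 0 case is defined as the natural logarithm.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

With a skewed field such as `[1, 2, 3, 4, 100]`, robust standardization maps the median (3) to 0 and the extreme value to 48.5, while min-max scaling squeezes the first four values close to 0; this difference is why the choice of method matters for fields with outliers.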
The following tools are available in the Integrate category:
- Join Field: Joins the contents of a table to another table based on a common attribute field. The input table is updated to contain the fields from the join table. You can select which fields from the join table will be added to the input table.
- Enrich: Enriches data by adding demographic and landscape facts about the people and places that surround or are inside data locations. The output is a duplicate of the input with additional attribute fields. This tool requires an ArcGIS Online organizational account or a locally installed Business Analyst dataset.
- Near: Calculates distance and additional proximity information between the input features and the closest feature in another layer or feature class.
- Spatial Join: Joins attributes from one feature to another based on the spatial relationship. The target features and the joined attributes from the join features are written to the output feature class.
- Summarize Within: Overlays a polygon layer with another layer to summarize the number of points, length of the lines, or area of the polygons within each polygon, and to calculate attribute field statistics about those features within the polygons.
- Sample: Creates a table or a point feature class that shows the values of cells from a raster, or a set of rasters, for defined locations. The locations are defined by raster cells, points, polylines, or polygons.
- Summarize Nearby: Finds features that are within a specified distance of features in the input layer and calculates statistics for the nearby features.
- Apportion Polygon: Summarizes the attributes of an input polygon layer based on the spatial overlay of a target polygon layer and assigns the summarized attributes to the target polygons. The target polygons have summed numeric attributes that are derived from the input polygons that each target overlaps.
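As a rough illustration of two of the integration operations above, the following sketch performs an attribute join on a common key and a planar nearest-feature search over in-memory rows and coordinates. Modeling tables as lists of dictionaries is an assumption of this sketch, and the functions `join_field` and `near` are hypothetical; they do not reproduce geodesic distances, search radii, or other tool options.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def join_field(in_rows: List[Row], in_key: str, join_rows: List[Row],
               join_key: str, fields: Sequence[str]) -> None:
    """Add the selected join-table fields to each input row, in place.

    Assumptions of this sketch: if several join rows share a key value,
    the first one wins; input rows with no match receive None.
    """
    lookup: Dict[Any, Row] = {}
    for row in join_rows:
        lookup.setdefault(row[join_key], row)  # keep the first match only
    for row in in_rows:
        match = lookup.get(row[in_key])
        for f in fields:
            row[f] = match[f] if match is not None else None


def near(points: List[Tuple[float, float]],
         near_points: List[Tuple[float, float]]) -> List[Tuple[int, float]]:
    """For each input point, return (index, planar distance) of the closest
    near point, similar in spirit to the NEAR_FID and NEAR_DIST outputs."""
    result = []
    for p in points:
        dist, idx = min((math.dist(p, q), i) for i, q in enumerate(near_points))
        result.append((idx, dist))
    return result
```

Note that the join keeps every input row, so unmatched records show up as nulls instead of disappearing; this is usually what you want when preparing data for analysis.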
The following tools are available in the Format category:
- Encode Field: Converts categorical values (string, integer, or date) into multiple numerical fields, each representing a category. The encoded numerical fields can be used in most data science and statistical workflows, including regression models.
- Convert Time Field: Converts time values stored in a string or numeric field to a date field. The tool can also be used to convert time values stored in string, numeric, or date fields into custom formats such as day of the week and month of the year.
- Transpose Fields: Switches data stored in fields or columns to rows in a new table or feature class.
- Convert Time Zone: Converts time values recorded in a date field from one time zone to another time zone.
- Reclassify Field: Reclassifies values in a numerical or text field into classes based on bounds defined manually or using a reclassification method.
- Pivot Table: Creates a table from the input table by reducing redundancy in records and flattening one-to-many relationships.
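The one-hot encoding and pivoting ideas behind the categorical encoding and one-to-many flattening described above can be sketched as follows. Tables are modeled as lists of dictionaries, output column names are simply the category or pivot values, and the functions `one_hot_encode` and `pivot_table` are hypothetical; the actual tools may name and type their output fields differently.

```python
from typing import Any, Dict, List

Row = Dict[Any, Any]


def one_hot_encode(values: List[Any]) -> List[Dict[Any, int]]:
    """Turn each categorical value into 0/1 indicator columns,
    one column per distinct category (sorted by str for a stable order)."""
    categories = sorted(set(values), key=str)
    return [{c: int(v == c) for c in categories} for v in values]


def pivot_table(rows: List[Row], input_field: str, pivot_field: str,
                value_field: str) -> List[Row]:
    """Flatten a one-to-many table: one output row per input_field value,
    with one column per distinct pivot_field value.

    Assumptions of this sketch: a repeated (input, pivot) pair keeps the
    last value seen, and missing combinations are filled with None.
    """
    pivot_values = sorted({r[pivot_field] for r in rows}, key=str)
    grouped: Dict[Any, Dict[Any, Any]] = {}
    for r in rows:
        grouped.setdefault(r[input_field], {})[r[pivot_field]] = r[value_field]
    return [
        {input_field: key, **{p: cells.get(p) for p in pivot_values}}
        for key, cells in grouped.items()
    ]
```

For example, three rainfall records for two stations across two years collapse into one row per station with one column per year, and the station that lacks a reading for a year gets None in that column.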
Most geoprocessing operations that modify the input data cannot be undone.