You can use data engineering tools to clean and prepare your data. A subset of geoprocessing tools is available in the Data Engineering view to help you prepare your data for use in a map or an analysis. These tools are grouped into the following categories:
- Clean—Clean the data. For example, you can remove unnecessary fields. You can also modify the fields or fill missing values.
- Construct—Create fields that are derived from existing fields or properties of the layer. For example, you can add and calculate a new field; standardize, transform, or reclassify an existing field; and add a field based on the input layer’s geometry.
- Integrate—Integrate or add data from another data source to the input table or feature class. For example, you can join fields or add fields by enriching the data.
- Format—Change the format of the fields or reorganize the fields in the table or feature class. For example, you can convert time fields, encode categorical fields, or reduce the dimensions of existing fields.
Some geoprocessing tools in the Data Engineering view are not available for a noneditable layer. In this case, make an editable copy of the layer and open a new Data Engineering view.
You can access these groups and tools in the Data Engineering view by doing one of the following:
- Right-click a field in the fields panel and choose a tool from the context menu.
- Right-click a field in the statistics panel and choose a tool from the context menu.
- Click the tool on the Data Engineering ribbon.
Data Engineering ribbon
When the Data Engineering view is active, a contextual ribbon appears at the top of the application. The ribbon provides access to commands and tools for exploring and preparing data.
The Data group on the ribbon provides access to the fields view and attribute table of the layer associated with the active Data Engineering view. The Tools group offers four tool galleries: Clean, Construct, Integrate, and Format. Each tool gallery contains a subset of geoprocessing tools for the respective data engineering task. By default, the layer associated with the active Data Engineering view is used to automatically populate the input features parameter of these tools. In the Spatial group, Display XY Data and Geocode Table convert nonspatial stand-alone tables to spatial data.
Data Engineering tools
The tools on the Data Engineering ribbon are described below, grouped by category.
Some of the geoprocessing tools are not available for nonspatial data such as stand-alone tables.
The following tools are available in the Clean category:
- Delete Field: Deletes one or more fields from a table, feature class, feature layer, or raster dataset.
- Append: Appends to, or optionally updates, an existing target dataset with multiple input datasets. Input datasets can be feature classes, tables, shapefiles, rasters, or annotation or dimension feature classes.
- Alter Field: Renames fields and field aliases or alters field properties.
- Project: Projects spatial data from one coordinate system to another.
- Delete Rows: Deletes all or the selected subset of rows from the input.
- Fill Missing Values: Replaces missing (null) values with estimated values based on spatial neighbors, space-time neighbors, time-series, or global statistic values.
- Spatial Outlier Detection: Identifies global or local spatial outliers in point features.
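As an illustration of what replacing missing values with a global statistic means, the following is a minimal sketch in plain Python. It is not the geoprocessing tool's implementation. The function name `fill_missing_with_mean` and the sample values are hypothetical, and the mean is only one of several statistics that could serve as the fill value.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the non-missing entries.

    Hypothetical helper for illustration only; the fill statistic
    (mean) is an assumption, not the tool's documented default.
    """
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot estimate a fill value: every entry is missing")
    fill = mean(known)
    return [fill if v is None else v for v in values]


# Sample field values with two nulls (made-up data).
readings = [10.0, None, 14.0, None, 12.0]
filled = fill_missing_with_mean(readings)
print(filled)
```

Because the original values are overwritten in such a workflow, it is usually safer to write the estimates to a copy of the field or layer first.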
The following tools are available in the Construct category:
- Calculate Field: Calculates the values of a field for a feature class, feature layer, or raster.
- Add Field: Adds a new field to a table or the table of a feature class or feature layer, as well as to rasters with attribute tables.
- Calculate Geometry Attributes: Adds information to a feature's attribute fields representing the spatial or geometric characteristics and location of each feature, such as length or area and x-, y-, z-coordinates, and m-values.
- Transform Field: Transforms continuous values in one or more fields by applying mathematical functions to each value and changing the shape of the distribution. The transformation methods in the tool include log, square root, Box-Cox, multiplicative inverse, square, exponential, and inverse Box-Cox.
- Standardize Field: Standardizes values in fields by converting them to values that follow a specified scale. Standardization methods include z-score, minimum-maximum, absolute maximum, and robust standardization.
- Dimension Reduction: Reduces the number of dimensions of a set of continuous variables by aggregating the highest possible amount of variance into fewer components using Principal Component Analysis (PCA) or Reduced-Rank Linear Discriminant Analysis (LDA).
- Time Series Smoothing: Smooths time series data, which helps account for short-term fluctuations to expose long-term trends and cycles. The tool smooths a numeric variable of one or more time series using centered, forward, and backward moving averages, as well as an adaptive method based on local linear regression.
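The z-score and minimum-maximum standardization methods and the centered moving average mentioned above can be sketched in plain Python as follows. This is an illustration of the general techniques under stated assumptions, not the tools' implementation: the z-score here uses the population standard deviation, the moving average shortens its window at the ends of the series, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to mean 0 and standard deviation 1.

    Assumption: population standard deviation is used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(v - mu) / sigma for v in values]


def min_max(values, low=0.0, high=1.0):
    """Rescale values linearly so that they span [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        raise ValueError("all values are identical; min-max is undefined")
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def centered_moving_average(values, half_window=1):
    """Average each value with up to half_window neighbors on each side.

    Assumption: the window is truncated at the start and end of the series.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_window): i + half_window + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


field = [2.0, 4.0, 6.0, 8.0]  # made-up field values
print(z_score(field))
print(min_max(field))
print(centered_moving_average([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A forward or backward moving average would differ only in which side of each value the window extends to.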
The following tools are available in the Integrate category:
- Spatial Join: Joins attributes from one feature to another based on the spatial relationship. The target features and the joined attributes from the join features are written to the output feature class.
- Join Field: Joins the contents of a table to another table based on a common attribute field. The input table is updated to contain the fields from the join table. You can select which fields from the join table will be added to the input table.
- Near: Calculates distance and additional proximity information between the input features and the closest feature in another layer or feature class.
- Summarize Within: Overlays a polygon layer with another layer to summarize the number of points, length of the lines, or area of the polygons within each polygon, and calculate attribute field statistics about the features within the polygons.
- Summarize Nearby: Finds features that are within a specified distance of features in the input layer and calculates statistics for the nearby features.
- Sample: Creates a table or a point feature class that shows the values of cells from a raster, or a set of rasters, for defined locations. The locations are defined by raster cells, points, polylines, or polygons.
- Enrich: Enriches data by adding demographic and landscape facts about the people and places that surround or are inside data locations. The output is a duplicate of the input with additional attribute fields. This tool requires an ArcGIS Online organizational account or a locally installed Business Analyst dataset.
- Apportion Polygon: Summarizes the attributes of an input polygon layer based on the spatial overlay of a target polygon layer and assigns the summarized attributes to the target polygons. The target polygons have summed numeric attributes that are derived from the input polygons that each target overlaps.
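Joining the contents of one table to another on a common attribute field, as described above, can be pictured with a small plain-Python sketch. It is an illustration of the idea rather than the tool's implementation. The function `join_fields`, the field names, and the sample rows are hypothetical, and two behaviors are assumptions made for the sketch: when the join table repeats a key, the first matching row is used, and input rows without a match receive `None` in the joined fields.

```python
def join_fields(input_rows, join_rows, key, fields):
    """Return copies of input_rows extended with selected join-table fields.

    Assumptions for this sketch: the first join row per key wins, and
    unmatched input rows get None for every joined field.
    """
    lookup = {}
    for row in join_rows:
        lookup.setdefault(row[key], row)  # keep only the first occurrence

    joined = []
    for row in input_rows:
        match = lookup.get(row[key])
        extra = {f: (match[f] if match is not None else None) for f in fields}
        joined.append({**row, **extra})
    return joined


# Made-up tables sharing the "zone_id" field.
parcels = [
    {"parcel": "A1", "zone_id": 10},
    {"parcel": "B7", "zone_id": 20},
    {"parcel": "C3", "zone_id": 99},
]
zones = [
    {"zone_id": 10, "zone_name": "Residential", "max_height": 12},
    {"zone_id": 20, "zone_name": "Commercial", "max_height": 30},
]

result = join_fields(parcels, zones, key="zone_id", fields=["zone_name"])
for row in result:
    print(row)
```

Passing only `["zone_name"]` mirrors the option to choose which fields from the join table are added; `max_height` is left out of the result.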
The following tools are available in the Format category:
- Convert Temporal Field: Transfers temporal values stored in a field to another field. The tool can be used to convert between field types (text, numeric, or datetime fields) or to convert the values to a different format such as dd/MM/yy HH:mm:ss to yyyy-MM-dd.
- Convert Time Zone: Converts time values recorded in a date field from one time zone to another time zone.
- Pivot Table: Creates a table from the input table by reducing redundancy in records and flattening one-to-many relationships.
- Transpose Fields: Switches data stored in fields or columns to rows in a new table or feature class.
- Reclassify Field: Reclassifies values in a numerical or text field into classes based on bounds defined manually or using a reclassification method.
- Encode Field: Converts categorical values (string, integer, or date) into multiple numerical fields, each representing a category. The encoded numerical fields can be used in most data science and statistical workflows including regression models.
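Two of the formatting operations described above, converting a temporal value from dd/MM/yy HH:mm:ss to yyyy-MM-dd and encoding a categorical field as one numerical field per category, can be sketched in plain Python. This is an illustrative sketch, not the tools' implementation. The function names are hypothetical, the Python format codes are the assumed equivalents of the format strings named above, and one-hot encoding is assumed as the encoding method.

```python
from datetime import datetime


def convert_temporal(text, in_format="%d/%m/%y %H:%M:%S", out_format="%Y-%m-%d"):
    """Parse a text timestamp and write it back in another format.

    The defaults are the assumed Python equivalents of
    dd/MM/yy HH:mm:ss and yyyy-MM-dd.
    """
    return datetime.strptime(text, in_format).strftime(out_format)


def one_hot(values):
    """Encode each value as a dict of 0/1 flags, one key per category.

    Categories are sorted so the output field order is deterministic.
    """
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


converted = convert_temporal("31/12/23 18:45:00")
print(converted)

land_cover = ["oak", "pine", "oak"]  # made-up categorical field
encoded = one_hot(land_cover)
for row in encoded:
    print(row)
```

Each encoded row contains exactly one flag set to 1, which is the form most regression workflows expect for categorical predictors.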
Most geoprocessing operations that modify the input data cannot be undone.