Fill Missing Values (Space Time Pattern Mining)

Summary

Replaces missing (null) values with estimated values based on spatial neighbors, space-time neighbors, or time-series values.

Learn more about how Fill Missing Values works

Illustration

Fill Missing Values tool example

Usage

  • The Input Features value can be points or polygons.

  • The resulting output will contain three fields for each field of the Fields to Fill parameter. The first will contain both the original and filled values, and the second will contain an indicator that the value was estimated. The filled field keeps its original field name, and field aliases are created using the following naming convention: <field>_FILLED and <field>_ESTIMATED. The third field, <field>_N_NEIGHBORS, contains the number of neighbors used in the calculations for each estimated value.

  • The output will also contain fields that can help you understand the number of neighbors and the range of neighbor values used in the calculations for each target missing value. If the Fill Method parameter is set to the Average option, the standard deviation (<field>_STD) of the neighbors used in the calculations is reported. The maximum neighbor value is reported for the Minimum option, and the minimum neighbor value is reported for the Maximum option. If the Fill Method parameter is set to the Median option, the mean absolute deviation of the neighbors is reported. If missing values are filled using the Temporal Trend option, this field will contain the sum of squared residuals of the spline. The <field>_N_NEIGHBORS field will contain the count of neighbors used to calculate the estimated values.

  • You can include fields in the Fields to Fill parameter that do not contain null values. These fields will be copied to the output but will have no additional fields associated with them (such as <field>_FILLED or <field>_ESTIMATED). Alternatively, you can provide a value for the Unique ID parameter; that field will be added to the output and can be used to join your results back to the Input Features value.

  • The field NUM_EST (TOT_EST if you're using a related table) is the total number of estimated variables for the associated record. This field is used to render the output map.

  • You can append the additional fields to the input feature class using the Append Fields to Input Features parameter. If you append these fields, a related table cannot be provided.

  • This tool can be used with panel data stored either as repeated shapes or with a related table. If a value is specified for the Location ID parameter, the tool will recognize that the input is panel data and the Time Field parameter will be required.

  • The Location ID value is an integer field and should represent a unique and stationary location. It should not have different X,Y coordinates over time.

  • If the Fixed distance, Contiguity edges only, or Contiguity edges corners option is chosen for the Conceptualization of Spatial Relationships parameter, a space time window can be simulated by choosing a value for the Distance Band and Temporal Neighborhood parameters.

  • If the Fixed distance, Contiguity edges only, or Contiguity edges corners option is chosen for the Conceptualization of Spatial Relationships parameter, a Number of Spatial Neighbors parameter value can be set to specify a minimum number of neighbors.

  • The Temporal Trend option for the Fill Method parameter is only available if values have been set for the Location ID and Time Field parameters.

  • When using the Temporal Trend option to fill values, the location with a null value being filled must have at least two time periods with values at the beginning and at least two time periods with values at the end of the time series in order to be filled. Because of this requirement, nulls in the first two or last two time steps can never be filled using the Temporal Trend option.

  • The Temporal Trend option uses the InterpolatedUnivariateSpline class in the SciPy interpolation package (scipy.interpolate).

  • Missing values that could not be estimated and filled will be reported in the output in the format in which the nulls originally existed.

  • If your data is panel data, the Temporal Neighborhood parameter can be used to filter neighbors by time. To fill the missing values of panel data with spatial neighbors only, set the Temporal Neighborhood parameter to 0.

  • It is important to inspect the resulting filled values to make sure they make sense for your analysis. For instance, if your original field was an integer and the tool was set to fill with the average of spatial neighbors, the results will contain decimals, which may not make sense if your input field was a count. Additionally, when the Fill Method parameter is set to Temporal Trend, the estimated values may be negative even if none of your existing values were negative. This would not make sense if the field you were filling was population.

  • The <field>_N_NEIGHBORS field reports the number of neighbors included in the calculations for that feature. If the Fill Method parameter is set to Temporal Trend, this number is the number of values that exist in the time series for that Location ID value (for instance, if only one value is missing from the time series, it reports the number of time steps in your dataset minus one). If you use the K nearest neighbors option of the Conceptualization of Spatial Relationships parameter together with a Temporal Neighborhood value, the number of neighbors reported includes the k neighbors for the feature that fall within the specified time window.

  • Messages describing details of the analysis and characteristics of the filled fields are written at the bottom of the Geoprocessing pane during tool execution. You can access the messages by hovering over the progress bar, clicking the pop-out button, or expanding the messages section in the Geoprocessing pane. You can also access the messages for a previous run of the Fill Missing Values tool via the geoprocessing history.
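The Temporal Trend notes above can be illustrated with a small sketch for a single location that calls scipy.interpolate.InterpolatedUnivariateSpline directly. This is an approximation of the documented behavior, not the tool's code: the cubic degree (k=3), the use of None or NaN to mark nulls, numeric time stamps (for example, time-step indexes), and the function names are assumptions. The returned count mirrors the Usage note that, for Temporal Trend, N_NEIGHBORS is the number of values that exist in the location's time series.

```python
import math

from scipy.interpolate import InterpolatedUnivariateSpline


def is_null(value):
    """Treat None and NaN as missing values (assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_temporal_trend(times, values, k=3):
    """Fill nulls in one location's time series with an interpolating spline.

    times  -- strictly increasing numbers, for example time-step indexes
    values -- observed values, with None or NaN marking nulls
    Returns (filled_values, n_known), where n_known is the number of
    non-null values the spline is fitted to.
    """
    known = [(t, v) for t, v in zip(times, values) if not is_null(v)]
    filled = list(values)
    # Documented rule: values must exist in the first two and last two
    # time steps; otherwise this location is left unfilled.
    edges = list(values[:2]) + list(values[-2:])
    if len(values) < 4 or any(is_null(v) for v in edges):
        return filled, len(known)
    # A spline of degree k needs at least k + 1 known points.
    degree = min(k, len(known) - 1)
    spline = InterpolatedUnivariateSpline(
        [t for t, _ in known], [v for _, v in known], k=degree
    )
    for i, value in enumerate(values):
        if is_null(value):
            filled[i] = float(spline(times[i]))
    return filled, len(known)


times = [0, 1, 2, 3, 4, 5]
print(fill_temporal_trend(times, [1.0, 2.0, None, 4.0, 5.0, 6.0]))
print(fill_temporal_trend(times, [None, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

Because the spline passes through every known value, it can overshoot between them, which is one way the negative estimates mentioned in the Usage notes can arise.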

Parameters

Label | Explanation | Data Type
Input Features

The feature class containing the null values to be filled.

Feature Layer
Output Features
(Optional)

The output that will include the filled (estimated) values.

If the Related Table parameter value is specified, Output Features will contain the number of estimated values at each location, and Output Table will contain the filled (estimated) values.

Feature Class
Fields to Fill

The numeric fields containing missing data (null values).

Field
Fill Method

Specifies the type of calculation that will be applied. The Temporal Trend option is only available if the Location ID and Time Field parameter values are specified.

  • Average —Null values will be replaced with the mean (average) value of the feature's neighbors.
  • Minimum —Null values will be replaced with the minimum (smallest) value of the feature's neighbors.
  • Maximum —Null values will be replaced with the maximum (largest) value of the feature's neighbors.
  • Median —Null values will be replaced with the median (sorted middle value) of the feature's neighbors.
  • Temporal Trend —Null values will be replaced based on the trend at that unique location.
String
Conceptualization of Spatial Relationships
(Optional)

Specifies how spatial relationships among features will be defined.

  • Fixed distance — Neighboring features within a specified critical distance (the Distance Band parameter value) of each feature will be included in the calculations; everything outside the critical distance will be excluded.
  • K nearest neighbors — The closest k features will be included in the calculations; k is a specified numeric parameter.
  • Contiguity edges only — Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • Contiguity edges corners — Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • Get spatial weights from file — Spatial relationships will be defined by a specified spatial weights file. The path to the spatial weights file is specified by the Spatial Weights Matrix File parameter.
String
Distance Band
(Optional)

The cutoff distance for the Conceptualization of Spatial Relationships parameter's Fixed distance option. Features outside the specified cutoff for a target feature will be ignored in calculations for that feature. This parameter is not available for the Contiguity edges only or Contiguity edges corners options.

Linear Unit
Temporal Neighborhood
(Optional)

An interval forward and backward in time that determines which features will be used in calculations for the target feature. Features that are not within this interval of the target feature will be ignored in calculations for that feature.

Time Unit
Time Field
(Optional)

The field containing the time stamp for each record in the dataset. This field must be of type Date.

This parameter is required if the Location ID parameter value is provided.

Field
Number of Spatial Neighbors
(Optional)

The number of nearest neighbors that will be included in calculations.

If the Conceptualization of Spatial Relationships parameter's Fixed distance, Contiguity edges only, or Contiguity edges corners option is chosen, this number is the minimum number of neighbors to include in calculations.

Long
Location ID
(Optional)

An integer field containing a unique ID number for each location.

This parameter is used to match features from the Input Features parameter to rows in the Related Table parameter or to specify a unique location ID for determining temporal neighbors.

Field
Related Table
(Optional)

The table or table view containing the temporal data for each feature of the Input Features parameter.

Table View
Related Location ID
(Optional)

An integer field in the Related Table parameter that contains the Location ID parameter value on which the relate will be based.

Field
Spatial Weights Matrix File
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
Unique ID
(Optional)

An integer field containing a different value for every record in the Input Features parameter. This field can be used to join the results back to the original dataset.

If you don't have a Unique ID field, you can create one by adding an integer field to the input features' attribute table and calculating the field values equal to the FID or OBJECTID field.

Field
Null Value
(Optional)

The value that represents null (missing) values. If no value is specified, <Null> is assumed for geodatabase feature classes. For shapefile input, a numeric value of the null placeholder is required.

Double
Output Table
(Optional)

The output table containing the filled (estimated) values.

The output table is required if a related table is provided.

Table
Append Fields to Input Features
(Optional)

Specifies whether the filled value fields will be appended to the input features or an output feature class will be created with the filled value fields. If you append the fields, you cannot provide a related table and the output coordinate system environment will be ignored.

  • Checked—The fields containing the filled values will be appended to the input features. This option modifies the input data.
  • Unchecked—An output feature class will be created containing the filled value fields. This is the default.

Boolean
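The arithmetic behind the Average, Minimum, Maximum, and Median options of the Fill Method parameter, together with the companion statistics described in the Usage notes, can be sketched for one missing value as follows. This is an illustrative plain-Python version, not the tool's code. It assumes the neighbor values have already been selected, uses the Python keyword strings (AVERAGE and so on), and assumes a population standard deviation and a mean absolute deviation measured from the median, since neither detail is specified here.

```python
import statistics


def fill_from_neighbors(neighbor_values, method):
    """Return (estimate, diagnostic) for one missing value.

    Diagnostics follow the Usage notes: standard deviation for AVERAGE, the
    maximum neighbor for MINIMUM, the minimum neighbor for MAXIMUM, and a
    mean absolute deviation for MEDIAN.
    """
    values = [v for v in neighbor_values if v is not None]
    if not values:
        return None, None  # no usable neighbors, so the null stays null
    if method == "AVERAGE":
        # Assumption: population (not sample) standard deviation.
        return statistics.mean(values), statistics.pstdev(values)
    if method == "MINIMUM":
        return min(values), max(values)
    if method == "MAXIMUM":
        return max(values), min(values)
    if method == "MEDIAN":
        median = statistics.median(values)
        # Assumption: absolute deviations are measured from the median.
        mad = sum(abs(v - median) for v in values) / len(values)
        return median, mad
    raise ValueError(f"Unsupported fill method: {method}")


neighbors = [2, 4, 4, 4, 5, 5, 7, 9]
for method in ("AVERAGE", "MINIMUM", "MAXIMUM", "MEDIAN"):
    print(method, fill_from_neighbors(neighbors, method))
```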

Derived Output

Label | Explanation | Data Type
Updated Input Features

The updated input features containing the filled value fields.

Feature Layer
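The Location ID parameter expects an integer field whose values represent stationary locations that do not change X,Y coordinates over time. A pre-check like the following hypothetical sketch could catch violations before the tool runs. It is not part of the tool: it assumes (location ID, X, Y) tuples have already been read from the input, for example with a search cursor, and its tolerance argument is an assumed allowance for floating-point noise.

```python
def check_location_ids(rows, tolerance=0.0):
    """Return (non_integer_ids, moving_ids) found in (location_id, x, y) rows.

    non_integer_ids -- IDs that are not Python integers
    moving_ids      -- IDs whose coordinates differ from their first record
                       by more than `tolerance`
    """
    first_xy = {}
    non_integer, moving = set(), set()
    for loc_id, x, y in rows:
        if not isinstance(loc_id, int):
            non_integer.add(loc_id)
        if loc_id not in first_xy:
            first_xy[loc_id] = (x, y)
        elif (abs(x - first_xy[loc_id][0]) > tolerance
              or abs(y - first_xy[loc_id][1]) > tolerance):
            moving.add(loc_id)
    # key=str keeps sorting safe when IDs of different types are mixed
    return sorted(non_integer, key=str), sorted(moving, key=str)


rows = [
    (101, 10.0, 20.0),
    (101, 10.0, 20.0),
    (102, 5.0, 5.0),
    (102, 5.5, 5.0),  # same ID, different location
    ("A7", 1.0, 1.0),  # not an integer ID
]
print(check_location_ids(rows))
```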

arcpy.stpm.FillMissingValues(in_features, {out_features}, fields_to_fill, fill_method, {conceptualization_of_spatial_relationships}, {distance_band}, {temporal_neighborhood}, {time_field}, {number_of_spatial_neighbors}, {location_id}, {related_table}, {related_location_id}, {weights_matrix_file}, {unique_id}, {null_value}, {out_table}, {append_to_input})
Name | Explanation | Data Type
in_features

The feature class containing the null values to be filled.

Feature Layer
out_features
(Optional)

The output that will include the filled (estimated) values.

If the related_table parameter value is specified, out_features will contain the number of estimated values at each location, and out_table will contain the filled (estimated) values.

Feature Class
fields_to_fill
[fields_to_fill,...]

The numeric fields containing missing data (null values).

Field
fill_method

Specifies the type of calculation that will be applied. The TEMPORAL_TREND option is only available if the location_id and time_field parameter values are specified.

  • AVERAGE: Null values will be replaced with the average (mean) value of the feature's neighbors.
  • MINIMUM: Null values will be replaced with the minimum (smallest) value of the feature's neighbors.
  • MAXIMUM: Null values will be replaced with the maximum (largest) value of the feature's neighbors.
  • MEDIAN: Null values will be replaced with the median (sorted middle value) of the feature's neighbors.
  • TEMPORAL_TREND: Null values will be replaced based on the trend at that unique location.
String
conceptualization_of_spatial_relationships
(Optional)

Specifies how spatial relationships among features will be defined.

  • FIXED_DISTANCE: Neighboring features within a specified critical distance (the distance_band parameter value) of each feature will be included in the calculations; everything outside the critical distance will be excluded.
  • K_NEAREST_NEIGHBORS: The closest k features will be included in the calculations; k is specified by the number_of_spatial_neighbors parameter.
  • CONTIGUITY_EDGES_ONLY: Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • CONTIGUITY_EDGES_CORNERS: Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: Spatial relationships will be defined by a specified spatial weights file. The path to the spatial weights file is specified by the weights_matrix_file parameter.
String
distance_band
(Optional)

The cutoff distance for the conceptualization_of_spatial_relationships parameter's FIXED_DISTANCE option. Features outside the specified cutoff for a target feature will be ignored in calculations for that feature. This parameter is not available for the CONTIGUITY_EDGES_ONLY or CONTIGUITY_EDGES_CORNERS options.

Linear Unit
temporal_neighborhood
(Optional)

An interval forward and backward in time that determines which features will be used in calculations for the target feature. Features that are not within this interval of the target feature will be ignored in calculations for that feature.

Time Unit
time_field
(Optional)

The field containing the time stamp for each record in the dataset. This field must be of type Date.

This parameter is required if the location_id parameter value is provided.

Field
number_of_spatial_neighbors
(Optional)

The number of nearest neighbors that will be included in calculations.

If the conceptualization_of_spatial_relationships parameter's FIXED_DISTANCE, CONTIGUITY_EDGES_ONLY, or CONTIGUITY_EDGES_CORNERS option is chosen, this number is the minimum number of neighbors to include in calculations.

Long
location_id
(Optional)

An integer field containing a unique ID number for each location.

This parameter is used to match features from the in_features parameter to rows in the related_table parameter or to specify a unique location ID for determining temporal neighbors.

Field
related_table
(Optional)

The table or table view containing the temporal data for each feature of the in_features parameter.

Table View
related_location_id
(Optional)

An integer field in the related_table parameter that contains the location_id parameter value on which the relate will be based.

Field
weights_matrix_file
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
unique_id
(Optional)

An integer field containing a different value for every record in the in_features parameter. This field can be used to join the results back to the original dataset.

If you don't have a unique_id field, you can create one by adding an integer field to the feature class table and calculating the field values equal to the FID or OBJECTID field.

Field
null_value
(Optional)

The value that represents null (missing) values. If no value is specified, <Null> is assumed for geodatabase feature classes. For shapefile input, a numeric value of the null placeholder is required.

Double
out_table
(Optional)

The output table containing the filled (estimated) values.

The output table is required if a related table is provided.

Table
append_to_input
(Optional)

Specifies whether the filled value fields will be appended to the input features or an output feature class will be created with the filled value fields. If you append the fields, you cannot provide a related table and the output coordinate system environment will be ignored.

  • APPEND_TO_INPUT: The fields containing the filled values will be appended to the input features. This option modifies the input data.
  • NEW_FEATURES: An output feature class will be created containing the filled value fields. This is the default.
Boolean

Derived Output

Name | Explanation | Data Type
updated_features

The updated input features containing the filled value fields.

Feature Layer
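To make the interplay of conceptualization_of_spatial_relationships, distance_band, number_of_spatial_neighbors, and temporal_neighborhood concrete, here is a hypothetical neighbor-selection sketch in plain Python. It is not Esri's implementation. It assumes point features with straight-line distance, a temporal neighborhood expressed as a Python timedelta rather than the tool's Time Unit values, and one reading of number_of_spatial_neighbors as a minimum for FIXED_DISTANCE (widening to the closest features when too few fall within the band). Contiguity and spatial weights files are not modeled.

```python
import math
from datetime import datetime, timedelta


def select_neighbors(target, candidates, method, distance_band=None,
                     k=None, temporal_window=None):
    """Return IDs of neighbors for one target feature.

    target          -- (x, y, time) of the feature being filled
    candidates      -- dict mapping feature ID to (x, y, time), target excluded
    method          -- "FIXED_DISTANCE" or "K_NEAREST_NEIGHBORS"
    k               -- neighbor count for K_NEAREST_NEIGHBORS; a minimum for
                       FIXED_DISTANCE (assumed: widen to the closest k)
    temporal_window -- optional timedelta applied forward and backward in
                       time; timedelta(0) keeps same-time features only
    """
    tx, ty, tt = target
    pool = sorted(
        (math.dist((tx, ty), (x, y)), fid)
        for fid, (x, y, t) in candidates.items()
        if temporal_window is None or abs(t - tt) <= temporal_window
    )
    if method == "K_NEAREST_NEIGHBORS":
        return [fid for _, fid in pool[:k]]
    if method == "FIXED_DISTANCE":
        within = [fid for d, fid in pool if d <= distance_band]
        if k is not None and len(within) < k:
            # Too few features inside the band: fall back to the closest k.
            within = [fid for _, fid in pool[:k]]
        return within
    raise ValueError(f"Unsupported conceptualization: {method}")


t0 = datetime(2015, 10, 1)
features = {
    1: (1.0, 0.0, t0),
    2: (0.0, 2.0, t0 + timedelta(days=31)),
    3: (3.0, 4.0, t0),
    4: (10.0, 0.0, t0),
}
origin = (0.0, 0.0, t0)
print(select_neighbors(origin, features, "K_NEAREST_NEIGHBORS", k=2))
print(select_neighbors(origin, features, "FIXED_DISTANCE", distance_band=5.5,
                       temporal_window=timedelta(0)))
```

With a zero temporal window, feature 2 is dropped because it belongs to a different month, which matches the Usage note that a Temporal Neighborhood of 0 restricts the search to spatial neighbors.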

Code sample

FillMissingValues example 1 (Python window)

The following Python window script demonstrates how to use the FillMissingValues function.

import arcpy
arcpy.env.workspace = r"C:\STPM\Chicago.gdb"
# Fill nulls in COUNT with the average of each feature's 8 nearest neighbors
arcpy.stpm.FillMissingValues("Chicago_Data", "Chicago_Filled", "COUNT", "AVERAGE",
                             "K_NEAREST_NEIGHBORS", number_of_spatial_neighbors=8)
FillMissingValues example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the FillMissingValues function.

# Fill missing values using a feature class and a related table
# Use the results to create a space-time cube from defined locations
# Run Emerging Hot Spot Analysis on the cube
# Visualize the results in 3D

# Import system modules
import arcpy

# Allow existing outputs to be overwritten
arcpy.env.overwriteOutput = True

# Set the workspace
arcpy.env.workspace = r"C:\STPM\Chicago.gdb"

try:
    # Fill missing values in a feature class of block group polygons whose incident
    # counts are stored in a related table. Some counts are missing, so fill them
    # using the temporal trend method.
    arcpy.stpm.FillMissingValues("Chicago_Feature", "Chicago_FilledFeature", "COUNT", "TEMPORAL_TREND",
                                 time_field="TIME", location_id="MYID", related_table="Chicago_Table",
                                 related_location_id="MYID", out_table="Chicago_FilledTable")

    # Create a defined locations space-time cube from the filled feature class and table
    # Use a reference time at the start of a month so that bins fall on month breaks
    # Use temporal aggregation to sum multiple entries into one month
    # Use DROP_LOCATIONS as the fill option because missing values were already filled by Fill Missing Values
    arcpy.stpm.CreateSpaceTimeCubeDefinedLocations("Chicago_FilledFeature", r"C:\STPM\Chicago_Cube.nc", "MYID",
                                                   "APPLY_TEMPORAL_AGGREGATION", "TIME", "1 Months", "REFERENCE_TIME",
                                                   "10/1/2015", "", "COUNT SUM DROP_LOCATIONS", "Chicago_FilledTable",
                                                   "MYID")

    # Run an emerging hot spot analysis on the defined locations cube
    # Use contiguity edges so that only block groups sharing a boundary are considered neighbors
    arcpy.stpm.EmergingHotSpotAnalysis(r"C:\STPM\Chicago_Cube.nc", "COUNT_SUM_NONE",
                                       "Chicago_Cube_EmergingHotSpot", "", 1, "",
                                       "CONTIGUITY_EDGES_ONLY")

    # Use Visualize Space Time Cube in 3D to see the hot spot results for each time slice
    arcpy.stpm.VisualizeSpaceTimeCube3D(r"C:\STPM\Chicago_Cube.nc", "COUNT_SUM_NONE", "HOT_AND_COLD_SPOT_RESULTS",
                                        "Chicago_Cube_Visualize3d")

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the messages
    print(arcpy.GetMessages())
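The Usage notes recommend inspecting filled values, and both examples fill a COUNT field: averaging neighbors (example 1) can produce fractional counts, and a temporal trend (example 2) may produce negative values. A review step like the following hypothetical helper could flag such estimates. It assumes the filled values and their estimated flags have already been read into Python lists, for example with a search cursor; its name and arguments are illustrative only.

```python
def review_filled_values(values, estimated, is_count=False, non_negative=False):
    """Return indexes of estimated values that look implausible.

    values       -- filled values, in record order
    estimated    -- matching flags, True where the tool estimated the value
    is_count     -- flag estimates with a fractional part (count fields)
    non_negative -- flag negative estimates (fields such as population)
    """
    suspicious = []
    for i, (value, was_estimated) in enumerate(zip(values, estimated)):
        if not was_estimated or value is None:
            continue  # original values and unfilled nulls are not reviewed
        if is_count and value != int(value):
            suspicious.append(i)
        elif non_negative and value < 0:
            suspicious.append(i)
    return suspicious


filled = [10, 12.5, -3.2, 7]
flags = [False, True, True, True]
print(review_filled_values(filled, flags, is_count=True, non_negative=True))
print(review_filled_values(filled, flags, non_negative=True))
```

Whether to round, clamp, or simply report flagged values depends on the analysis; the helper only identifies them.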

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes
